<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Manakor Coding Map™ &#187; PHP</title>
	<atom:link href="http://www.manakor.org/category/php/feed" rel="self" type="application/rss+xml" />
	<link>http://www.manakor.org</link>
	<description>Tips and tricks on webpage&#039;s slicing, programming and design</description>
	<lastBuildDate>Mon, 12 Apr 2010 07:18:33 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.9.2</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<item>
		<title>The best way to hide email addresses from spiders and bots</title>
		<link>http://www.manakor.org/the-best-way-to-hide-email-address-from-spyders</link>
		<comments>http://www.manakor.org/the-best-way-to-hide-email-address-from-spyders#comments</comments>
		<pubDate>Thu, 28 Jan 2010 12:27:39 +0000</pubDate>
		<dc:creator>Nikita Sumeiko</dc:creator>
				<category><![CDATA[JQuery]]></category>
		<category><![CDATA[PHP]]></category>
		<category><![CDATA[Tools & Tricks]]></category>

		<guid isPermaLink="false">http://www.manakor.org/?p=101</guid>
		<description><![CDATA[Today almost every website runs on a powerful CMS whose administrator can edit the main content on the fly. This convenience often leads to email addresses being published carelessly, which brings a lot of spam to the website’s owners.
To fight spam, I suggest hiding all [...]]]></description>
			<content:encoded><![CDATA[<p>Today almost every website runs on a powerful CMS whose administrator can edit the main content on the fly. <strong>This convenience often leads to email addresses being published carelessly, which brings a lot of spam to the website’s owners.</strong><br />
To fight spam, I suggest hiding every email address on every page of a website in a clean and tidy way. On the one hand, this keeps away the spiders that harvest every publicly visible email address. On the other hand, real visitors still see our email addresses as usual.</p>
<h2>Playing on the server side</h2>
<p>As is well known, most email spiders (robots) crawl the source code of every website, looking for email addresses and saving them. That is why I suggest finding and replacing every email address, whether it is published as plain text or as a mailto hyperlink, before a spider sees it. The fastest way to do that is with server-side PHP scripting.</p>
<p>First, we are going to write a function that can be used all over our website. Its main purpose is to find every published email address and rewrite it into a specific but very useful format (example AT example DOT com). Inside the function we&#8217;ll use PHP regular expressions to find all the necessary formats and some basic PHP functions to rewrite them properly. This way all our emails are hidden on the server side, and the web spiders and bots that harvest emails will find no address in the usual format.</p>
<h5>Start coding the function</h5>
<p>So, let&#8217;s start coding the function:</p>
<pre class="brush: php;">
// function which replaces mailto links into specific format
function tep_rewrite_email($content) {

  // function rules will go here

}
</pre>
<h5>Include the email pattern in the right format</h5>
<p>Inside this function we are going to define some variables that will help us do the job. The first one is a regex email pattern, written in a specific format so that we can reuse it later. Take a look at the <a title="You Don’t Know Anything About Regular Expressions: A Complete Guide" href="http://net.tutsplus.com/tutorials/javascript-ajax/you-dont-know-anything-about-regular-expressions/">NetTuts+ Regular Expressions Complete Guide</a> and then move on to the code below:</p>
<pre class="brush: php;">
// regex email address pattern, format (\\1)@(\\2).(\\3)
$email_patt = '([A-Za-z0-9._%-]+)\@([A-Za-z0-9._%-]+)\.([A-Za-z0-9._%-]+)';
</pre>
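<p>To see what the three capture groups hold, the pattern can be tried outside PHP. The snippet below is only a sketch: it ports the same character classes to a JavaScript regular expression (the names <code>emailPattern</code> and <code>splitEmail</code> are made up for this illustration). Because the quantifiers are greedy, it is the last dot of the address that separates the second group from the third.</p>

```javascript
// Same character classes as $email_patt, ported to a JavaScript RegExp.
// Hypothetical helper names; an illustration, not part of the PHP function.
var emailPattern = /([A-Za-z0-9._%-]+)@([A-Za-z0-9._%-]+)\.([A-Za-z0-9._%-]+)/;

// Returns the three captured parts of the first address found, or null.
function splitEmail(text) {
  var m = emailPattern.exec(text);
  return m ? { user: m[1], domain: m[2], tld: m[3] } : null;
}

// Logs an object with user 'lisa', domain 'mail.company' and tld 'com'.
console.log(splitEmail('Write to lisa@mail.company.com today'));
```

<p>So an address with a subdomain still ends up as three parts, which is all the replacement format needs.</p>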
<h5>Add standard mailto link pattern</h5>
<p>Then we add a standard HTML mailto link pattern, which will match all the mailto links. This pattern is flexible, so it will not skip any real link that contains the &#8216;mailto:&#8217; expression:</p>
<pre class="brush: php;">
  // pattern for html links: &lt;a href=&quot;mailto: example@example.com&quot;&gt;Some other text&lt;/a&gt;
  // attributes before and after 'href' do not interfere
  $mailto_pattern = '#\&lt;a[^&gt;]*?href=\&quot;mailto:\s?' . $email_patt . '[^&gt;]*?\&gt;[^&gt;]*?&lt;\/a\&gt;#';
</pre>
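<p>A quick way to check the claim about extra attributes is to run the pattern against a link that has some. The sketch below is a JavaScript port written only for this test (<code>emailPart</code>, <code>mailtoPattern</code> and <code>findMailtoLink</code> are hypothetical names, and the sample link is invented):</p>

```javascript
// JavaScript port of $mailto_pattern, for illustration only.
var emailPart = '([A-Za-z0-9._%-]+)@([A-Za-z0-9._%-]+)\\.([A-Za-z0-9._%-]+)';
var mailtoPattern = new RegExp('<a[^>]*?href="mailto:\\s?' + emailPart + '[^>]*?>[^>]*?</a>');

// Returns the whole link that matched, or null if nothing matched.
function findMailtoLink(html) {
  var m = mailtoPattern.exec(html);
  return m ? m[0] : null;
}

// Invented sample: one attribute before href and one after it.
var sample = 'Contact <a class="btn" href="mailto: sales@company.com" title="Sales">our sales team</a> now';
console.log(findMailtoLink(sample));
```

<p>The whole anchor, from the opening tag to the closing one, is matched, so the link text disappears together with the address. An ordinary link without &#8216;mailto:&#8217; is left alone.</p>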
<h5>Set up the result you want to get</h5>
<p>The last variable to add is the result we finally want to get. It&#8217;s a specific format (example AT example DOT com), but a very useful one for avoiding spiders. Have a look:</p>
<pre class="brush: php;">
  // rewrite result
  $rewrite_result = '&lt;span class=&quot;mailme&quot;&gt;\\1 AT \\2 DOT \\3&lt;/span&gt;';
</pre>
<h5>Find and replace is simple</h5>
<p>Once the variables are in place, we use simple PHP functions to find and replace the necessary content. Remember that emails can be published in two ways &#8211; as a mailto link and as a plain email address. Therefore we check the webpage&#8217;s content twice. Let&#8217;s see the code below:</p>
<pre class="brush: php;">
  // firstly, look for html mailto links and replace them
  $content = preg_replace($mailto_pattern, $rewrite_result, $content);

  // secondly, find standalone emails without links and replace them too
  $content = preg_replace('#' . $email_patt . '#', $rewrite_result, $content);
</pre>
<p>As you can see, the <a title="PHP preg_replace function" href="http://php.net/manual/en/function.preg-replace.php">PHP preg_replace</a> function looks for our mailto link and email patterns and replaces each match with the required result.</p>
<p>To repeat, we need this specific format (<span class="mailme">example AT example DOT com</span>) for two purposes:</p>
<ol>
<li>To avoid spiders that search your page&#8217;s source code for fresh emails to harvest</li>
<li>To convert it back into a clear, valid and visible format for real visitors, not bots</li>
</ol>
<h5>PHP function which is ready to work for you</h5>
<p><strong>Now that every component of the function is ready, let&#8217;s put them all together:</strong></p>
<pre class="brush: php;">
  function tep_rewrite_email($content) {
    $email_patt = '([A-Za-z0-9._%-]+)\@([A-Za-z0-9._%-]+)\.([A-Za-z0-9._%-]+)';
    $mailto_pattern = '#\&lt;a[^&gt;]*?href=\&quot;mailto:\s?' . $email_patt . '[^&gt;]*?\&gt;[^&gt;]*?&lt;\/a\&gt;#';
    $rewrite_result = '&lt;span class=&quot;mailme&quot;&gt;\\1 AT \\2 DOT \\3&lt;/span&gt;';

    $content = preg_replace($mailto_pattern, $rewrite_result, $content);
    $content = preg_replace('#' . $email_patt . '#', $rewrite_result, $content);

    // remember to add return here
    return $content;
  }
</pre>
<p>Finally, we get exactly the result we need in our source code. The image below shows what the code looks like:</p>
<p><img class="aligncenter size-full wp-image-135" title="PHP function which finds email patterns" src="http://www.manakor.org/wp-content/uploads/2010/01/php-fuction-result.jpg" alt="" width="590" height="100" /></p>
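<p>To check the rewritten markup by hand, the two passes can also be reproduced in a few lines of JavaScript and run in Node or a browser console. This is only a sketch of the same logic under a hypothetical name, <code>rewriteEmail</code>, with an invented sample text; on the server the PHP function remains the one doing the work.</p>

```javascript
// Illustrative JavaScript port of tep_rewrite_email(); all names are hypothetical.
var emailPart = '([A-Za-z0-9._%-]+)@([A-Za-z0-9._%-]+)\\.([A-Za-z0-9._%-]+)';
var mailtoPattern = new RegExp('<a[^>]*?href="mailto:\\s?' + emailPart + '[^>]*?>[^>]*?</a>', 'g');
var plainPattern = new RegExp(emailPart, 'g');
var rewriteResult = '<span class="mailme">$1 AT $2 DOT $3</span>';

function rewriteEmail(content) {
  // first pass: whole mailto links, second pass: plain addresses
  return content
    .replace(mailtoPattern, rewriteResult)
    .replace(plainPattern, rewriteResult);
}

// Invented sample with one mailto link and one plain address.
var page = 'Sales: <a href="mailto: sales@company.com" title="Sales">sales@company.com</a>. Lisa: lisa@company.com';
console.log(rewriteEmail(page));
```

<p>The order of the two passes matters: if the plain-address pass ran first, it would rewrite the address inside the <code>href</code> and the link pattern would no longer match.</p>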
<h2>jQuery or MooTools, it&#8217;s your choice</h2>
<p>Once every plain-text or mailto email address has been rewritten to the new format, we turn them back into valid links with jQuery, which runs on the client side and works in all the major browsers. I think the best way here is the <a title="Hide Email with JavaScript / jQuery" href="http://www.html-advisor.com/javascript/hide-email-with-javascript-jquery/">HTML-Advisor method</a>. However, there are other ways to do it, for example <a title="Oskar's jQuery to Mootools conversion" href="http://jsfiddle.net/oskar/MJujB">Oskar&#8217;s MooTools conversion</a>.</p>
<p>Note that JavaScript is a client-side language, so it lets us modify the page directly in the visitor&#8217;s browser. Our aim here is to work out a simple function that finds the prepared email format and turns it back into visible, understandable links for the audience.</p>
<h5>jQuery function that works</h5>
<p>Our complete jQuery code should look like this:</p>
<pre class="brush: jscript;">
if ( $(&quot;span.mailme&quot;).length ) {
  // variables, which will be replaced
  var at = / AT /;
  var dot = / DOT /g;

  // function, which replaces pre-made class
  $('span.mailme').each(function () {
    var addr = $(this).text().replace(at, '@').replace(dot, '.');
    $(this).after('&lt;a href=&quot;mailto:' + addr + '&quot;&gt;' + addr + '&lt;/a&gt;');
    $(this).remove();
  });
}
</pre>
<h5>MooTools conversion</h5>
<pre class="brush: jscript;">
var mailme = $$('.mailme'), at = / AT /, dot = / DOT /g;

mailme.each(function(el){
  var addr = el.get('text').replace(at, '@').replace(dot, '.');

  new Element('a', {
    href: 'mailto:' + addr,
    html: addr
  }).inject(el, 'after');

  el.destroy();
});
</pre>
<p>As you can see, we have made some modifications on the client side. As a result, every website visitor will see published emails as normal links, but email spiders won&#8217;t get their dinner tonight.</p>
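<p>Neither library is strictly required. As a rough sketch, the same replacement can be written with plain DOM calls. The names <code>decodeAddress</code> and <code>revealEmails</code> are placeholders invented for this example, and the code assumes the script sits at the end of the page so that the spans already exist when it runs.</p>

```javascript
// Pure helper: turns "name AT host DOT tld" back into "name@host.tld".
function decodeAddress(text) {
  return text.replace(/ AT /, '@').replace(/ DOT /g, '.');
}

// Library-free variant (hypothetical name): swaps every span.mailme for a mailto link.
function revealEmails(doc) {
  var spans = doc.querySelectorAll('span.mailme');
  Array.prototype.forEach.call(spans, function (span) {
    var addr = decodeAddress(span.textContent);
    var link = doc.createElement('a');
    link.href = 'mailto:' + addr;
    link.textContent = addr; // set as text, so the address is never parsed as HTML
    span.parentNode.replaceChild(link, span);
  });
}

// Only runs in a browser; assumes the script is placed at the end of the page.
if (typeof document !== 'undefined') {
  revealEmails(document);
}

console.log(decodeAddress('lisa AT mail.company DOT com')); // lisa@mail.company.com
```

<p>Keeping the decoding in a separate function makes it easy to test without a browser. Only the first &#8216; AT &#8217; is replaced, while every &#8216; DOT &#8217; is, which mirrors the regular expressions used in the jQuery and MooTools versions.</p>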
<h2>Modifications to the output</h2>
<p>The last thing we need to do is modify our webpage&#8217;s output. That means finding the place where our CMS outputs each page&#8217;s body text (which may contain email addresses) and passing that text through our ready-made PHP function:</p>
<pre class="brush: php;">

  // find the string that holds each page's body text from your database
  $string = 'Our company is based in London and we bring strong metallic structures to the world. Our experience are wide and stable. To get more information about products we offer, contact sales department by email: &lt;a href=&quot;mailto: sales@company.com&quot; title=&quot;Sales&quot;&gt;sales@company.com&lt;/a&gt;. And to offer sponsorship email directly to Lisa: lisa@company.com';
  $string = tep_rewrite_email($string);
</pre>
<h2>What we get is what we have made</h2>
<p>All in all, we get quite a simple result, which email spiders looking for ordinary addresses will not catch, while our website’s real audience sees all the published email addresses as normal. By applying this technique to each new page you build, you can be confident that email addresses are hidden and spam is kept at bay.</p>
<p>I find this technique very useful for any website that is updated by people who don’t know how to keep an email address safe. It works well with different CMSs and is simple to implement.</p>
<blockquote><p>The &#8220;free&#8221; distribution of unwelcome or misleading messages to thousands of people is an annoying and sometimes destructive use of the Internet&#8217;s unprecedented efficiency.<br />
<span style="color: #333333;">Bill Gates, New York Times, 1998</span></p></blockquote>
]]></content:encoded>
			<wfw:commentRss>http://www.manakor.org/the-best-way-to-hide-email-address-from-spyders/feed</wfw:commentRss>
		<slash:comments>7</slash:comments>
		</item>
	</channel>
</rss>
