Google Sitemaps

by Tony Lawrence
Saturday, 24th September 2005

Google now lets web sites submit an XML file that lists URLs along with some information about how often each page changes and how important it is relative to the site's other pages. Basically, it gets you to do part of the work for them, which we would hope helps everyone.
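To make that concrete, here is a minimal Perl sketch of the information carried for a single page. The element names are the ones the script further down writes out; the helper name sitemap_entry and the example.com URL are hypothetical, chosen only for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: format one sitemap entry.
# $loc        is the page URL,
# $lastmod    a timestamp such as 2005-09-24T12:00:00+00:00,
# $changefreq a hint such as "daily" or "monthly",
# $priority   a value from 0.0 to 1.0, relative to your own other pages.
sub sitemap_entry {
    my ($loc, $lastmod, $changefreq, $priority) = @_;
    return join("\n",
        "<url>",
        "<loc>$loc</loc>",
        "<lastmod>$lastmod</lastmod>",
        "<changefreq>$changefreq</changefreq>",
        "<priority>$priority</priority>",
        "</url>") . "\n";
}

# Example call with made-up values.
print sitemap_entry("http://www.example.com/index.html",
                    "2005-09-24T12:00:00+00:00", "daily", "1.0");
```

The frequency and priority are hints that you supply about your own pages; the sketch does no validation of either value.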

I do wish Google would extend this to include at least a "not about" property. I realize that Google isn't going to let anyone tell them what a page IS about, but a "not about" property can't be abused as easily and could help the accuracy of their search results.

Google provides a Python script that can produce the file for your site; I wrote a Perl script that does the same:

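#!/usr/bin/perl
# Build a gzipped sitemap from every .html file under the document root.
use strict;
use warnings;

my $docroot = "/yourhtdocs";       # change to your document root
my $site    = "http://yoursite";   # change to your site's base URL

chdir($docroot) or die "Cannot chdir to $docroot: $!";
my @stuff = `find . -type f -name "*.html"`;
open(my $out, ">", "sitemap") or die "Cannot write sitemap: $!";
print $out <<EOF;
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.google.com/schemas/sitemap/0.84">
EOF
foreach (@stuff) {
    chomp;
    s/^..//;   # strip the leading "./" that find adds
    my $mtime = (stat "$docroot/$_")[9];
    next unless defined $mtime;   # file vanished since find ran
    # gmtime rather than localtime: the timestamp is labeled +00:00 (UTC)
    my ($sec,$min,$hour,$mday,$mon,$year) = gmtime($mtime);
    $year += 1900;
    $mon++;
    my $mod = sprintf("%04d-%02d-%02dT%02d:%02d:%02d+00:00",$year,$mon,$mday,$hour,$min,$sec);
    # index pages change more often and matter more than the rest
    my $is_index = /index\.html$/;
    my $freq     = $is_index ? "daily" : "monthly";
    my $priority = $is_index ? "1.0" : "0.5";
    # escape the characters that are special in XML
    (my $path = $_) =~ s/&/&amp;/g;
    $path =~ s/</&lt;/g;
    $path =~ s/>/&gt;/g;

print $out <<EOF;
<url>
<loc>$site/$path</loc>
<lastmod>$mod</lastmod>
<changefreq>$freq</changefreq>
<priority>$priority</priority>
</url>
EOF
}
print $out <<EOF;
</urlset>
EOF
close $out or die "Cannot close sitemap: $!";
unlink("sitemap.gz");   # gzip will not overwrite an existing file
system("gzip", "sitemap") == 0 or die "gzip failed: $?";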

Season to taste; see https://www.google.com/webmasters/sitemaps/
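One way to do the seasoning: rather than hard-coding the index.html test, keep the frequency and priority choices in an ordered list of patterns. This is only a sketch. The @rules table, the hints_for helper, and the example paths are hypothetical, and the values are placeholders rather than recommendations.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical rule table: the first matching pattern wins.
# Each rule is [ pattern, changefreq, priority ].
my @rules = (
    [ qr/index\.html$/, "daily",   "1.0" ],
    [ qr{^archive/},    "yearly",  "0.3" ],
    [ qr/./,            "monthly", "0.5" ],   # catch-all default
);

# Return (changefreq, priority) for a path relative to the document root.
sub hints_for {
    my ($path) = @_;
    foreach my $rule (@rules) {
        my ($pattern, $freq, $priority) = @$rule;
        return ($freq, $priority) if $path =~ $pattern;
    }
    return ("monthly", "0.5");   # only reached for an empty path
}

# Try the table against a few made-up paths.
foreach my $path ("index.html", "archive/2004/old.html", "Unix/sitemap.html") {
    my ($freq, $priority) = hints_for($path);
    print "$path: $freq $priority\n";
}
```

In the script above, the two values returned by hints_for($_) would take the place of the $freq and $priority assignments inside the loop.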


© Tony Lawrence