Simplest PHP Site Search-engine Using Unix Grep
by Chief Programmabilities, Programmabilities.com
Tuesday, 18th October 2005
Grep is a common Unix command used for searching. It searches one or more input files for lines that match a specified pattern and, by default, prints the matching lines.
PHP can call external programs, including the Unix commands installed on your Linux server. With grep, we can easily build a simple search-engine. We will add a little complexity by putting the form that accepts the search string and the code that displays the results in the same file.
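As a minimal sketch of that idea, the snippet below runs grep from PHP through popen() and collects the lines it prints. The function name grep_lines() and the sample file are hypothetical, and the sketch assumes a Unix-like server where grep is on the PATH and PHP is allowed to start processes. escapeshellarg() quotes the arguments so that user input cannot inject extra shell commands.

```php
<?php
// Hypothetical helper: run grep case-insensitively over one file and
// return the matching lines. Assumes grep is installed and on the PATH.
function grep_lines(string $pattern, string $file): array
{
    // Quote both arguments for the shell; "--" ends grep's option list.
    $cmd = 'grep -i -- ' . escapeshellarg($pattern) . ' ' . escapeshellarg($file);
    $fp = popen($cmd, 'r'); // read grep's standard output
    if ($fp === false) {
        return [];
    }
    $lines = [];
    while (($line = fgets($fp)) !== false) {
        $lines[] = rtrim($line, "\n");
    }
    pclose($fp);
    return $lines;
}

// Usage with a throwaway sample file.
$sample = tempnam(sys_get_temp_dir(), 'grep');
file_put_contents($sample, "Hello World\nnothing here\nhello again\n");
$hits = grep_lines('hello', $sample);
unlink($sample);
print_r($hits);
```

When grep is given a single file, it prints only the matching lines, without the filename prefix it adds when searching several files.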
See the demo at http://programmabilities.com/php/grep.php.
Here is the PHP script using grep that includes the PHP code and the HTML search-engine form all in one page (save it in a file with a .php extension):
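<html>
<body>
<h1>Grep Search-engine with PHP</h1>
<?php
// register_globals no longer exists, so read the search string from $_POST.
$searchstr = isset($_POST['searchstr']) ? trim($_POST['searchstr']) : '';
// Escape everything that is echoed back into the page.
$safestr = htmlspecialchars($searchstr, ENT_QUOTES);
$self = htmlspecialchars($_SERVER['PHP_SELF'], ENT_QUOTES);
?>
<p>
Search <a href="http://programmabilities.com/"
title="programmabilities.com">Programmabilities.com</a>:
</p>
<form action="<?php echo $self; ?>" method="post">
<p>
<input type="text" name="searchstr"
value="<?php echo $safestr; ?>" size="20"
maxlength="30"/>
<input type="submit" value="Search!"/>
</p>
</form>
<?php
if ($searchstr !== '') {
    // We have a search string, so call grep and display the results.
    echo '<hr/><br/>';
    // Call grep with case-insensitive search mode on all files.
    // escapeshellarg() quotes the user input so it cannot inject shell
    // commands; "--" stops grep from reading the input as an option.
    $cmdstr = 'grep -i -- ' . escapeshellarg($searchstr) . ' *';
    $fp = popen($cmdstr, 'r'); // open the output of the command as a pipe
    $myresult = array(); // to hold the search results
    while ($fp && ($buffer = fgets($fp, 4096)) !== false) {
        // grep returns lines in the format
        // filename:line
        // so we split on the first ':' only (explode() replaces the
        // removed split()) and skip lines that do not have that form.
        $parts = explode(':', $buffer, 2);
        if (count($parts) < 2) {
            continue;
        }
        list($fname, $fline) = $parts;
        // we take only the first hit per file
        if (! isset($myresult[$fname])) {
            // fgetss() was removed in PHP 8, so strip tags and escape here.
            $myresult[$fname] = htmlspecialchars(strip_tags($fline), ENT_QUOTES);
        }
    }
    if ($fp) {
        pclose($fp);
    }
    // the results are in an associative array; walk through it and print it
    if (count($myresult)) {
        echo '<ol>';
        foreach ($myresult as $fname => $fline) {
            $fname = htmlspecialchars($fname, ENT_QUOTES);
            echo "<li><a href=\"$fname\">$fname</a> : $fline </li>\n";
        }
        echo '</ol><br/>';
    } else {
        // no hits
        echo "Sorry. Search on <strong>$safestr</strong> returned no results.<br/>\n";
    }
}
?>
</body>
</html>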
...And that's it! By using Unix's built-in grep command on your Linux server, you don't have to write reams of PHP code from scratch to handle the search part of your PHP search-engine program.
Please note that this is not an optimal way to implement a search-engine, although it is a good way to learn about PHP. The problem is the overhead and server load generated by grepping every document each time a user initiates a search. Ideally, one should build a database of keywords and search against that. That is why more sophisticated search-engines with a flat structure index all pages once and then search a single file generated from them. Admittedly, this means you have to update that file every time the site is updated, but in the long run it puts far less strain on the server.
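The flat-index idea can be sketched roughly as follows. The names build_index() and search_index() are hypothetical, and the one-line-per-page, tab-separated file format is an assumption made for illustration rather than an established convention. The sketch also assumes a non-empty search string and filenames that contain no tab characters.

```php
<?php
// Hypothetical flat index: one line per page, "filename<TAB>plain text".
function build_index(array $files, string $indexFile): void
{
    $out = '';
    foreach ($files as $file) {
        // Strip markup and collapse whitespace so each page fits on one line.
        $text = strip_tags(file_get_contents($file));
        $text = trim(preg_replace('/\s+/', ' ', $text));
        $out .= $file . "\t" . $text . "\n";
    }
    file_put_contents($indexFile, $out);
}

// Search the single index file instead of grepping every page.
function search_index(string $needle, string $indexFile): array
{
    $hits = [];
    foreach (file($indexFile, FILE_IGNORE_NEW_LINES) as $row) {
        list($file, $text) = explode("\t", $row, 2);
        if (stripos($text, $needle) !== false) { // case-insensitive match
            $hits[] = $file;
        }
    }
    return $hits;
}

// Usage with two throwaway pages in a temporary directory.
$dir = sys_get_temp_dir() . '/idx_' . uniqid();
mkdir($dir);
file_put_contents("$dir/a.html", "<h1>Grep</h1><p>Search with grep</p>");
file_put_contents("$dir/b.html", "<p>Nothing relevant</p>");
build_index(["$dir/a.html", "$dir/b.html"], "$dir/index.txt");
$found = search_index('GREP', "$dir/index.txt");
print_r($found);
```

In this sketch build_index() would be rerun whenever the site changes, for example from a scheduled job, while each search reads only the one index file.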
Notes:
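- PHP_SELF is a variable maintained by PHP. It contains the path of the currently executing script, relative to the document root. In current PHP versions it is read as $_SERVER['PHP_SELF'].
- fgets() reads one line, bounded by the specified length (4096 here).
- fgetss() is just like fgets(), but it strips HTML and PHP tags from the line it reads. It was deprecated in PHP 7.3 and removed in PHP 8.0; fgets() followed by strip_tags() does the same job.
- split() is called with a limit of 2 because we need only two parts: the filename and the matching line. Any further ':' stays in the second part. split() was removed in PHP 7.0; explode() takes the same arguments for a plain-string delimiter.
- each() is an array function that returns the current key and value and advances the array pointer, which makes it easy to walk through an array. It was deprecated in PHP 7.2 and removed in PHP 8.0; foreach is the usual replacement.
- popen() / pclose() work like fopen() / fclose(), but operate on a pipe to a process instead of a file.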
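To make the limit of 2 and the first-hit-per-file rule concrete, here is a small sketch that parses grep-style output held in a string. first_hits() is a hypothetical helper, the sample output is made up for illustration, and the sketch uses explode() and isset() on the assumption of a current PHP version.

```php
<?php
// Hypothetical helper: keep only the first matching line for each file
// from grep output of the form "filename:line".
function first_hits(string $grepOutput): array
{
    $result = [];
    foreach (explode("\n", trim($grepOutput)) as $row) {
        // Limit 2: split at the first ':' only, so later colons stay in the line.
        $parts = explode(':', $row, 2);
        if (count($parts) < 2) {
            continue; // not in filename:line form
        }
        list($fname, $fline) = $parts;
        if (!isset($result[$fname])) {
            $result[$fname] = $fline; // first hit for this file wins
        }
    }
    return $result;
}

// Made-up sample output: two hits in a.php, one in b.html.
$output = "a.php:echo 'x: y';\na.php:second hit\nb.html:<p>grep</p>\n";
$hits = first_hits($output);
print_r($hits);
```

The first line for a.php contains a second colon, which stays inside the matched text because the split stops after the first one.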
Chief Programmabilities founded the resource site Programmabilities.com, which picks out the best scripts to build your site with.
