
Creating Google Directory in your webpage


The Google Directory (http://directory.google.com) overlays the Open Directory Project (ODP or DMOZ, http://www.dmoz.org) ontology onto the Google core index. The result is a Yahoo!-like directory hierarchy of search results and their associated categories with the added magic of Google’s popularity algorithms.

The ODP opens its entire database of listings to anybody—provided you’re willing to download a 283 MB file (and that’s compressed!). While you’re probably not interested in all the individual listings, you might want particular ODP categories, or you may be interested in watching new listings flowing into certain categories.

Unfortunately, the ODP does not offer a way to search by keyword for sites added within a recent time period. So instead of searching for recently added sites, the best way to get new site information from the ODP is to monitor categories.
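Monitoring a category amounts to keeping a snapshot of what it contains and comparing it with a later one. Here is a minimal sketch of that comparison, assuming each snapshot is a plain text file with one entry (a URL or category path) per line. The script name new_since.pl and the snapshot format are hypothetical; they are not part of the ODP or of this hack.

```perl
#!/usr/bin/perl
# new_since.pl: hypothetical helper (not part of the original hack) that
# prints the lines present in a new snapshot but absent from an old one.
use strict;
use warnings;

# Return the entries of @$new_ref that do not appear in @$old_ref,
# keeping the order of the new snapshot.
sub new_since {
    my ($old_ref, $new_ref) = @_;
    my %seen;
    @seen{@$old_ref} = ();                        # index the old snapshot
    return grep { !exists $seen{$_} } @$new_ref;  # keep only unseen entries
}

# Read a snapshot file into a list of lines with newlines removed.
sub read_lines {
    my ($path) = @_;
    open my $fh, '<', $path or die "Can't open $path: $!\n";
    chomp(my @lines = <$fh>);
    close $fh;
    return @lines;
}

# Usage: perl new_since.pl old_snapshot.txt new_snapshot.txt
if (!caller() && @ARGV == 2) {
    my @old = read_lines($ARGV[0]);
    my @new = read_lines($ARGV[1]);
    print "$_\n" for new_since(\@old, \@new);
}
```

For example, `perl new_since.pl mosaic-old.txt mosaic-new.txt` (hypothetical file names) would print only the entries that appear in the newer snapshot.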

Because the Google Directory builds its directory based on the ODP information, you can use the ODP category hierarchy information to generate Google Directory URLs. This hack searches the ODP category hierarchy information for keywords that you specify, and then builds Google Directory URLs and checks to make sure that they’re active.

You’ll need to download the category hierarchy information from the ODP to get this hack to work. The compressed file containing this information is available from http://dmoz.org/rdf.html, and the specific file is here: http://dmoz.org/rdf/structure.rdf.u8.gz. Before using it, you must uncompress it with a decompression application specific to your operating system. In the Unix environment, the command looks something like this:

% gunzip structure.rdf.u8.gz
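If you don't have a gunzip utility handy, Perl can do the decompression itself. This is a sketch assuming the IO::Uncompress::Gunzip module, which is bundled with recent Perl releases; the script name unpack_rdf.pl is hypothetical.

```perl
#!/usr/bin/perl
# unpack_rdf.pl: hypothetical helper that decompresses a .gz file with
# IO::Uncompress::Gunzip instead of an external gunzip command.
use strict;
use warnings;
use IO::Uncompress::Gunzip qw(gunzip $GunzipError);

# Decompress the file named $in into the file named $out; dies on failure.
sub gunzip_file {
    my ($in, $out) = @_;
    gunzip $in => $out
        or die "gunzip failed: $GunzipError\n";
    return $out;
}

# Usage: perl unpack_rdf.pl structure.rdf.u8.gz  (writes structure.rdf.u8)
if (!caller() && @ARGV) {
    my $in = shift @ARGV;
    die "Expected a .gz file, got $in\n" unless $in =~ /\.gz$/;
    (my $out = $in) =~ s/\.gz$//;
    gunzip_file($in, $out);
    print "Wrote $out\n";
}
```

Running `perl unpack_rdf.pl structure.rdf.u8.gz` should leave structure.rdf.u8 next to the compressed file.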

2.5.1. The Code

Save the following code to a text file called google_dir.pl:

#!/usr/bin/perl
# google_dir.pl
# Uses ODP category information to build URLs into the Google Directory.
# Usage: perl google_dir.pl "keywords" < structure.rdf.u8

use strict;
use warnings;
use LWP::Simple;

# Turn off output buffering.
$|++;

my $directory_url = "http://directory.google.com";

@ARGV == 1
  or die qq{usage: perl google_dir.pl "{query}" < structure.rdf.u8\n};

# Grab those command-line specified keywords and build a regular expression
# that matches any one of them ("glass mosaic" becomes "glass|mosaic").
my @words    = split ' ', shift @ARGV;
my $keywords = join '|', map { quotemeta($_) } @words;

# A place to store topics.
my %topics;

# Loop through the DMOZ category file. (?:...) keeps the alternation inside
# the category path, and [^"]* stops the match at the closing quote.
while (<>) {
  next unless /"(Top\/[^"]*(?:$keywords)[^"]*)"/i;
  my $topic = $1;
  next if $topics{$topic}++;       # skip categories already seen
  my $url = "$directory_url/$topic";
  print "$url\n" if head($url);    # print only URLs that answer a HEAD request
}
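To try the matching step without the full ODP dump or any network access, the sketch below isolates the regular expression from google_dir.pl and runs it over a few sample lines. The sample lines are assumptions shaped like the Topic and narrow elements of structure.rdf.u8, the helper names keyword_pattern and directory_urls are hypothetical, and the HEAD check is left out.

```perl
#!/usr/bin/perl
# Hypothetical sketch: the matching step of google_dir.pl, isolated so it
# can be tried on sample lines instead of the full structure.rdf.u8.
use strict;
use warnings;

my $directory_url = "http://directory.google.com";

# Turn "glass mosaic" into "glass|mosaic", escaping regex metacharacters.
sub keyword_pattern {
    my ($keywords) = @_;
    return join '|', map { quotemeta($_) } split ' ', $keywords;
}

# Return one Google Directory URL per distinct matching category path.
sub directory_urls {
    my ($keywords, @lines) = @_;
    my $pattern = keyword_pattern($keywords);
    my (%topics, @urls);
    for (@lines) {
        if (/"(Top\/[^"]*(?:$pattern)[^"]*)"/i and !$topics{$1}++) {
            push @urls, "$directory_url/$1";
        }
    }
    return @urls;
}

# Sample lines in the assumed style of structure.rdf.u8 (not copied from it).
my @sample = (
    qq{<Topic r:id="Top/Arts/Crafts/Mosaics">},
    qq{  <narrow r:resource="Top/Arts/Crafts/Mosaics/Glass"/>},
    qq{<Topic r:id="Top/Arts/Crafts/Mosaics/Glass">},
    qq{<Topic r:id="Top/Arts/Crafts/Quilting">},
);
print "$_\n" for directory_urls("mosaic", @sample);
```

Running it prints the Mosaics and Mosaics/Glass URLs once each: the repeated Mosaics/Glass topic is skipped by the %topics check, and Quilting doesn't match.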

2.5.2. Running the Hack

Run the script from the command line ["How to Run the Hacks" in the Preface], passing a query as its argument and redirecting the DMOZ category file to its standard input:

% perl google_dir.pl "keywords" < structure.rdf.u8

Replace keywords with the particular keywords that you're after. Separate multiple keywords with spaces; a category matches if its path contains any one of them.

If you’re using the shorter category excerpt structure.example.txt, use this:

% perl google_dir.pl "keywords" < structure.example.txt

2.5.3. The Results

Feeding the keyword mosaic into this hack would look something like this:

% perl google_dir.pl "mosaic" < structure.rdf.u8

http://directory.google.com/Top/Arts/Crafts/Mosaics
http://directory.google.com/Top/Arts/Crafts/Mosaics/Glass
http://directory.google.com/Top/Arts/Crafts/Mosaics/Ceramic_and_Broken_China
http://directory.google.com/Top/Arts/Crafts/Mosaics/Associations_and_Directories
http://directory.google.com/Top/Arts/Crafts/Mosaics/Stone
http://directory.google.com/Top/Shopping/Crafts/Mosaics
http://directory.google.com/Top/Shopping/Crafts/Supplies/Mosaics
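To put these links on your own web page, you may want them as HTML rather than bare URLs. This is a minimal sketch, assuming you've saved google_dir.pl's output to a file with one URL per line; the script name links_to_html.pl and the label format (the category path after Top/, with " > " separators and underscores shown as spaces) are hypothetical choices.

```perl
#!/usr/bin/perl
# links_to_html.pl: hypothetical helper that turns google_dir.pl output
# (one URL per line) into an HTML list for a web page.
use strict;
use warnings;

# Escape the characters that matter inside HTML text and attribute values.
sub html_escape {
    my ($s) = @_;
    $s =~ s/&/&amp;/g;
    $s =~ s/</&lt;/g;
    $s =~ s/>/&gt;/g;
    $s =~ s/"/&quot;/g;
    return $s;
}

# Build a <ul> whose link text is the category path after "Top/".
sub urls_to_html {
    my @items;
    for my $url (@_) {
        my ($path) = $url =~ m{/Top/(.+)$} or next;  # skip lines without a category path
        (my $label = $path) =~ s{_}{ }g;
        $label = join ' > ', split m{/}, $label;
        push @items, sprintf(qq{  <li><a href="%s">%s</a></li>},
                             html_escape($url), html_escape($label));
    }
    return join "\n", "<ul>", @items, "</ul>";
}

# Usage: perl links_to_html.pl urls.txt > links.html
if (!caller() && @ARGV) {
    chomp(my @urls = <>);
    print urls_to_html(@urls), "\n";
}
```

For example, `perl google_dir.pl "mosaic" < structure.rdf.u8 > mosaic.txt` followed by `perl links_to_html.pl mosaic.txt > mosaic.html` would produce a list you could paste into a page.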
