Comments on: Software Needed: Enhanced web search

By: DrMcCoy

DrMcCoy — Tue, 03 Nov 2009 07:22:34 +0000

Hmm, or alternatively, I bet a Greasemonkey script doing that could be hacked together…

By: Googleverse

Googleverse — Tue, 03 Nov 2009 07:16:16 +0000

Interesting idea.

By: Gray Gaffer

Gray Gaffer — Mon, 02 Nov 2009 22:12:42 +0000

The Google API can return the results formatted as XML, but their terms mean that you have to use their API, get a special API key, and have the web site using it be publicly accessible without restrictions. Without their library modules and a valid key, all you get back is encrypted binary.

So back to page scraping.

By: Dan J

Dan J — Mon, 02 Nov 2009 17:40:57 +0000

I've done some Perl code that performs web page text analysis for SEO, and some of that seems connected to some of what you're interested in doing. Stripping out all but the actual content of the pages is essential. I can see how googlereg would be helpful for some genealogical searching that I do once in a while.
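For anyone curious what "stripping out all but the actual content" can look like, here is a minimal sketch in Python rather than Perl. It is not the code described above, just an illustration using only the standard library; the names `TextExtractor` and `visible_text` are made up for this example, and real pages would likely also need navigation and boilerplate removal.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect visible text, skipping script and style content.

    Hypothetical helper for illustration; not from the original comment.
    """

    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__()
        self.parts = []
        self.skip_depth = 0  # > 0 while inside a <script> or <style> element

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self.skip_depth > 0:
            self.skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is outside skipped elements.
        if self.skip_depth == 0 and data.strip():
            self.parts.append(data.strip())


def visible_text(html):
    """Return the page's visible text as one space-joined string."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.parts)
```

Calling `visible_text(page_source)` on a fetched page gives plain text that a regex can then be run against.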

By: Greg Laden

Greg Laden — Mon, 02 Nov 2009 14:51:47 +0000

The pipes look interesting. I’ve seen them before but forgot about them.

As far as dealing with HTML, that’s fairly easy with the proper text-based web readers and sed, but there should be something in the Google API that will work.

The problem with the Google API might be that they change it now and then.

By: Gray Gaffer

Gray Gaffer — Mon, 02 Nov 2009 14:39:46 +0000

Yes, I know it is not a command line tool. That would take some more research, such as: are Google search results available in a pure XML feed format instead of wrapped in HTML that is only meant for human eyes? If that is true, then there are CPAN XML modules that can be used along with the regex post-results filter. But if your desired end result is a web page then Yahoo Pipes may do the trick for you.
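To make the "XML feed plus regex post-results filter" idea concrete, here is a rough sketch in Python (the same thing could be done with the CPAN modules). The feed layout is purely an assumption: an RSS-like document with `item`, `title`, `link`, and `description` elements. `SAMPLE_FEED` and `filter_results` are invented for illustration and do not reflect any real Google output.

```python
import re
import xml.etree.ElementTree as ET

# Made-up sample in an assumed RSS-like shape; not real search output.
SAMPLE_FEED = """<rss><channel>
  <item><title>Smith family history 1850</title><link>http://example.com/a</link><description>Records of John Smith, born 1850.</description></item>
  <item><title>Blacksmith tools</title><link>http://example.com/b</link><description>Hammers and anvils for sale.</description></item>
</channel></rss>"""


def filter_results(feed_xml, pattern):
    """Return (title, link) pairs whose title or description matches pattern."""
    regex = re.compile(pattern, re.IGNORECASE)
    root = ET.fromstring(feed_xml)
    hits = []
    for item in root.iter("item"):
        title = item.findtext("title", default="")
        link = item.findtext("link", default="")
        summary = item.findtext("description", default="")
        # Apply the regex after the search engine has done its own matching.
        if regex.search(title + " " + summary):
            hits.append((title, link))
    return hits
```

With the sample above, a pattern like `r"\bsmith\b.*\b18\d\d\b"` keeps the genealogy result and drops the "Blacksmith" one, which is the kind of narrowing a plain keyword search cannot do.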

By: Gray Gaffer

Gray Gaffer — Mon, 02 Nov 2009 14:36:19 +0000

Interesting. But not a Perl one-liner because the results are not quite simple enough.

HOWEVER

I found this interesting tool you might want to check out:

Yahoo Pipes

http://pipes.yahoo.com/pipes/

It includes, amongst many other results filters, a regex tool, with examples that filter Google results.