1) David Hawking, Web Search Engines: Part 1 and Part 2, IEEE Computer, June 2006.
http://www.computing.dcu.ie/~gjones/Teaching/CA437/01642621.pdf
http://david-hawking.net/pubs/hawking_howthingswork2.pdf
2) Shreeves, S. L., Habing, T. O., Hagedorn, K., & Young, J. A. (2005). Current developments and future trends for the OAI protocol for metadata harvesting. Library Trends, 53(4), 576-589.
http://quod.lib.umich.edu/cgi/t/text/text-idx?c=jep;view=text;rgn=main;idno=3336451.0007.104
3) Bergman, M. K. “The Deep Web: Surfacing Hidden Value” http://www.press.umich.edu/jep/07-01/bergman.html
1, Web Search Engines: Part 1
This essay discusses several aspects of web search engines and how they have changed, including the crawling problem and the spam problem. I found it really interesting. Search engines have to deal with these problems well to meet patrons' needs.
2, Web Search Engines: Part 2
Part 2 reviews the algorithms and data structures required to index 400 terabytes of web pages and deliver high-quality results in response to hundreds of millions of queries each day.
This article is a little harder for me, as I am not familiar with the terminology of information retrieval.
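One data structure commonly used for this kind of indexing is the inverted index, which maps each term to the documents that contain it. The following is only a minimal sketch to make the terminology concrete, not the design of any real search engine: the three sample documents and the function names `build_inverted_index` and `search` are made up for illustration, and a production index would add compression, ranking, and distribution across many machines.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each term to the sorted list of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        # Naive tokenization: lowercase and split on whitespace.
        for term in text.lower().split():
            index[term].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}


def search(index, query):
    """Return ids of documents that contain every query term (AND semantics)."""
    terms = query.lower().split()
    if not terms:
        return []
    result = set(index.get(terms[0], []))
    for term in terms[1:]:
        result &= set(index.get(term, []))
    return sorted(result)


# Invented sample documents, for illustration only.
docs = {
    1: "web search engines crawl the web",
    2: "search engines index web pages",
    3: "spam pages try to fool search engines",
}
index = build_inverted_index(docs)
print(search(index, "web search"))
```

Answering a query then only requires looking up each query term and intersecting the resulting lists, rather than scanning every page.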
3, Current developments and future trends for the OAI protocol for metadata harvesting
The mission of the Open Archives Initiative, the entity responsible for the protocol, is to "develop and promote interoperability standards that aim to facilitate the efficient dissemination of the content". The protocol is based on common standards such as XML, HTTP, and Dublin Core.
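As a rough illustration of how these three standards fit together, the sketch below builds an OAI-PMH `ListRecords` request as an HTTP query string and pulls Dublin Core titles out of an XML response. The base URL `https://example.org/oai`, the sample record, and the helper names `list_records_url` and `extract_titles` are assumptions made up for this example. No network request is made, and a real harvester would also need to handle resumption tokens and error responses.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# Namespace of the Dublin Core element set used inside oai_dc records.
DC_NS = "http://purl.org/dc/elements/1.1/"


def list_records_url(base_url, metadata_prefix="oai_dc"):
    """Build the HTTP GET URL for an OAI-PMH ListRecords request."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    return base_url + "?" + urlencode(params)


def extract_titles(xml_text):
    """Return the text of every dc:title element in an OAI-PMH response."""
    root = ET.fromstring(xml_text)
    return [el.text for el in root.iter("{%s}title" % DC_NS)]


# Hand-written sample response with one invented record, for illustration only.
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>An invented example record</dc:title>
          <dc:creator>Example, Author</dc:creator>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""

print(list_records_url("https://example.org/oai"))
print(extract_titles(SAMPLE_RESPONSE))
```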
4, The Deep Web: Surfacing Hidden Value
This article describes several characteristics of the deep web and compares the surface web with the deep web.
Directed query technology is the only means of integrating deep and surface Web information. I am interested in exploring the content of deep web pages: what has been achieved so far, and how can this goal be realized? This is still an open problem.
