Information retrieval and storing for the contents of scientific journals

Assoc. Prof. Leon Abdillah

Introduction

Classification of scientific documents is becoming increasingly important to research communities [1], owing to the growing volume of scientific literature published both in print and electronically. The electronic representation of scientific documents may include journals, technical reports, program documentation, laboratory notebooks, etc. [2]. Figure 2 shows an example of a journal article and the typical components included within it.

Information Retrieval

Information retrieval is the computerized process of producing a list of documents that are relevant to an inquirer’s request by comparing the user’s request to an automatically produced index of the textual content of documents in the system [4]. A user’s request typically consists of one or more words that serve as search keys in a search engine, Google being the most popular one.
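The definition above can be illustrated with a minimal sketch, assuming a toy in-memory collection. Here the "automatically produced index" is modelled as an inverted index that maps each term to the documents containing it, and a request is answered by returning the documents that contain every keyword (a Boolean AND query). The function names `tokenize`, `build_index` and `search` are hypothetical, and the sample texts are simply titles taken from the reference list; real search engines add stemming, stop-word removal and ranking on top of this.

```python
import re
from collections import defaultdict

def tokenize(text):
    """Lower-case the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_index(docs):
    """Inverted index: map each term to the set of document IDs containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index

def search(index, query):
    """Return IDs of documents containing every query term (Boolean AND), sorted."""
    terms = tokenize(query)
    if not terms:
        return []
    # Start from the postings of the first term, then intersect with the rest.
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return sorted(result)

# Illustrative collection: titles borrowed from the reference list.
docs = {
    "d1": "Combining contents and citations for scientific document classification",
    "d2": "Information retrieval on the web",
    "d3": "The impact of metadata in web resources discovering",
}
index = build_index(docs)
print(search(index, "web metadata"))  # ['d3']
```

Both d2 and d3 contain "web", but only d3 also contains "metadata", so the AND query narrows the result to a single document.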

The field of information retrieval has evolved to provide principled approaches to searching various forms of content, including content on the Internet accessed through search engines.
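One classic example of such a principled approach is TF-IDF weighting, which ranks matching documents instead of merely listing them. The sketch below is a minimal illustration under stated assumptions, not a method taken from this proposal: term frequency (tf) is the share of a document's words that equal the query term, inverse document frequency is idf = log(N / df), and a document's score is the sum of tf × idf over the query terms it contains. The function `rank` and the sample titles (from the reference list) are hypothetical.

```python
import math
import re
from collections import Counter

def tokenize(text):
    """Lower-case the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())

def rank(docs, query):
    """Return IDs of documents matching any query term, ordered by TF-IDF score."""
    n = len(docs)
    term_counts = {doc_id: Counter(tokenize(text)) for doc_id, text in docs.items()}
    # Document frequency: the number of documents each term occurs in.
    df = Counter()
    for counts in term_counts.values():
        df.update(counts.keys())
    query_terms = set(tokenize(query))
    scores = {}
    for doc_id, counts in term_counts.items():
        matched = [t for t in query_terms if counts[t] > 0]
        if not matched:
            continue
        length = sum(counts.values())
        # tf = relative frequency in this document; idf = log(N / df).
        scores[doc_id] = sum((counts[t] / length) * math.log(n / df[t]) for t in matched)
    # Highest score first; ties are broken by document ID for a stable order.
    return sorted(scores, key=lambda d: (-scores[d], d))

docs = {
    "d1": "Metadata mega mess in Google Scholar",
    "d2": "An exploratory study of Google Scholar",
    "d3": "Google Books: the metadata mess",
}
print(rank(docs, "google scholar metadata"))  # ['d1', 'd3', 'd2']
```

Because "google" occurs in every sample title, its idf is log(3/3) = 0 and it adds nothing to any score. d3 outranks d2 because its single matching term makes up a larger share of its shorter title.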

Search Engine

Searching for documents on the Internet has become one of the most common activities of users, for purposes ranging from tourist information and social activities to the review of scientific documents. My proposed research focuses on scholarly, or scientific, documents. In a search, the user enters one or more keywords as a query into the interface, and the system searches for matching documents on his or her behalf.

Such a system is generally known as a search engine (SE), and many SEs are currently available on the Internet.

Search engines have become the most important medium for Internet users to find pages on the web [5]. Many studies and surveys show that Google is the most widely used search engine, followed by Yahoo. For scholarly searching, Google launched Google Scholar (GS) as a beta version in 2004.

Conclusion

This proposal therefore aims to find the best model for bridging the two sides: the publishers who provide metadata for their publications and the crawlers that index them, so that users can reach scholarly documents through that bridge. In particular, it investigates how a crawler can best recognize the metadata that publishers attach to their published documents.
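To make the crawler side of this bridge concrete, here is a minimal sketch, assuming a publisher exposes article metadata as HTML `<meta>` tags: names prefixed `DC.` for Dublin Core elements, and names prefixed `citation_` following the convention described in Google Scholar's inclusion guidelines. Both prefixes are assumptions about the publisher's pages. The class `MetaExtractor` and function `extract_metadata` are hypothetical names; the code relies only on Python's standard `html.parser` module and is not the model this proposal sets out to find.

```python
from html.parser import HTMLParser

# Assumed metadata prefixes (compared case-insensitively):
# "DC." for Dublin Core elements, "citation_" for Google Scholar-style tags.
PREFIXES = ("dc.", "citation_")

class MetaExtractor(HTMLParser):
    """Collect <meta name=... content=...> pairs whose name has a known prefix."""

    def __init__(self):
        super().__init__()
        self.metadata = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = (attributes.get("name") or "").strip()
        content = attributes.get("content")
        if content is None or not name.lower().startswith(PREFIXES):
            return
        # Fields such as citation_author may repeat, so keep every value in order.
        self.metadata.setdefault(name, []).append(content.strip())

def extract_metadata(html):
    """Parse an HTML page and return {meta name: [content, ...]}."""
    parser = MetaExtractor()
    parser.feed(html)
    parser.close()
    return parser.metadata

# Illustrative page for reference [7]; the "viewport" tag should be ignored.
page = """
<html><head>
<meta name="citation_title" content="An exploratory study of Google Scholar">
<meta name="citation_author" content="P. Mayr">
<meta name="citation_author" content="A.-K. Walter">
<meta name="DC.title" content="An exploratory study of Google Scholar">
<meta name="viewport" content="width=device-width">
</head><body></body></html>
"""
print(extract_metadata(page)["citation_author"])  # ['P. Mayr', 'A.-K. Walter']
```

Storing every value as a list keeps repeated fields such as multiple authors. A real crawler would also need to map the `DC.` and `citation_` vocabularies onto one common schema, which is part of the problem this proposal addresses.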

References

[1] M. Cao and X. Gao, “Combining Contents and Citations for Scientific Document Classification,” in AI 2005: Advances in Artificial Intelligence, vol. 3809, S. Zhang and R. Jarvis, Eds. Berlin/Heidelberg: Springer, 2005, pp. 143-152.

[2] R. J. Fateman, “More versatile scientific documents,” in Proceedings of the Fourth International Conference on Document Analysis and Recognition, 1997, vol. 2, pp. 1107-1110.

[3] University of Maryland. (2010). How to Read a Scholarly Article. Available: http://lib.guides.umd.edu/content.php?pid=146938&sid=1248705

[4] E. D. Liddy, “Automatic Document Retrieval,” Encyclopedia of Language and Linguistics, 2005.

[5] N. Hochstotter and M. Koch, “Standard parameters for searching behaviour in search engines and their empirical evaluation,” J. Inf. Sci., vol. 35, pp. 45-65, 2009.

[6] G. Nunberg, “Google Books: The Metadata Mess,” in Google Book Settlement Conference, UC Berkeley, 2009.

[7] P. Mayr and A.-K. Walter, “An exploratory study of google scholar,” Online Information Review, vol. 31, pp. 814-830, 2007.

[8] P. Jacso, “Metadata mega mess in Google Scholar,” Online Information Review, vol. 34, pp. 175-191, 2010.

[9] Waikato. (2006). Meta data and search engines. Available: http://webteam.waikato.ac.nz/guidelines/search_engines.shtml

[10] T. Bray. (2003). On Search: Metadata. Available: http://www.tbray.org/ongoing/When/200x/2003/07/29/SearchMeta

[11] M. Andric and W. Hall, “Exploiting Metadata Links to Support Information Retrieval in Document Management Systems,” in Proceedings of the 10th IEEE International Enterprise Distributed Object Computing Conference Workshops (EDOCW ’06), 2006, p. 59.

[12] M. Kobayashi and K. Takeda, “Information retrieval on the web,” ACM Comput. Surv., vol. 32, pp. 144-173, 2000.

[13] Public Broadcasting Metadata Dictionary Project Namespace. Available: http://www.pbcore.org/PBCore/PBCoreNamespaceContext.html

[14] K. A. F. Mohammed, “The impact of metadata in web resources discovering,” Online Information Review, vol. 30, pp. 155-167, 2006.

[15] D. Hillmann. (2005). Using Dublin Core – The Elements. Available: http://dublincore.org/documents/usageguide/elements.shtml

[16] Ganesa. (2008). Dublin Core. Available: http://ganesha.fr/index.php?post/2008/03/31/Dublin-Core
