Penerapan Teknik Web Scraping pada Mesin Pencari Artikel Ilmiah (Application of Web Scraping Techniques to a Scientific Article Search Engine)

Assoc. Prof. Leon Abdillah


Ahmat Josi, Leon Andretti Abdillah, Suryayusra

A search engine is a combination of computer hardware and software that a company makes available through a designated website. Search engines collect information from the web using bots, or web crawlers, that crawl the web periodically. The process of retrieving information from existing websites is called “web scraping”: a technique for extracting information from websites, closely related to web indexing. To develop a web scraper, the programmer first studies the HTML document of the target website and identifies the HTML tags that enclose the information to be collected. The programmer then studies how the site is navigated, so that the scraping application can mimic that navigation when it retrieves the information. The implementation described here scrapes only free search engines, such as Portal Garuda, the Indonesian Scientific Journal Database (ISJD), and Google Scholar.
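The tag-based extraction step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the sample page, the `class="title"` convention, and the names `TitleLinkScraper` and `scrape_titles` are assumptions made up for illustration, and the markup of Portal Garuda, ISJD, or Google Scholar will differ. The sketch uses only Python's standard `html.parser` module and parses an inline HTML string, so it makes no network requests.

```python
from html.parser import HTMLParser


class TitleLinkScraper(HTMLParser):
    """Collect (title, url) pairs from <a class="title"> elements.

    The class="title" marker is a hypothetical convention; a real site
    must be inspected first to find the tags that enclose its data.
    """

    def __init__(self):
        super().__init__()
        self.results = []
        self._href = None  # href of the anchor currently being read, if any
        self._text = []    # text fragments seen inside that anchor

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and attrs.get("class") == "title":
            self._href = attrs.get("href", "")
            self._text = []

    def handle_data(self, data):
        # Only keep text that appears between the enclosing tags.
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            # Collapse line breaks and repeated spaces inside the title.
            title = " ".join("".join(self._text).split())
            self.results.append((title, self._href))
            self._href = None


# Hypothetical search-result page, standing in for a downloaded document.
SAMPLE_PAGE = """
<html><body>
  <div class="result"><a class="title" href="/article/1">Web Scraping
     for Scientific Articles</a><span class="year">2014</span></div>
  <div class="result"><a class="title" href="/article/2">PDF Metadata Harvester</a></div>
  <a href="/about">About</a>
</body></html>
"""


def scrape_titles(html):
    """Return a list of (title, url) tuples found in the given HTML."""
    parser = TitleLinkScraper()
    parser.feed(html)
    parser.close()
    return parser.results


if __name__ == "__main__":
    for title, url in scrape_titles(SAMPLE_PAGE):
        print(title, url)
```

The parser ignores the "About" link because it lacks the marker attribute, which mirrors the idea of first identifying the tags that enclose the wanted information and then extracting only what they contain.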

Citation:

A. Josi, L.A. Abdillah, Suryayusra, “Penerapan teknik web scraping pada mesin pencari artikel ilmiah,” Jurnal Sistem Informasi (SISFO), vol. 5, September 2014.

