Web Scraping

20 August 2019


This page collects the notes and lessons learned that I use for web scraping.

Browser Requests

Requests made by a browser can be captured using its developer tools. In Chrome, open them via More Tools -> Developer Tools; the requests are listed in the Network tab.

Issues

  • It is hard to capture requests made by a page that opens in a new tab. To work around this, press F1 in a DevTools window to open its settings and select "Auto-open DevTools for popups". DevTools will then open for the popup and capture its requests.
  • If the popup closes automatically, its DevTools window closes with it, which makes it hard to capture requests from auto-closing popups. To get around that, use packet capture software such as Wireshark.

LuaSocket

  • LuaSocket can be used to scrape web pages that do not require SSL, i.e. pages served over plain HTTP. LuaSocket itself does not provide TLS.
  • The HTTP module (socket.http) is used for nearly all web scraping requests.
  • When calling http.request with a table argument that includes a source, make sure to also set the Content-Length header to the total number of bytes the source will send as the request body.
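
The Content-Length point above can be sketched as follows. This is a minimal example, assuming a POST with a form-encoded string body; the helper names build_post_request and post, the content type, and the URL in the usage note are illustrative assumptions, not part of the original notes.

```lua
local http = require("socket.http")
local ltn12 = require("ltn12")

-- Build the table argument for http.request for a POST with a string body.
-- (build_post_request is a hypothetical helper name used for this sketch.)
local function build_post_request(url, body)
  local chunks = {}  -- response body chunks are collected here by the sink
  local request = {
    url = url,
    method = "POST",
    headers = {
      -- Assumed content type for a form submission.
      ["content-type"] = "application/x-www-form-urlencoded",
      -- #body is the byte length of the string the source will send.
      ["content-length"] = tostring(#body),
    },
    source = ltn12.source.string(body),
    sink = ltn12.sink.table(chunks),
  }
  return request, chunks
end

-- Send the request and return the response body, status code, headers
-- and status line, or nil plus an error message on failure.
local function post(url, body)
  local request, chunks = build_post_request(url, body)
  local ok, code, headers, status = http.request(request)
  if not ok then
    return nil, code  -- on failure the second value is the error message
  end
  return table.concat(chunks), code, headers, status
end
```

A call would look something like `local body, code = post("http://example.com/login", "user=a&pass=b")`, where the URL and form fields are placeholders. Computing the length from the same string that is passed to the source keeps the header and the bytes actually sent in agreement.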