
\section*{Issue Scraping}
We scraped the issues through the API that GitHub exposes to its users.
Authenticating with our GitHub token, we issued the requests needed to retrieve the issues.
The raw issues were saved as individual JSON files (one per issue) and compressed into a \verb|.tar.gz| archive.
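
The download-and-archive step described above can be sketched as follows. This is a minimal illustration and not necessarily the script we ran: the function names, the use of the single-issue endpoint \verb|/repos/{owner}/{repo}/issues/{number}|, and the choice to name each file after the issue's internal \verb|id| field are all assumptions made for the example.

```python
import json
import tarfile
import urllib.request
from pathlib import Path

API_ROOT = "https://api.github.com"


def build_issue_request(owner, repo, number, token):
    """Build (but do not send) an authenticated request for one issue."""
    url = f"{API_ROOT}/repos/{owner}/{repo}/issues/{number}"
    headers = {
        "Authorization": f"Bearer {token}",
        "Accept": "application/vnd.github+json",
    }
    return urllib.request.Request(url, headers=headers)


def fetch_issue(owner, repo, number, token):
    """Send the request and decode the JSON body (needs network access)."""
    request = build_issue_request(owner, repo, number, token)
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def save_issue(issue, out_dir):
    """Write one issue to <out_dir>/<id>.json (naming scheme is assumed)."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    path = out_dir / f"{issue['id']}.json"
    path.write_text(json.dumps(issue), encoding="utf-8")
    return path


def archive_issues(out_dir, archive_path):
    """Pack the whole directory of JSON files into a gzip-compressed tarball."""
    with tarfile.open(archive_path, "w:gz") as tar:
        tar.add(out_dir, arcname=Path(out_dir).name)
```

A typical run would call \verb|fetch_issue| for every issue number obtained from the listing, pass each result to \verb|save_issue|, and finally call \verb|archive_issues| once on the output directory.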
Some downloaded issues, however, were blank JSON files.
We suspect that these issues were available at the time of listing but have since been deleted and are no longer available through the GitHub API; we therefore chose to ignore them.
The internal IDs of these issues were: \verb|111293876|, \verb|116791101|, \verb|116805010|, \verb|116805553|, \verb|116805977|, \verb|116901067|, \verb|117010737|, \verb|117065474|, \verb|117067419|, \verb|117068152|, \verb|117069931|, \verb|116803071|, \verb|116923175|, \verb|116989517|, \verb|117063475|, and \verb|117067644|.
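
One possible way to identify such blank files is sketched below. It assumes that each file is named \verb|<id>.json| and that ``blank'' means either an empty file or a file whose JSON value is \verb|null|, \verb|{}|, or \verb|[]|; both the helper names and this definition of blank are assumptions for illustration, not a description of our exact procedure.

```python
import json
from pathlib import Path


def is_blank_issue(path):
    """Return True if the file is empty or holds an empty JSON value."""
    text = Path(path).read_text(encoding="utf-8").strip()
    if not text:
        return True
    # Malformed (non-empty, non-JSON) files raise json.JSONDecodeError here,
    # so they are reported as errors rather than silently treated as blank.
    data = json.loads(text)
    return data in (None, {}, [])


def find_blank_issue_ids(issues_dir):
    """List the IDs (file stems) of all blank issue files, sorted."""
    return sorted(
        path.stem
        for path in Path(issues_dir).glob("*.json")
        if is_blank_issue(path)
    )
```

The IDs returned by \verb|find_blank_issue_ids| can then be excluded from any later processing step.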