A weekend project is born
I had a crazy idea and I wanted to see how popular it would be. I also wanted some basic statistics, such as how many people are using it. So I thought: what is the easiest and most common thing someone does to promote an activity? They create a Facebook page for it. So I started searching a bit about FQL (Facebook Query Language) to see what information I could get from it and how easy it would be.
Facebook gives you a nice playground for the Graph API, the Graph API Explorer. I had played with it quite a lot, but I had never had the chance to try FQL. Let's say that what I wanted was to find as much information as I could about pages that are TV shows with "comedy" as their genre. I didn't read the documentation extensively, but I searched around and found out that I couldn't get this result with a single query: I had to provide (approximately) the name of the page each time. FQL gives you the CONTAINS command, which searches by name. But I wanted all the pages, so what should I do? After thinking about it, I decided to write a script that reads the words from a dictionary, builds the necessary FQL query for each word and stores the result. Each word goes inside the FQL CONTAINS, and the search is based on it.
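Here is a minimal Python sketch of the query-building step. It assumes FQL was reached through the Graph API's `/fql` endpoint, with the query in the `q` parameter. FQL has since been retired, so none of this will run against Facebook today. The selected columns, the `"TV SHOW"` type value and the helper names `iter_words`, `build_fql` and `build_request_url` are illustrative assumptions, not the original script.

```python
import urllib.parse
from typing import Iterable, Iterator

# Assumption: FQL queries were sent to the Graph API's /fql endpoint in the
# "q" parameter. FQL has been retired, so this URL is illustrative only.
FQL_ENDPOINT = "https://graph.facebook.com/fql"


def iter_words(lines: Iterable[str]) -> Iterator[str]:
    """Yield each non-empty dictionary word once, in file order."""
    seen = set()
    for line in lines:
        word = line.strip()
        if word and word not in seen:
            seen.add(word)
            yield word


def build_fql(word: str) -> str:
    """Build one FQL query that searches pages by name with CONTAINS.

    The column list and the "TV SHOW" type value are hypothetical; a genre
    filter is omitted because the right column is not known here.
    """
    # Drop characters that could break out of the quoted string literal.
    safe = word.replace("\\", "").replace('"', "")
    return (
        "SELECT page_id, name, type, fan_count FROM page "
        f'WHERE CONTAINS("{safe}") AND type = "TV SHOW"'
    )


def build_request_url(word: str, access_token: str) -> str:
    """Combine the query and the access token into one GET URL."""
    params = {"q": build_fql(word), "access_token": access_token}
    return FQL_ENDPOINT + "?" + urllib.parse.urlencode(params)
```

With a dictionary file open as `fh`, the main loop would iterate over `iter_words(fh)` and request `build_request_url(word, token)` for each word.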
The script
My first quick test drive was a pretty basic script. The dictionary had about 150,000 words and I hadn't added any delay, so after about 600 requests my access token got banned (I had put the token in manually, copy-pasting it from the Graph API Explorer page). OK, it seems to work! But now it needs improvements. I searched a bit about the limits on Facebook requests, but unfortunately I didn't find any official source. I found an estimate of about 600 requests per 600 seconds per IP, which averages out to one request per second, so I put a 2-second delay between requests to be safe. I test-drove it a bit more and still got banned, only after many more requests, so I had to try something different.
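The fixed delay can be sketched as a small throttle that derives the gap from the estimated quota. 600 requests per 600 s is one request per second, and a safety factor of 2 gives the 2-second delay. The class name `Throttle` and its parameters are hypothetical. The clock and sleep functions are injectable only to make the sketch easy to test.

```python
import time
from typing import Callable, Optional


class Throttle:
    """Enforce a minimum gap between consecutive requests.

    The gap comes from an estimated quota (600 requests per 600 s per IP,
    an unofficial figure) multiplied by a safety factor.
    """

    def __init__(
        self,
        max_requests: int = 600,
        window_s: float = 600.0,
        safety: float = 2.0,
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        # 600 requests / 600 s = 1 request per second; doubled for safety = 2.0 s.
        self.interval = window_s / max_requests * safety
        self._clock = clock
        self._sleep = sleep
        self._last: Optional[float] = None

    def wait(self) -> None:
        """Sleep just long enough that calls are at least `interval` apart."""
        now = self._clock()
        if self._last is not None:
            remaining = self._last + self.interval - now
            if remaining > 0:
                self._sleep(remaining)
                now += remaining
        self._last = now
```

In the main loop you would call `throttle.wait()` before each request. As the story above shows, a delay like this postpones the ban but does not necessarily prevent it.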
To OAuth or not to OAuth
OK, to make this kind of request you need an access token. How do you get one automatically? One way is OAuth, but that means I need a callback URL, which means I need a server, etc. Oh, what a pain! Isn't there a faster and easier way? Yes, there is, and its name is Selenium WebDriver.
I wrote a script of about 10 lines of code that opens a browser, logs in to Facebook, visits the explorer page, takes the access token and kills the browser. But my token was still invalidated at random from time to time, or it expired. So what could I do? Simple enough: I used a second Facebook account (my girlfriend's). Whenever one token expired or ran into a problem, the script switched to the other account, and vice versa.
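The two-account trick can be sketched as a rotator. Whenever a request fails with an expired or invalid token, it asks the next account's fetcher for a fresh token. The Selenium part is only a hedged sketch. The login field ids `email` and `pass`, the login and explorer URLs, and the token selector are assumptions about Facebook's pages that have likely changed. `TokenRotator` and `selenium_fetcher` are hypothetical names. The sketch uses only standard Selenium 4 calls (`find_element`, `send_keys`, `WebDriverWait`, `quit`).

```python
from typing import Callable, List

# A fetcher logs in with one account and returns a fresh access token.
TokenFetcher = Callable[[], str]


class TokenRotator:
    """Hold the current token and fail over to the next account on demand."""

    def __init__(self, fetchers: List[TokenFetcher]) -> None:
        if not fetchers:
            raise ValueError("need at least one account")
        self._fetchers = fetchers
        self._index = 0
        self.token = fetchers[0]()

    def rotate(self) -> str:
        """Call when a request fails because the token expired or was invalidated."""
        self._index = (self._index + 1) % len(self._fetchers)
        self.token = self._fetchers[self._index]()
        return self.token


def selenium_fetcher(email: str, password: str) -> TokenFetcher:
    """Return a fetcher that logs in with a real browser and scrapes a token.

    Assumptions, not verified against today's Facebook: the login form uses
    the ids "email" and "pass", and the explorer page shows the token in an
    element matched by token_selector (a hypothetical selector).
    """
    login_url = "https://www.facebook.com/login"
    explorer_url = "https://developers.facebook.com/tools/explorer/"
    token_selector = "input[name='access_token']"  # hypothetical

    def fetch() -> str:
        # Imported lazily so the rotation logic can be used without Selenium.
        from selenium import webdriver
        from selenium.webdriver.common.by import By
        from selenium.webdriver.common.keys import Keys
        from selenium.webdriver.support import expected_conditions as EC
        from selenium.webdriver.support.ui import WebDriverWait

        driver = webdriver.Firefox()
        try:
            driver.get(login_url)
            start_url = driver.current_url
            driver.find_element(By.ID, "email").send_keys(email)
            driver.find_element(By.ID, "pass").send_keys(password, Keys.RETURN)
            # Wait for the post-login redirect before moving on.
            WebDriverWait(driver, 30).until(EC.url_changes(start_url))
            driver.get(explorer_url)
            field = WebDriverWait(driver, 30).until(
                EC.presence_of_element_located((By.CSS_SELECTOR, token_selector))
            )
            return field.get_attribute("value")
        finally:
            driver.quit()  # always close ("kill") the browser

    return fetch
```

Usage would look like `TokenRotator([selenium_fetcher(email_a, pw_a), selenium_fetcher(email_b, pw_b)])`, calling `rotate()` whenever a response reports an invalid token. Note that constructing the rotator fetches the first token immediately.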
The result
One thing you will notice is that Facebook's column names are a bit of a mess: the "category" column, for example, is called "type" in FQL. In general, the column names in FQL differ from the field names in the JSON you get back from the Graph API. Why? I think there should be an explanation.
I know my code isn't elegant at all; it is more of a hack. I don't know if there is a better way of doing this, but I like it and I had fun making it. Nevertheless, I'd love to see improvements. My script took about 4 days to complete, and I had to restart it a few times (internet connection problems, etc.). I could have done it better in many ways (for example, by skipping words that had already been searched), but it worked and it took me very little time to write!
Now I am running some queries and getting some very interesting results for those pages. I also know that I didn't get all the Facebook pages in the field I was searching. I ran some manual tests, and CONTAINS in FQL doesn't match the way SQL's LIKE '%word%' does. To be honest, it bugs me a bit that I don't know how it really works. I may have to read more of the documentation to learn about it.
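Skipping words that have already been searched also makes restarts cheap. Here is a minimal sketch. It assumes results are appended to a JSON-lines file with one `{"word": ..., "pages": [...]}` record per searched word. On startup it reloads the words already done and skips them. `run`, `load_done_words` and the file layout are hypothetical. `search` stands for whatever function performs one throttled FQL request.

```python
import json
import os
from typing import Callable, Iterable, List, Set


def load_done_words(path: str) -> Set[str]:
    """Return the words already stored in the JSON-lines results file."""
    done: Set[str] = set()
    if not os.path.exists(path):
        return done
    with open(path, encoding="utf-8") as fh:
        for line in fh:
            try:
                done.add(json.loads(line)["word"])
            except (ValueError, KeyError, TypeError):
                # Blank or truncated line (e.g. a crash mid-write): that word
                # will simply be searched again.
                continue
    return done


def _ends_with_newline(path: str) -> bool:
    """True if the file is missing, empty, or ends with a newline."""
    if not os.path.exists(path) or os.path.getsize(path) == 0:
        return True
    with open(path, "rb") as fh:
        fh.seek(-1, os.SEEK_END)
        return fh.read(1) == b"\n"


def run(words: Iterable[str], search: Callable[[str], List[dict]], path: str) -> int:
    """Search every word not yet in `path`; return how many were searched now."""
    done = load_done_words(path)
    needs_newline = not _ends_with_newline(path)
    searched = 0
    with open(path, "a", encoding="utf-8") as out:
        if needs_newline:
            out.write("\n")  # keep a truncated last line from swallowing the next record
        for word in words:
            if word in done:
                continue  # already searched before the last restart
            pages = search(word)
            out.write(json.dumps({"word": word, "pages": pages}) + "\n")
            out.flush()  # persist progress before the next request
            done.add(word)
            searched += 1
    return searched
```

Flushing after every record means an interrupted run loses at most the word in flight. A truncated last line is ignored, and that word is simply searched again on the next start.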