[kwlug-disc] Scraping Facebook

Tue Dec 5 16:21:59 EST 2017

"Scraping" usually refers to getting data out of the HTML UI rather than
going through an API. It would be great if Facebook provided a nice API, or
an RSS feed, but it doesn't and it won't.

Scraping HTML is a pain in the ass and incredibly fragile, but you can do
it. It does get harder when most pages these days are rendered by
JavaScript, and not simply handed to you by an HTTP request. But if you've
got the time and persistence, you could emulate a browser, run all the
JavaScript and then scrape the resulting HTML to get your data.
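
To give a rough idea of the "scrape the resulting HTML" step, here is a
minimal sketch using only Python's standard library. The markup, the class
names ("post", "author") and the scrape_posts helper are all made up for
illustration; they are not Facebook's real page structure. It also assumes
you already have the rendered HTML in hand, which for a JavaScript-heavy
site means driving a real or headless browser first.

```python
from html.parser import HTMLParser

# Hypothetical markup standing in for a page after its JavaScript has run.
# None of these tags or class names come from Facebook.
SAMPLE_HTML = """
<html><body>
  <div class="post"><span class="author">Alice</span><p>First post</p></div>
  <div class="post"><span class="author">Bob</span><p>Second post</p></div>
</body></html>
"""


class PostScraper(HTMLParser):
    """Collect (author, text) pairs from the hypothetical markup above."""

    def __init__(self):
        super().__init__()
        self.posts = []      # finished (author, text) tuples
        self._field = None   # field that incoming text belongs to, if any
        self._current = {}   # post being assembled; empty when outside a post

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if tag == "div" and "post" in classes:
            self._current = {"author": "", "text": ""}
        elif tag == "span" and "author" in classes:
            self._field = "author"
        elif tag == "p":
            self._field = "text"

    def handle_endtag(self, tag):
        if tag in ("span", "p"):
            self._field = None
        elif tag == "div" and self._current:
            self.posts.append((self._current["author"], self._current["text"]))
            self._current = {}

    def handle_data(self, data):
        # Ignore text outside a post or outside a field we care about.
        if self._field and self._current:
            self._current[self._field] += data.strip()


def scrape_posts(html):
    """Return a list of (author, text) tuples found in the given HTML."""
    parser = PostScraper()
    parser.feed(html)
    return parser.posts


if __name__ == "__main__":
    for author, text in scrape_posts(SAMPLE_HTML):
        print(f"{author}: {text}")
```

Note how tightly the parser is coupled to the tag and class names. Rename
one class on the page and it silently returns nothing, which is exactly
the fragility I mean.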

And then you begin the cat and mouse game of fixing your script every time
Facebook changes their pages/scripts to prevent you from doing this. But
again, if you have the time and persistence, you can do this.

Then you need to have the cash and lawyers for the inevitable lawsuits.
Cory Doctorow works with the EFF and they are made of lawyers.  So I think
by suggesting this, he's offering an EFF partnership to help you out when
the lawsuits start.  This is the actual end goal, I believe: to get a court
settlement that explicitly makes this sort of activity legal. It's a bit of
a long shot, and the courts haven't really been going in this direction,
but I agree it would be nice if they did.

Darcy.

On Tue, Dec 5, 2017 at 2:24 PM, Paul Nijjar via kwlug-disc <
kwlug-disc at kwlug.org> wrote:

> At the Doctorow talk yesterday, somebody asked an astute question
> about network effects. She was not on Facebook, but all of her friends
> were, and she was finding it harder to resist its creepy tendrils.
> But she wanted a non-creepy Facebook experience (which Doctorow
> suggested was possible). What would the transition from Facebook to
> something non-creepy look like, given that all her friends were on
> Facebook now?
>
> Doctorow suggested that the woman write a program that would scrape
> Facebook's content and present it to end users. Then those end users
> could apply their own filters and algorithms to the content in
> non-creepy ways. This software would be a bridge in the sense that
> posts to this abstraction layer would be fed back to Facebook.
>
> I am not seeing how this could work. Facebook controls the data on its
> servers, and it permits access to its API via keys. It decides who
> gets to access its data and for what purposes. If some subversive
> startup tried to scrape its data, wouldn't it just shut down that
> startup by locking it out of its API? If end users were the ones
> logging into Facebook and running this app (so the end users were
> authenticating, not the app) then wouldn't Facebook be able to lock
> the end users out? That would be an effective way of killing the
> startup right quick.
>
> I am guessing that such a service would be against the Facebook Terms
> of Service in any case.
>
> Facebook does have a way to back up your data. That could be used as
> an input to this program. But as far as I know this process is not
> automated (surprise, surprise), and it is not that helpful.
>
> In short, I am not seeing how this idea could work. But maybe I am
> being dumb again. Maybe such clients already exist? How well does
> Facebook support cross-posting to other social networks?
>
> - Paul
>
> --
> http://pnijjar.freeshell.org