Extracting data from Online Social Networks
Cristina Pérez Solà and Jordi Herrera Joancomartı́
Departament d’Enginyeria de la Informació i les Comunicacions
Universitat Autònoma de Barcelona
July 1st, 2014
Introduction Online SocNs Computer programming basics Programming in R Interacting with the provider Data extraction proje
Today’s goal
1. Introduction
2. Online SocNs
3. Computer programming basics
4. Programming in R
5. Interacting with the provider
6. Data extraction project
1. Introduction
   Presentation
   What are we going to do today?
2. Online SocNs
3. Computer programming basics
4. Programming in R
5. Interacting with the provider
6. Data extraction project
Presentation
Who are we?
We are from the Department of Information and Communications Engineering
at the Autonomous University of Barcelona.
[email protected]
http://deic.uab.cat/~cperez/eusnworkshop/
http://tinyurl.com/EUSNdataextraction
Presentation
What do we do?
Among other topics, we do research about Privacy in Online Social
Networks (OSN):
OSN Crawling.
Community detection algorithms.
Classify OSN users from the social
graph.
Infer private data from public data.
What are we going to do today?
Schedule
Topic                                                      Time
Intro                                                      15'
About Online Social Networks (theory)                      15'
Computer programming (activity)                            90'
R (theory + 1 activity)                                    30'
Break                                                      45'
Interaction with the OSN provider (theory + 1 activity)    45'
Data extraction project (activity)                         120'
What are we going to do today?
Warning
1. Introduction
2. Online SocNs
   Introduction
   Twitter
3. Computer programming basics
4. Programming in R
5. Interacting with the provider
6. Data extraction project
Introduction
Definition
We define social network sites as web-based services that allow
individuals to
(1) construct a public or semi-public profile within a bounded
system,
(2) articulate a list of other users with whom they share a
connection, and
(3) view and traverse their list of connections and those made by
others within the system.
Introduction
OSN Modeling
We usually model OSNs as graphs: users are the nodes and their relationships are the edges.
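As a minimal sketch of this idea, a graph can be stored in R as a list of edges. The user names below are hypothetical and only serve as an illustration; each row means that the user in `from` follows the user in `to`.

```r
# Hypothetical "follows" relationships: one directed edge per row.
edges <- data.frame(
  from = c("alice", "alice", "bob", "carol"),
  to   = c("bob", "carol", "carol", "alice"),
  stringsAsFactors = FALSE
)

# Out-degree: how many users each user follows.
out_degree <- table(edges$from)

# In-degree: how many followers each user has.
in_degree <- table(edges$to)

print(out_degree)
print(in_degree)
```

An edge list is only one possible representation; it is convenient here because it maps directly onto a data frame.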
Twitter
Introduction
Twitter is an online social network and microblogging service
created in 2006.
Users exchange small text messages called tweets.
Tweets are limited to 140 characters.
Twitter
Web interface
Twitter
Twitter Users
Information about users on Twitter:
Unique identifier
Username (to login)
Screen name (shown to others, modifiable)
Verified?
Location
Geolocation active?
Language
...
Twitter
Tweets
There are 3 different kinds of Tweets:
Status updates.
Retweets.
Replies.
Moreover, tweets can contain geolocation information:
Location place identifiers.
Geographical coordinates.
Twitter
Details
Users can follow other users.
Hashtags are used to label or classify tweets (#EUSNConference).
Trending topics are the most used words (or phrases) on
Twitter at a given moment.
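Since hashtags follow a simple pattern, they can be pulled out of a tweet's text with a regular expression. The sketch below uses only base R; the tweet text is a made-up example, and the pattern (a `#` followed by letters, digits or underscores) is a simplifying assumption rather than Twitter's exact rule.

```r
# Hypothetical tweet text, used only for illustration.
tweet_text <- "Learning to extract data at #EUSNConference in #Barcelona"

# Find every '#' followed by letters, digits or underscores.
matches <- regmatches(tweet_text, gregexpr("#[[:alnum:]_]+", tweet_text))

# regmatches() returns a list with one element per input string.
hashtags <- matches[[1]]
print(hashtags)
```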
Twitter
Source of data for studies
Twitter data has been used in many studies and for different
purposes:
Twitter Alerts.
Real-time event detection by social sensors.
Twitter mood predicts the stock market.
Predicting Elections with Twitter.
1. Introduction
2. Online SocNs
3. Computer programming basics
   Learning to code
   An hour of code
4. Programming in R
5. Interacting with the provider
6. Data extraction project
Learning to code
Coding?
But what exactly is coding? Coding is what makes it possible for
us to create computer software. Coding is writing software!
Coding can be done in many languages (R, Python, C, Java,
JavaScript, ...).
Vocabulary.
Syntax.
Writing a computer program
Think / design an algorithm.
Code the solution (implementation / coding).
Test / debug.
Learning to code
Motivation
Why is it interesting to learn to code?
We can automate processes!
Computers are everywhere!
It helps us understand how systems around us work.
We learn how to think about problems.
We learn how to break down problems into small pieces.
We learn to develop systematic solutions.
An hour of code
Let’s do it!
An Hour of Code is an initiative to help introduce more than 10
million students of all ages to computer programming.
We are going to follow an introductory course to programming:
http://learn.code.org/hoc/reset
An hour of code
Video: introduction
Video 1: Introduction
http://learn.code.org/hoc/reset
An hour of code
Activities
Let’s try to solve puzzles 1-5!
An hour of code
Puzzle 5
turnRight();
moveForward();
turnLeft();
moveForward();
moveForward();
moveForward();
turnLeft();
moveForward();
An hour of code
Video: for loops
Video 2: Mark Zuckerberg teaches repeat loops
http://learn.code.org/hoc/6
An hour of code
Activities
Let’s try to solve puzzles 6-9!
An hour of code
Puzzle 9
for (var count = 0; count < 3; count++) {
  moveForward();
  moveForward();
  turnRight();
}
An hour of code
Video: while loops
Video 3: Chris Bosh teaches repeat until statements
http://learn.code.org/hoc/10
An hour of code
Activities
Let’s try to solve puzzles 10-13!
An hour of code
Puzzle 13
while (notFinished()) {
  turnRight();
  moveForward();
  turnLeft();
  moveForward();
}
An hour of code
Video: if statements
Video 4: Bill Gates explains if statements
http://learn.code.org/hoc/14
An hour of code
Activities
Let’s try to solve puzzles 14-17!
An hour of code
Puzzle 17
while (notFinished()) {
  moveForward();
  if (isPathRight()) {
    turnRight();
  }
}
An hour of code
Video: if-else statements
Video 5: Saloni on the if/else block
http://learn.code.org/hoc/18
An hour of code
Activities
Let’s try to solve puzzles 18-20!
An hour of code
Puzzle 20
while (notFinished()) {
  if (isPathForward()) {
    moveForward();
  } else {
    if (isPathRight()) {
      turnRight();
    } else {
      turnLeft();
    }
  }
}
An hour of code
Video: wrap up
Video 6: Wrap up
https://www.youtube.com/watch?v=98Wft30gUQE
An hour of code
Wrap up (I)
We use instructions to write computer programs:
Basic instructions (moveForward, turnRight, turnLeft).
Flow control instructions:
Repeat (n) times
Repeat until (something happens)
If
If - else
An hour of code
Wrap up (II)
Lessons learned:
One small mistake is enough to keep us from reaching the end!
We always win ;) !
As with human languages, there are also many programming
languages.
It is better to think about the solution before starting to code.
Want to learn more? There is a 20-hour course at:
http://learn.code.org/
1. Introduction
2. Online SocNs
3. Computer programming basics
4. Programming in R
   Introduction
   Basic instructions
   Flow control instructions
5. Interacting with the provider
6. Data extraction project
Introduction
R: introduction
Now, we do not want to move the (angry) bird around any more,
but to extract data from OSNs!
We are going to use the programming language R:
Data mining and graphics.
Interpreted language.
Uses packages / libraries to extend
its functionalities.
Basic instructions
R: basic instructions (I)
Variables:
x <- 5;
z <- "Hello World";
Arithmetic:
v <- x + 5;
v <- x * 5;
Show result:
x <- 5;
v <- x + 5;
print(v);
[1] 10
Basic instructions
R: basic instructions (II)
Vectors:
x <- 1:10
print(x);
[1] 1 2 3 4 5 6 7 8 9 10
Load a library:
library(RCurl)
Basic instructions
R: basic instructions (III)
x <- 0:9
y <- 0:9 * 5
lm(y ~ x)
Coefficients:
(Intercept) x
-4.494e-15 5.000e+00
help(lm)
lm package:stats R Documentation
Fitting Linear Models
Description:
’lm’ is used to fit linear models. It can be used to carry out
regression, single stratum analysis of variance and analysis of
covariance (although ’aov’ may provide a more convenient interface
for these).
[...]
Basic instructions
R: functions (I)
Functions
Functions are self-contained modules of code that accomplish a
specific task.
Functions:
take in data
process it / do something
return a result
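The three steps above can be sketched with a small user-defined function. The name `keep_odd` is hypothetical and chosen only for this illustration.

```r
# Hypothetical function: takes a vector of numbers (input),
# keeps only the odd ones (processing) and returns them (result).
keep_odd <- function(numbers) {
  numbers[numbers %% 2 == 1]
}

# Call the function and show what it returns.
result <- keep_odd(1:10)
print(result)
```

In R, the value of the last expression evaluated inside the function is what the function returns, so no explicit `return()` is needed here.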
Basic instructions
R: functions (II)
We have already seen some functions in action:
moveForward();
(no input, one step forward, no output).
print("hello");
("hello" as input, prints its content, returns the input invisibly).
lm(y ~ x);
(x, y as inputs, computes the linear regression, returns the
coefficients of the regression as output).
Basic instructions
R: data frames
name <- "Ford"
surname <- "Perfect"
df = data.frame(name, surname)
df
name surname
1 Ford Perfect
df$name
[1] Ford
df$surname
[1] Perfect
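Data frames become more useful with several rows, since each row can then describe one observation. The sketch below uses made-up user data (the column names and values are assumptions for illustration) and shows how to count rows, pick a row, and filter by a condition.

```r
# Hypothetical users, one per row.
users <- data.frame(
  screen_name = c("alice", "bob", "carol"),
  followers   = c(120, 45, 300),
  stringsAsFactors = FALSE
)

# Number of rows (users) in the data frame.
n_users <- nrow(users)

# Second row, all columns.
second_user <- users[2, ]

# Screen names of the users with more than 100 followers.
popular <- users$screen_name[users$followers > 100]

print(n_users)
print(second_user)
print(popular)
```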
Flow control instructions
R: flow control instructions (loops I)
for(i in 1:5)
{
  print(paste("i =", i));
}
[1] "i = 1"
[1] "i = 2"
[1] "i = 3"
[1] "i = 4"
[1] "i = 5"
x<-1:5
for (i in 1:length(x))
{
  print(x[i])
}
[1] 1
[1] 2
[1] 3
[1] 4
[1] 5
Flow control instructions
R: flow control instructions (loops II)
x<-1:10
i<-1
while (i <= 5)
{
  print(x[i])
  i<-i+1
}
[1] 1
[1] 2
[1] 3
[1] 4
[1] 5
i<-0
repeat
{
  i <- i + 1;
  if(i > 5)
    break;
  print("Hi");
}
[1] "Hi"
[1] "Hi"
[1] "Hi"
[1] "Hi"
[1] "Hi"
Flow control instructions
R: flow control instructions (conditionals)
day <- 1;
if (day == 1)
{
x <- "WORKSHOP";
} else {
x <- "CONFERENCE";
}
print(x)
[1] "WORKSHOP"
day <- 2;
if (day == 1)
{
x <- "1st day WORKSHOP";
} else if ( day == 2) {
x <- "2nd day WORKSHOP";
} else {
x <- "CONFERENCE";
}
print(x)
[1] "2nd day WORKSHOP"
Flow control instructions
R: activity
Write a program in R to print all odd numbers from 0 to 100.
Write a program to print the sequence of Fibonacci numbers
(1, 1, 2, 3, 5, 8, 13, ...).
http://www.r-fiddle.org/#/
Flow control instructions
R: activity (solutions to 1st problem)
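# Solution 1: walk over 0..100 and print every second number
num <- 0:100;
flag <- 0
for (i in 1:length(num))
{
  if (flag == 0){
    flag <- 1;
  } else {
    print(num[i]);
    flag <- 0;
  }
}
# Solution 2: start at 1 and advance in steps of 2
i<-1;
while (i < 100)
{
  print(i);
  i <- i + 2;
}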
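For the second problem, one possible solution (a sketch, not the only way to do it) stores the sequence in a vector and computes each term from the two previous ones. Stopping after the first 10 terms is an arbitrary choice made for this example.

```r
# How many Fibonacci numbers to compute (arbitrary choice).
n <- 10

# Reserve a vector of n numbers and set the two starting values.
fib <- numeric(n)
fib[1] <- 1
fib[2] <- 1

# Each remaining term is the sum of the two previous terms.
for (i in 3:n) {
  fib[i] <- fib[i - 1] + fib[i - 2]
}

print(fib)
```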
1. Introduction
2. Online SocNs
3. Computer programming basics
4. Programming in R
5. Interacting with the provider
   APIs
   The Twitter APIs
   The REST API
   The Streaming API
   Authentication
6. Data extraction project
APIs
APIs
API
An Application Programming Interface (API) specifies how
some software components should interact with each other.
Sometimes an API comes as a specification of remote calls exposed
to the API consumers.
APIs
Popular APIs
APIs are everywhere:
Twitter
Flickr
Facebook
Google Maps
YouTube
We can use them to obtain data, but also to send data.
The Twitter APIs
The Twitter API
Twitter has two main APIs:
REST: ask for past information (6-9 days).
Streaming: filter real time information.
The Twitter APIs
Twitter REST API : search tweets (I)
GET search/tweets: Returns a collection of relevant Tweets
matching a specified query.
Query                       Returns tweets...
EUSN                        containing the word "EUSN".
EUSN Barcelona              containing both "EUSN" and "Barcelona".
"online social networks"    containing the exact sentence.
european OR conference      containing either "european" or "conference" (or both).
"social network" -online    containing "social network" but not "online".
#EUSNConference             containing the hashtag "EUSNConference".
from:isidromj               sent from user "isidromj".
The Twitter APIs
Twitter REST API : search tweets (II)
Query                       Returns tweets...
to:isidromj                 sent to user "isidromj".
@isidromj                   referencing user "isidromj".
EUSN since:2010-12-27       containing "EUSN" and sent since the date "2010-12-27".
EUSN until:2014-07-01       containing "EUSN" and sent before the date "2014-07-01".
barcelona :)                containing "barcelona" and with a positive attitude.
barcelona :(                containing "barcelona" and with a negative attitude.
EUSN ?                      containing "EUSN" and asking a question.
EUSN filter:links           containing "EUSN" and linking to a URL.
EUSN source:twitterfeed     containing "EUSN" and entered via TwitterFeed.
The Twitter APIs
Twitter REST API : search tweets (III)
Try it yourself:
https://twitter.com/search-home
Can you write queries to obtain...?
Tweets containing the word Barcelona but not the word
football.
Tweets containing the word Barcelona, Paris or London.
Tweets containing the words Barcelona, Paris and London.
Tweets containing the exact sentence "tomorrow i will".
Tweets sent yesterday with the word holidays.
Tweets containing questions about paper submissions.
The Twitter APIs
Twitter REST API : search tweets (IV)
Note that you have used keywords like since, until, from or to
to build your queries.
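Because a query is just a piece of text, it can also be assembled by a program. The helper below is a hypothetical sketch in base R (the name `build_query` and its arguments are not part of any library); it glues search terms and the `from:`, `since:` and `until:` keywords into a single query string.

```r
# Hypothetical helper: builds a search query string from its parts.
build_query <- function(terms, from = NULL, since = NULL, until = NULL) {
  parts <- terms
  # Add each keyword only if a value was provided for it.
  if (!is.null(from))  parts <- c(parts, paste0("from:", from))
  if (!is.null(since)) parts <- c(parts, paste0("since:", since))
  if (!is.null(until)) parts <- c(parts, paste0("until:", until))
  # Join all the parts with single spaces.
  paste(parts, collapse = " ")
}

q <- build_query("EUSN", from = "isidromj", since = "2014-06-01")
print(q)
```

The resulting string has the same form as the queries typed by hand in the search box, so it could be used wherever a query is expected.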
The Twitter APIs
Twitter REST API : search tweets (V)
The Twitter APIs
Twitter REST API : other calls about Tweets
GET statuses/show/:id : Returns a single Tweet, specified by
the id parameter. The Tweet’s author will also be embedded
within the tweet.
GET statuses/retweets/:id : Returns a collection of the 100
most recent retweets of the tweet specified by the id parameter.
The Twitter APIs
Twitter REST API : calls about users
GET users/search: Provides a simple, relevance-based search
interface to public user accounts on Twitter. Try querying by
topical interest, full name, company name, location, or other
criteria.
GET account/settings: Returns settings (including current trend,
geo and sleep time information) for the authenticating user.
GET users/show : Returns a variety of information about the user
specified by the required user id or screen name parameter. The
author’s most recent Tweet will be returned inline when possible.
The Twitter APIs
Twitter REST API : calls about user relationships
GET friends/ids: Returns a cursored collection of user IDs for
every user the specified user is following (otherwise known as their
"friends").
GET followers/ids : Returns a cursored collection of user IDs for
every user following the specified user.
GET friendships/incoming : Returns a collection of numeric IDs
for every user who has a pending request to follow the
authenticating user.
GET friendships/outgoing: Returns a collection of numeric IDs
for every protected user for whom the authenticating user has a
pending follow request.
The Twitter APIs
Twitter REST API : documentation
Have a look at the full specification:
https://dev.twitter.com/docs/api/1.1
The Twitter APIs
Twitter streaming API
We are going to use just one of the Twitter streaming APIs
(public stream) and one endpoint (statuses/filter). There are
many other alternatives!
The statuses/filter endpoint returns public statuses that match one
or more filter predicates through a single connection.
The Twitter APIs
Streaming API: Follow
Follow allows us to specify data collection depending on user
information:
Tweets created by the user.
Tweets which are retweeted by the user.
Replies to any Tweet created by the user.
Retweets of any Tweet created by the user.
Manual replies, created without pressing a reply button.
The stream will not contain: Tweets mentioning the user, Manual
Retweets created without pressing a Retweet button and Tweets
by protected users.
The Twitter APIs
Streaming API: Track
Track allows us to specify data collection depending on content:
The Twitter APIs
Streaming API: Location
Location allows us to specify data collection depending on
geographical position. Location uses the coordinates of the
bounding box to specify positions.
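To make the idea of a bounding box concrete, the sketch below checks whether a point falls inside one. It assumes the box is given as four numbers in the order west longitude, south latitude, east longitude, north latitude (south-west corner first, then north-east corner). The function name and the coordinates for Barcelona are rough, illustrative assumptions.

```r
# Hypothetical helper: is the point (lon, lat) inside the bounding box?
# box = c(west_lon, south_lat, east_lon, north_lat)
in_bounding_box <- function(lon, lat, box) {
  lon >= box[1] && lat >= box[2] && lon <= box[3] && lat <= box[4]
}

# Approximate box around Barcelona (illustrative values only).
barcelona_box <- c(2.05, 41.32, 2.23, 41.47)

# A point in the city centre and a point in Madrid.
centre_inside <- in_bounding_box(2.17, 41.39, barcelona_box)
madrid_inside <- in_bounding_box(-3.70, 40.42, barcelona_box)

print(centre_inside)
print(madrid_inside)
```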
Authentication
About authentication
Authentication
Authentication is the process of determining whether someone or
something is, in fact, who or what it is declared to be.
We use usernames and passwords to authenticate ourselves
everywhere:
Email account.
Online banking services.
University virtual campus (Moodle?).
Facebook account.
Authentication
OSN authentication
We also need to authenticate ourselves to use APIs:
Identify the application that is accessing the API.
Do things on behalf of other users.
What happens when Candy Crush wants to send a message to my
Facebook friends?
We do not want to give our username and password to others!
Solution: tickets (or tokens)!
Authentication
OAuth
OAuth
OAuth is an open standard for authorization that provides client
applications a ’secure delegated access’ to server resources on
behalf of a resource owner.
In other terms, OAuth allows users to give specific rights to
applications to act on their behalf:
Let Candy Crush send messages to my Facebook friends.
Let TweetDeck send tweets from my account.
Let Facebook fetch my Gmail account contacts.
1. Introduction
2. Online SocNs
3. Computer programming basics
4. Programming in R
5. Interacting with the provider
6. Data extraction project
   Preparing the environment
   Twitter Authentication
   Let’s do it!
   REST API
   STREAMING API
Preparing the environment
Download and Install R (I)
Go to http://cran.r-project.org/
Preparing the environment
Download and Install R (II)
Windows users:
Download R for Windows
base
Download R 3.1.0 for Windows
Ubuntu / Debian users:
sudo add-apt-repository ppa:marutter/rrutter
sudo apt-get update
sudo apt-get install r-base r-base-dev
Mac users:
Download R for (Mac) OS X
R-3.1.0-snowleopard.pkg
Preparing the environment
R console (Windows)
Preparing the environment
R console (Linux)
Preparing the environment
Hello world!
Preparing the environment
Download source code
Go to:
http://deic.uab.cat/~cperez/eusnworkshop/
http://tinyurl.com/EUSNdataextraction
Download and extract source-code.zip.
Preparing the environment
Download source code
Content of source-code.zip:
credentials.R
hello-world.R
install-libraries.R
load-credentials.R
map.png
StreamR-example1.R
StreamR-example2.R
test-credentials.R
TwitterR-example1.R
TwitterR-example2.R
Preparing the environment
Configure working directory
Preparing the environment
Hello world (again)
Preparing the environment
Install libraries
# install required packages
install.packages("ROAuth");
install.packages("wordcloud");
install.packages("tm");
install.packages("twitteR");
install.packages("streamR");
install.packages("ggplot2");
install.packages("maps");
install.packages("ggmap");
source("install-libraries.R")
The downloaded binary packages are in
C: Documents and Settings packages
trying URL ’http://cran.es.r-project.org/bin/windows/contrib/3.0/maps_2.3-7.zip’
Content type ’application/zip’ length 2072327 bytes (2.0 Mb)
opened URL
downloaded 2.0 Mb
package ’maps’ successfully unpacked and MD5 sums checked
Twitter Authentication
Creating a new App (I)
1. Create a Twitter account:
   Go to http://twitter.com
   Click Sign Up for Twitter
   Fill the form
2. Register a Twitter App:
   Go to https://dev.twitter.com/apps
   Create New App
   Fill the form
Twitter Authentication
Creating a new App (II)
Twitter Authentication
Get App keys
Twitter Authentication
Authorising the App
library(RCurl)
options(RCurlOptions = list(cainfo = system.file("CurlSSL", "cacert.pem", package = "RCurl")))
require(twitteR)
apiKey <- ""
apiSecret <- ""
reqURL <- "https://api.twitter.com/oauth/request_token"
accessURL <- "https://api.twitter.com/oauth/access_token"
authURL <- "https://api.twitter.com/oauth/authorize"
cred <- OAuthFactory$new(consumerKey=apiKey,consumerSecret=apiSecret,requestURL=reqURL,
accessURL=accessURL,authURL=authURL)
cred$handshake(cainfo = system.file("CurlSSL", "cacert.pem", package = "RCurl"))
registerTwitterOAuth(cred)
# save credentials to file
save(cred, file="credentials.RData");
source("credentials.R")
Loading required package: bitops
Loading required package: twitteR
Loading required package: ROAuth
Loading required package: digest
Loading required package: rjson
To enable the connection, please direct your web browser to:
https://api.twitter.com/oauth/authorize?oauth_token=TJIcEiWs7wLVc4ujQy6UTj25F4UJlsq0JoyiZmOfUU
When complete, record the PIN given to you and provide it here: 7983356
>
Twitter Authentication
Get App keys
Twitter Authentication
Testing the credentials
# load libraries
library(twitteR);
# load credentials
load("credentials.RData");
# OAuth register
registerTwitterOAuth(cred);
# get two tweets about hashtag #Obama and print some information
tweets <- searchTwitter("#Obama", n=2, lang="en");
for(tweet in tweets) {
# get one tweet and print some information
show(paste("User ’", tweet$getScreenName(), "’ says: ’", tweet$getText(), "’", sep=""));
}
File: test-credentials.R
source("test-credentials.R")
[1] "User ’claudianpliego’ says: ’RT @Matt_VanDyke: #Bush’s recklessness, #Obama’s
fecklessness leave U.S. looking weak as #Iraq crumbles
[1] "User ’roziedb’ says: ’RT @Politicule: #ISIS militants conduct public beheadings
in #Iraq. #Obama asks if he can play through while golfing in PalmSprings’
>
Let’s do it!
Example 1: REST API (code)
library(twitteR);
source('load-credentials.R');
# search. Examples: #hashtag, @user, etc
tweets <- searchTwitter("#Obama", n=10, lang="es");
# get only one tweet and analyse it
tweet <- tweets[[1]];
# show the low level structure of ’status’
show("STRUCTURE OF ’STATUS’ OBJECT:");
str(tweet);
# get some basic information
show(paste("TWEET ID:",tweet$getId()));
show(paste("TEXT:",tweet$getText()));
show(paste("USER NAME:",tweet$getScreenName()));
show(paste("IS RETWEET?:",tweet$isRetweet));
show(paste("RETWEETED:",tweet$retweeted));
# get information about the user
user <- getUser(tweet$getScreenName());
# print structure of ’user’
show("STRUCTURE OF ’USER’ OBJECT:");
str(user);
# get some information about the user
show(paste("USER ID:",user$getId()));
show(paste("USER NAME:",user$getName()));
show(paste("SCREEN NAME:",user$getScreenName()));
show(paste("LOCATION:",user$getLocation()));
show(paste("TWEETS NUMBER:",user$getStatusesCount()));
show(paste("FOLLOWERS:",user$getFollowersCount()));
show(paste("DESCRIPTION:",user$getDescription()));
Let’s do it!
Example 1: REST API (output)
source("TwitteR-example-1.R")
[...]
text, favorited, favoriteCount, replyToSN, created, truncated, getCreated, getFavoriteCount,
getFavorited, getId, getIsRetweet, getLatitude, getLongitude, getReplyToSID, getReplyToSN,
getReplyToUID, getRetweetCount, getRetweeted, getRetweets, getScreenName, getStatusSource,
getText, getTruncated, getUrls, initialize, setCreated, setFavoriteCount, setFavorited,
setId, setIsRetweet, setLatitude, setLongitude, setReplyToSID, setReplyToSN, setReplyToUID,
setRetweetCount, setRetweeted, setScreenName, setStatusSource, setText, setTruncated,
setUrls, toDataFrame, toDataFrame#twitterObj
[...]
[1] "TWEET ID: 483133728210694144"
[1] "TEXT: @jacklyn_ballard @_woahitsemily Nuestra lucha es por nuestra PATRIA SOBERANA,
por eso le hemos dicho #GoHomeGringosAsesinos #Obama"
[1] "USER NAME: rinconero43"
[1] "IS RETWEET?: FALSE"
[1] "RETWEETED: FALSE"
[1] "USER ID: 139597099"
[1] "USER NAME: @Rinconero #TROPA"
[1] "SCREEN NAME: rinconero43"
[1] "LOCATION: Algun lugar de Venezuela"
[1] "TWEETS NUMBER: 27858"
[1] "FOLLOWERS: 7791"
[1] "DESCRIPTION: Del libro rojo,el poder debe radicar en el PUEBLO,con participacion y
protagonismo... Sin miedo a denunciar a funcionarios INEPTOS y CORRUPTOS."
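Once more than one tweet is inspected, it is usually easier to work with a table than with individual objects. The sketch below shows one way to do that in base R, using stand-in records instead of real status objects: the field names mirror the accessors used in the example, and all values are invented for illustration. IDs are kept as character strings because a tweet ID such as the one printed above is larger than 2^53 and cannot be stored exactly as a numeric value.

```r
# Stand-in records; in a real session these values would come from
# tweet$getId(), tweet$getScreenName(), tweet$getText() and tweet$isRetweet.
records <- list(
  list(id = "1001", screenName = "user_a",
       text = "first #Obama tweet", isRetweet = FALSE),
  list(id = "1002", screenName = "user_b",
       text = "RT @user_a: first #Obama tweet", isRetweet = TRUE),
  list(id = "1003", screenName = "user_a",
       text = "second #Obama tweet", isRetweet = FALSE)
)

# One row per record, then stack the rows into a single data frame
rows <- lapply(records, function(r) data.frame(r, stringsAsFactors = FALSE))
tweets.df <- do.call(rbind, rows)

# Simple summaries: tweets per user and number of retweets
tweets_per_user <- table(tweets.df$screenName)
n_retweets <- sum(tweets.df$isRetweet)
show(tweets_per_user)
show(paste("RETWEETS:", n_retweets))
```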
Let’s do it!
Example 2: REST API (source code)
library(twitteR);
source('load-credentials.R');
# show favorites
show("FAVORITE TWEETS FOR USER wpmayor")
tweets <- favorites(user='wpmayor', n=3);
show(tweets);
# find trending topics by location
show("TRENDING TOPICS ASSOCIATED TO A LOCATION");
atl <- availableTrendLocations();
tweets <- getTrends(753692); # Barcelona
show(tweets);
# timelines
show("TIMELINE OF USER BarackObama")
tweets <- userTimeline(user='BarackObama', n=3);
show(tweets);
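The code above stores the result of availableTrendLocations() in atl but then passes the identifier 753692 by hand. A minimal sketch of looking the identifier up by name follows. It assumes atl is a data frame with the columns name, country and woeid, and uses a small stand-in table so that it runs without a connection. The helper find_woeid is hypothetical, not a twitteR function.

```r
# Stand-in for the data frame returned by availableTrendLocations();
# the Barcelona identifier is the one used in the example above.
atl <- data.frame(
  name    = c("Barcelona", "London"),
  country = c("Spain", "United Kingdom"),
  woeid   = c(753692, 44418),
  stringsAsFactors = FALSE
)

# Hypothetical helper: return the WOEID of the first location with this name
find_woeid <- function(locations, place) {
  hit <- locations[locations$name == place, ]
  if (nrow(hit) == 0) {
    stop("No trend location named '", place, "'")
  }
  hit$woeid[1]
}

barcelona <- find_woeid(atl, "Barcelona")
show(paste("WOEID FOR BARCELONA:", barcelona))
```

With a live connection, the value returned by find_woeid() could be passed to getTrends() instead of a hard-coded number.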
Let’s do it!
Example 2: REST API (output)
source("TwitteR-example-2.R")
[1] "FAVORITE TWEETS FOR USER wpmayor"
[[1]]
[1] "eddwp: Oh hey, checkout what just showed up: http://t.co/V8mFbeCWJN"
[[2]]
[1] "LisaSabinWilson: Love #WordPress? Good with front-end and theme dev? Want to work from home?
We’re hiring at @webdevstudios - http://t.co/O24viiLgid"
[[3]]
[1] "jchristopher: Wow @irontoiron is turning two _tomorrow_! Trying to grasp"
[1] "TRENDING TOPICS ASSOCIATED TO A LOCATION"
name url query woeid
1 Pinilla http://twitter.com/search?q=Pinilla Pinilla 753692
2 Vamos Colombia http://twitter.com/search?q=%22Vamos+Colombia%22 %22Vamos+Colombia%22 753692
3 Scolari http://twitter.com/search?q=Scolari Scolari 753692
4 Hulk http://twitter.com/search?q=Hulk Hulk 753692
5 Julio Cesar http://twitter.com/search?q=%22Julio+Cesar%22 %22Julio+Cesar%22 753692
[...]
[1] "TIMELINE OF USER BarackObama"
[[1]]
[1] "BarackObama: Support @OFA today and keep fighting for change:
http://t.co/GRyYEpJzdO http://t.co/7xw6aQIusx"
[[2]]
[1] "BarackObama: Skills. cc #USMNT http://t.co/hzUHPsz3t9"
Let’s do it!
Example 2: REST API (activity)
Try it yourself! Play around with...
tweets <- searchTwitter("#Obama", n=10, lang="en", since="2014-06-01", until="2014-07-03");
user <- getUser(tweet$getScreenName());
tweets <- favorites(user='wpmayor', n=3);
atl <- availableTrendLocations();
tweets <- getTrends(753692); # Barcelona
tweets <- userTimeline(user='BarackObama', n=3);
Let’s do it!
Example 1: STREAMING API
# load packages
library(streamR);
# load credentials
source('load-credentials.R');
# connect to the Twitter stream and get messages
filterStream("tweets.json", track = c("Obama", "Putin"), timeout = 60, oauth = cred);
# parse tweets
tweets.df <- parseTweets("tweets.json", simplify = TRUE);
# compute some measures
show(paste("Number of tweets with #Obama:", length(grep("Obama", tweets.df$text,
ignore.case = TRUE))));
show(paste("Number of tweets with #Putin:", length(grep("Putin", tweets.df$text,
ignore.case = TRUE))));
source("StreamR-example-1.R")
Capturing tweets...
Connection to Twitter stream was closed after 61 seconds with up to 101 tweets downloaded.
51 tweets have been parsed.
[1] "Number of tweets with #Obama: 34"
[1] "Number of tweets with #Putin: 5"
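In the run above the two counts (34 and 5) do not add up to the 51 parsed tweets, which suggests that some tweets matched neither keyword in their text field, and a single tweet can also match both. The sketch below shows how to separate these cases with grepl(), using a small hand-made data frame in place of the parsed stream. The texts are invented for illustration.

```r
# Stand-in for the data frame returned by parseTweets()
tweets.df <- data.frame(
  text = c("Obama meets Putin", "obama speech today",
           "PUTIN responds", "World Cup tonight"),
  stringsAsFactors = FALSE
)

# Logical vectors: TRUE where the keyword appears (case-insensitive)
has_obama <- grepl("Obama", tweets.df$text, ignore.case = TRUE)
has_putin <- grepl("Putin", tweets.df$text, ignore.case = TRUE)

# Count each case separately so overlaps are visible
counts <- c(
  obama   = sum(has_obama),
  putin   = sum(has_putin),
  both    = sum(has_obama & has_putin),
  neither = sum(!has_obama & !has_putin)
)
show(counts)
```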
Let’s do it!
Example 2: STREAMING API
# load packages
library(streamR);
library(ggmap);
# load credentials
source('load-credentials.R');
# get tweets from specified location
filterStream("tweetsSpain.json", locations = c(-9, 35, 4, 44), timeout = 60, oauth = cred);
# parse tweets
tweets.df <- parseTweets("tweetsSpain.json", verbose = FALSE);
# get points set (lon/lat)
points <- data.frame(x = as.numeric(tweets.df$lon), y = as.numeric(tweets.df$lat));
# get map
spain <- get_map('Spain', zoom=6);
spainMap <- ggmap(spain, extent='device', legend='topleft');
# save map to file
png(file='map.png', width=640, height=640, units='px', pointsize=12);
# map + points of lon/lat from tweets
print(spainMap + geom_point(aes(x = x, y = y), data = points, colour = 'red', size = 1));
dev.off();
source("StreamR-example-2.R")
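The locations argument above is a bounding box given as south-west longitude, south-west latitude, north-east longitude, north-east latitude. Before plotting, it can help to drop tweets without coordinates and to check which points really fall inside the box. The following is a minimal sketch of that step, assuming missing coordinates appear as NA. The data frame is a stand-in with invented coordinates, not real stream output.

```r
# Bounding box used in the example: lon/lat of SW corner, then NE corner
bbox <- c(-9, 35, 4, 44)

# Stand-in for parseTweets() output; coordinates are illustrative
tweets.df <- data.frame(
  lon = c("-3.70", "2.17", NA, "-0.13"),
  lat = c("40.42", "41.39", NA, "51.51"),
  stringsAsFactors = FALSE
)

# Convert to numeric and drop rows with missing coordinates
points <- data.frame(x = as.numeric(tweets.df$lon),
                     y = as.numeric(tweets.df$lat))
points <- points[complete.cases(points), ]

# Keep only the points that lie inside the bounding box
inside <- points$x >= bbox[1] & points$x <= bbox[3] &
          points$y >= bbox[2] & points$y <= bbox[4]
points_in_box <- points[inside, ]
show(points_in_box)
```

The resulting points_in_box data frame has the same x and y columns as points, so it could be passed to geom_point() in its place.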
Let’s do it!
Example 2: STREAMING API (output)
Let’s do it!
Example 2: STREAMING API (activity)
Try it yourself! Play around with...
filterStream("tweets.json", track = c("Obama", "Putin"), timeout = 60, oauth = cred);
filterStream("tweetsSpain.json", locations = c(-9, 35, 4, 44), timeout = 60, oauth = cred);
Let’s do it!
The End
Thanks for attending!
[email protected]