#### Tuesday, March 1st, 2011...10:29 am

## Celebrity Networking

An active area of my recent research has been applying ranking methods to social networks. One popular network with publicly available information is Twitter. For the non-tweeter, here is a Twitter tutorial in a few sentences. Twitter is a social networking service. Users tweet by sending a text-based post of up to 140 characters. Users of Twitter subscribe to other authors’ tweets by following them, which directs future tweets from that author onto the user’s Twitter webpage or a compatible external application (such as one on a smartphone). Wikipedia reports that Twitter has more than 100 million users worldwide.

The Twitter API allows requests for information about users of Twitter. In this blog entry, we’ll grab information about a group of users and determine who is following whom. In particular, we’ll consider six Twitter users whom you will recognize given their celebrity: billgates, jimmyfallon, kimkardashian, paulaabdul, ryanseacrest, and theellenshow.

The names are listed in terms of the celebrity’s screen name, which can direct you to the associated Twitter webpage. For example, if you want to view Bill Gates’ page, visit http://www.twitter.com/billgates. Note that you don’t need a Twitter account to view this webpage.

Now, we will use the Twitter API to determine which of these celebrities are following each other. We’ll represent this as a directed graph (digraph): each celebrity user is a vertex, and an edge goes from celebrity A to celebrity B if A follows B. The Matlab code we will use creates an adjacency matrix for such a graph. In the matrix *T*, the *ij* element equals 1 if there is an edge from vertex *i* to vertex *j* and 0 otherwise.
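To make the definition concrete, here is a minimal sketch on a made-up three-user network. The names alice, bob, and carol and the `follows` list are hypothetical and have nothing to do with the Twitter data in this post.

```matlab
% Hypothetical three-user network (made-up names, for illustration only).
names = {'alice','bob','carol'};

% follows{i} lists the indices of the users that user i follows:
% alice follows bob and carol, bob follows carol, carol follows no one.
follows = {[2 3], 3, []};

n = numel(names);
A = zeros(n,n);
for i = 1:n
    A(i, follows{i}) = 1;   % edge from i to every user that i follows
end

% A is now [0 1 1; 0 0 1; 0 0 0].
outDegree = sum(A,2);   % how many users each user follows
inDegree  = sum(A,1);   % how many followers each user has within the network
```

Row sums count whom a user follows and column sums count that user’s followers, so the direction convention matters when reading the matrix.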

How do we create such an adjacency matrix? I wrote the following Matlab code to do just that.

    %% twitterMatrix creates the adjacency matrix of a network of users
    % on Twitter.
    % Author: Tim Chartier, Davidson College
    % Date: February 28, 2011

    %% Input the screen names of the users you wish to form the adjacency matrix of.
    screenName = {'billgates','jimmyfallon','kimkardashian',...
                  'paulaabdul','ryanseacrest','theellenshow'};

    %% Use the Twitter API to grab the user IDs for the inputted screen names
    ids = zeros(length(screenName),1);
    numberUsers = length(screenName);
    for i=1:numberUsers
        url2visit = strcat('http://api.twitter.com/1/users/show.json?screen_name=',char(screenName(i)));
        urlString = urlread(url2visit);
        % The user's own ID is the last "id": field in the response
        indicesInString = strfind(urlString,'"id":');
        idString = strtok(urlString(indicesInString(end)+5:end),',');
        ids(i) = str2double(idString);
    end

    %% Construct Twitter adjacency matrix, T
    T = zeros(numberUsers,numberUsers);
    for i=1:numberUsers
        friendsString = urlread(strcat('http://api.twitter.com/1/friends/ids.json?user_id=',num2str(ids(i))));
        % The response is a bracketed list of IDs, so it evaluates to a vector F
        eval(strcat('F = ', friendsString,';'));
        for j=1:numberUsers
            if any(F == ids(j))
                T(i,j) = 1; % user i follows user j
            else
                T(i,j) = 0; % user i does not follow user j
            end
        end
    end

    fprintf('Adjacency matrix\n');
    T

Using this code, I found (remember this is pulling real-time data from Twitter) the following adjacency matrix:

where the first through sixth rows of the matrix correspond to the users billgates, jimmyfallon, kimkardashian, paulaabdul, ryanseacrest, and theellenshow, respectively. Therefore, we see from row 1 that billgates follows only ryanseacrest. Transforming this matrix, we have the following graph:

Note that the Matlab code given above allows you to input screen names of your choice. It is not error-free: some users follow so many accounts that the code hits a runtime error. For instance, BarackObama follows over 700,000 users, which causes an error, while theEllenShow follows over 43,000, which does not. The code was created within the last week for my students. Several friends have expressed interest, so I’m blogging simply as a way of sharing the resource.
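One fragile spot is the `eval` call, which runs whatever text the server returns as Matlab code. A minimal sketch of an alternative is to pull the numbers out with `regexp` instead. The sample string and the IDs below are made up, and the sketch assumes the response body is a bracketed, comma-separated list of numeric IDs, which is what the `eval` line relies on. Whether this avoids the error for very large follow lists is untested.

```matlab
% Made-up sample standing in for the text returned by the friends request;
% a real response is assumed to be a bracketed list of numeric IDs.
friendsString = '[50393960,15485441,25365536]';

% Extract every run of digits and convert to numbers, without eval.
tokens = regexp(friendsString, '\d+', 'match');
F = str2double(tokens);

% ismember does the "is this ID in the list?" test for all users at once.
ids = [15485441; 99999999];     % hypothetical user IDs to look up
isFollowed = ismember(ids, F);  % logical column vector: [true; false]
```

With this approach, row *i* of the adjacency matrix could be filled in one step as `T(i,:) = ismember(ids, F)'`.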

What can you do now? You can ask many of the same questions that we ask about webpages or graphs. What is the shortest path from one user to another? This would give the minimum number of retweets within this small network of users needed for information to travel between those two users. How about finding the PageRank of the users? Each screen name represents a Twitter webpage, which contains links to the Twitter webpages of the users it follows. Here is additional Matlab code for this:

    %% Find PageRank of network by finding dominant evec
    n = length(T);
    G = T';   % column i of G marks the users that user i follows
    p = 0.85;
    M = (1-p)/n*ones(n,n);
    for i = 1:n
        colSum = sum(G(:,i));
        if (colSum ~= 0)
            M(:,i) = p*(1/colSum*G(:,i))+(1-p)/n*ones(n,1);
        else
            % user i follows no one in the network: spread weight uniformly
            M(:,i) = 1/n*ones(n,1);
        end
    end

    [V,D] = eig(M);
    diagD = diag(D);
    [y,i] = max(diagD);
    pageRank = V(:,i);
    pageRank = pageRank/sum(pageRank);

    %% Give results in bar chart
    bar(pageRank)
    title('PageRank');
    axis tight

    [sortedRanking, indices] = sort(pageRank,'descend');
    fprintf('\n****************************\n')
    fprintf(' PageRank results\n')
    fprintf('****************************\n')
    fprintf('%15s PageRank\n','Screen Name')
    fprintf('----------------------------\n')
    for i=1:length(sortedRanking)
        fprintf('%15s %6.4f \n',char(screenName(indices(i))),sortedRanking(i));
    end
    fprintf('\n\n')
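Written as a formula, the loop above fills in the matrix *M* column by column. The symbols $c_j$ and $\mathbf{r}$ are introduced here only for notation; they stand for `colSum` and `pageRank` in the code.

```latex
\[
M_{ij} =
\begin{cases}
p\,\dfrac{G_{ij}}{c_j} + \dfrac{1-p}{n}, & c_j \neq 0,\\[8pt]
\dfrac{1}{n}, & c_j = 0,
\end{cases}
\qquad
c_j = \sum_{k=1}^{n} G_{kj},
\qquad
G = T^{\mathsf{T}}, \quad p = 0.85 .
\]
\[
M\,\mathbf{r} = \mathbf{r}, \qquad \sum_{i=1}^{n} r_i = 1 .
\]
```

Every column of *M* sums to 1, so its largest eigenvalue is 1, and the eigenvector picked out by `max(diagD)`, once rescaled to sum to 1, is the PageRank vector.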

For this network, this produces the results:

| Celebrity | PageRank |
|---|---|
| ryanseacrest | 0.3544 |
| jimmyfallon | 0.1526 |
| kimkardashian | 0.1503 |
| theellenshow | 0.1285 |
| paulaabdul | 0.1071 |
| billgates | 0.1071 |
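The `eig` call computes every eigenvector even though only the dominant one is needed. A common alternative, swapped in here as a sketch and not the method used above, is power iteration: repeatedly multiply a starting vector by *M* until it stops changing. The 3-by-3 matrix `T` below is hypothetical, not the celebrity network, and the tolerance and iteration cap are arbitrary choices.

```matlab
% Hypothetical adjacency matrix: T(i,j) = 1 if user i follows user j.
T = [0 1 1;
     0 0 1;
     0 0 0];
n = size(T,1);
p = 0.85;

% Build the same column-stochastic matrix M as in the eig-based code.
G = T';
M = zeros(n,n);
for j = 1:n
    c = sum(G(:,j));
    if c ~= 0
        M(:,j) = p*G(:,j)/c + (1-p)/n;
    else
        M(:,j) = ones(n,1)/n;     % user j follows no one
    end
end

% Power iteration: start from the uniform vector and apply M repeatedly.
r = ones(n,1)/n;
for k = 1:1000                    % arbitrary cap on the number of iterations
    rNew = M*r;
    converged = norm(rNew - r, 1) < 1e-12;   % arbitrary tolerance
    r = rNew;
    if converged
        break
    end
end

[~, topUser] = max(r);            % index of the highest-ranked user
```

Because each column of *M* sums to 1, the entries of `r` keep summing to 1 at every step, so no renormalization is needed. In this toy network user 3, who is followed by both other users, ends up with the highest rank.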

Interesting? Another question is whether such a model appropriately models a social network. It has been very successful with webpages, presumably reflecting its relative accuracy in modeling the internet. Is Twitter different? What’s your opinion? If you teach, what do your students think? In fact, you could present such a graph to students who have studied graphs and ask how their knowledge of graphs could apply and yield information related to this social network.
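As one example of the graph questions raised earlier, shortest paths can be found with a breadth-first search directly on the adjacency matrix. This is a minimal sketch on a hypothetical four-user matrix `T`, not the celebrity network. It walks edges in the "follows" direction; since tweets travel from an author to that author’s followers, running the same search on `T'` would trace how information spreads.

```matlab
% Hypothetical adjacency matrix: T(i,j) = 1 if user i follows user j.
T = [0 1 1 0;
     0 0 1 0;
     0 0 0 1;
     0 0 0 0];
source = 1;
n = size(T,1);

dist = inf(1,n);        % dist(j) = fewest edges on a path from source to j
dist(source) = 0;
queue = source;         % users waiting to be expanded, in order of discovery
while ~isempty(queue)
    u = queue(1);
    queue(1) = [];
    neighbors = find(T(u,:));     % users that u follows
    for v = neighbors
        if isinf(dist(v))         % first time v is reached
            dist(v) = dist(u) + 1;
            queue(end+1) = v;
        end
    end
end

% dist is now [0 1 1 2]; an entry left at Inf would mean "not reachable".
```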

Enjoy the code!