Information Transmission in a Social Network: Dissecting the Spread of a Quora Post

tl;dr See this movie visualization for a case study on how a post propagates through Quora.

How does information spread through a network? Much of Quora’s appeal, after all, lies in its social graph – and when you’ve got a network of users, all broadcasting their activities to their neighbors, information can cascade in multiple ways. How do these social designs affect which users see what?

Think, for example, of what happens when your kid learns a new slang word at school. He doesn’t confine his use of the word to McKinley Elementary’s particular boundaries, between the times of 9-3pm – he introduces it to his friends from other schools at soccer practice as well. A couple months later, he even says it at home for the first time; you like the word so much, you then start using it at work. Eventually, Justin Bieber uses the word in a song, at which point the word’s popularity really starts to explode.

So how does information propagate through a social network? What types of people does an answer on Quora reach, and how does it reach them? (Do users discover new answers individually, or are hubs of connectors more key?) How does the activity of a post on Quora rise and fall? (Submissions on other sites have limited lifetimes, fading into obscurity soon after an initial spike; how does that change when users are connected and every upvote can revive a post for someone else’s eyes?)

(I looked at Quora since I had some data from there already available, but I hope the lessons should be fairly applicable in general, to other social networks like Facebook, Twitter, and LinkedIn as well.)

To give an initial answer to some of these questions, I dug into one of my more popular posts, on a layman’s introduction to random forests.

Before looking deeper into the voting dynamics of the post, let’s first get some background on what kinds of users the answer reached.

Here’s a graph of the topics that question upvoters follow. (Each node is a topic, and every time upvoter X follows both topics A and B, I add an edge between A and B.)

Upvoters' Topics - Labeled

We can see from the graph that upvoters tend to be interested in three kinds of topics:

  • Machine learning and other technical matters (the green cluster): Classification, Data Mining, Big Data, Information Retrieval, Analytics, Probability, Support Vector Machines, R, Data Science, …
  • Startups/Silicon Valley (the red cluster): Facebook, Lean Startups, Investing, Seed Funding, Angel Investing, Technology Trends, Product Managment, Silicon Valley Mergers and Acquisitions, Asana, Social Games, Quora, Mark Zuckerberg, User Experience, Founders and Entrepreneurs, …
  • General Intellectual Topics (the purple cluster): TED, Science, Book Recommendations, Philosophy, Politics, Self-Improvement, Travel, Life Hacks, …

Also, here’s the network of the upvoters themselves (there’s an edge between users A and B if A follows B):

Upvote Network - Unlabeled

Upvote Network - Labeled

We can see three main clusters of users:

  • A large group in green centered around a lot of power users and Quora employees.
  • A machine learning group of folks in orange centered around people like Oliver Grisel, Christian Langreiter, and Joseph Turian.
  • A group of people following me, in purple.
  • Plus some smaller clusters in blue and yellow. (There were also a bunch of isolated users, connected to no one, that I filtered out of the picture.)

Digging into how these topic and user graphs are related:

  • The orange cluster of users is more heavily into machine learning: 79% of users in that cluster follow more green topics (machine learning and technical topics) than red and purple topics (startups and general intellectual matters).
  • The green cluster of users is reversed: 77% of users follow more of the red and purple clusters of topics (on startups and general intellectual matters) than machine learning and technical topics.

More interestingly, though, we can ask: how do the connections between upvoters relate to the way the post spread?

So let’s take a look. Here’s a visualization I made of upvotes on my answer across time (click here for a larger view).