Swarm Creativity Blog

Nov 27

Will the EURO break up?

With the crisis in the Eurozone approaching its climax, I was curious to read the collective mind. On the Web, in the blogosphere, and on Twitter there is a lot of buzz about Eurozone breakup or survival.

I decided to ask the swarm (through blogs), the experts (through news Web sites), and the crowd (through Twitter), using our Condor coolhunting tool.
It turns out that swarm and experts think the Euro will survive intact, albeit by quite a slim margin.

The picture above shows the Blog/Web site network, with the two search terms weighted by the importance (betweenness centrality) of the bloggers and Web sites. The bloggers/Web sites vote 52% for Eurozone survival, and 48% for Eurozone breakup.
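To make the weighting idea concrete, here is a minimal sketch of a betweenness-weighted vote. It is not Condor's implementation, which is not described here: the function names, the toy graph, and the assumption that each node takes exactly one side are illustrative only. Betweenness is computed with Brandes' algorithm on an unweighted, undirected graph.

```python
from collections import deque

def betweenness(graph):
    """Brandes' algorithm; graph is {node: set_of_neighbours}, undirected."""
    score = {v: 0.0 for v in graph}
    for s in graph:
        stack = []
        preds = {v: [] for v in graph}
        sigma = {v: 0 for v in graph}   # number of shortest paths from s
        dist = {v: -1 for v in graph}
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:                    # breadth-first search from s
            v = queue.popleft()
            stack.append(v)
            for w in graph[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = {v: 0.0 for v in graph}
        while stack:                    # accumulate dependencies backwards
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                score[w] += delta[w]
    # every undirected pair was counted from both endpoints
    return {v: c / 2 for v, c in score.items()}

def weighted_vote(graph, stance):
    """stance maps node -> side; returns each side's share of total centrality."""
    weights = betweenness(graph)
    totals = {}
    for node, side in stance.items():
        totals[side] = totals.get(side, 0.0) + weights[node]
    grand = sum(totals.values())
    return {side: (t / grand if grand else 0.0) for side, t in totals.items()}

# Toy chain a-b-c-d-e (made-up nodes, not the data behind the picture).
toy = {"a": {"b"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"c", "e"}, "e": {"d"}}
sides = {"a": "breakup", "b": "survive", "c": "breakup", "d": "survive", "e": "survive"}
print(weighted_vote(toy, sides))  # survive 0.6, breakup 0.4
```

In this toy chain the end nodes have zero betweenness, so their votes do not count at all; only the nodes that sit on paths between others move the result.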

The crowd, measured through the tweeters, believes the opposite. The picture below shows today's snapshot (11/27/2011) of the retweets about “euro survive” and “euro breakup”.

The crowd on Twitter votes only 33% for Eurozone survival, with a decisive 67% of the vote for Eurozone breakup.

The question now is: whom to trust? The crowd is fickle, and the wisdom of crowds easily flips to madness, while the swarm usually has a much better grasp of what the future might bring. So perhaps it’s not as bleak for the EURO as everybody thinks?

What do the Wikipedians think about the Euro?
As an additional expert opinion, I also checked, using our new Wikimaps tool, what the Wikipedians think about the EURO, exploiting the hidden link structure in Wikipedia. I ranked the links by two different algorithms: (1) by the number of links and backlinks, and (2) by actuality, i.e. freshness of the edits.
As the two pictures below show, the link-network looks very different for the two rating-algorithms:

Just looking at the Wikipedia linking structure (top picture) puts the different coins and currencies making up the Euro closest. While the economy of Europe is important for both networks, in the actuality picture (bottom network) the economy of Greece and Portugal, Frankfurt, the European Central Bank, the International Monetary Fund, and Angela Merkel suddenly become key players.

Nov 24

“What is not good for the swarm is not good for the bee.” — Marcus Aurelius

Nov 20

Occupy Wallstreet battling TeaParty – Divided they tweet!

Today (11/20/11) I ran a Condor twitter analysis for #ows (the Occupy Wallstreet Twitter tag) and #teaparty (the Tea Party Twitter tag), trying to predict public sentiment for these two social movements.
I only collected retweets, and constructed the retweet network, measuring the importance of people retweeting based on their social network position. The picture below shows the resulting network: each dot is a twitterer, each line is one or more retweets. Surprisingly we get three clear clusters: an Occupy Wallstreet cluster (blue, at the bottom), a Tea Party cluster (yellow, in the center) and a mixed cluster at the top. Red dots are people tweeting both about #ows and #teaparty.

A closer look at these three clusters tells us that the blue cluster is Occupy Wallstreet sympathizers talking about issues near and dear to them, the yellow cluster is Tea-Party sympathizers doing the same about their cause, while the mixed cluster at the top consists of Occupy Wallstreet sympathizers badmouthing the Tea Party, and Tea Party sympathizers lambasting Occupy Wallstreet and Barack Obama.

Aggregating the network, and weighting the tweets of each twitterer by her/his social network position, leads to 55% of weighted votes for Occupy Wallstreet, and 45% for the Tea Party. The results are clear: Occupy Wallstreet sympathizers carry more weight in the Twittersphere than Tea Party members – the question of course remains how representative this is of the rest of the American population.

I then also checked positivity and negativity of tweets. Again I was in for a surprise. Usually human beings are optimists, and positivity is much larger than negativity. But not so here: for both Tea Party and Occupy Wallstreet tweets, negativity was about twice as large as positivity. In an additional twist, the (mostly negative) tweets about the Tea Party were more positive than the tweets about Occupy Wallstreet (see picture below).

The first conclusion of this chart is the general unhappiness with the current political situation. While both Tea Party and Occupy Wallstreet sympathizers are very unhappy, Tea Party twitterers are slightly happier, although they seem to carry less political weight.
Finally, I looked at what the key issues of the Occupy Wallstreet discussion were today, collecting the most recent blog posts with Condor (see semantic network picture below).

While the Tea Party members rejoice about the booing of Michelle Obama and Joe Biden at the Nascar race in Florida, the Occupy Wallstreet sympathizers lambaste Mayor Bloomberg for his lifestyle and the closing of Zuccotti Park. Religion is quite central - as expected - for the Tea Party sympathizers, while a large part of the discussion is focussed on the presidential candidates.

Nov 15

What creative swarms can learn from the bees

Last Friday night I had a great discussion with Billie Bivins, host of the show “Make Art…Feel Better” at the Belmont Media Center about creative swarming and the bees. She even got me to cobble together my own bee. Here is the link to the resulting video. Very cool.

Jul 19

Wikimaps Revised

The first version of the Wikimaps Page (http://bit.ly/Wiki-Map-Project) that we published a couple of weeks ago helped to visualize the basic idea of Wikimaps. It consists of an interactive animation that allows visitors to visually track the changes in Wikipedia articles over a given time period. Real world activities and events are reflected in updates of the respective articles and the links between them.

Rise and Fall (of Swiss Tennis Star Hingis) on Wikimaps
A good example is the retirement of the Swiss tennis player Martina Hingis. While her page (node) is still well connected to the network in 2008 (roughly a year after she retired), the page is not listed in the network anymore after February 2010. It is important to note that this view of the network is filtered and only displays nodes that “survive the cut”: The page of the former number one ranked player is still there and has many links pointing to it, just not enough to appear in this filtered “most important pages” view.

Another example is the case of the former managing director of the International Monetary Fund, Dominique Strauss-Kahn. We tracked changes in related pages for a time span of approximately 8 months and built a network with weekly snapshots. On May 14th 2011, Strauss-Kahn was arrested in New York City; this event led to a spike in activity in the network surrounding his page. Interestingly enough, the increased activity that led to this spike was not based solely on pages directly related to the arrest. The attention led to a general increase in activity on related pages.

Watch a video of the changes in the Dominique Strauss-Kahn graph:


The following graph shows the spike in activities in the graph around the 14th of May.

(Activity in a network is defined as the sum of additions and deletions of nodes within a given time frame)
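The definition above translates almost directly into set arithmetic. The following is a small sketch under the assumption that each snapshot is available as a set of node (page) names; the labels and node names are placeholders, not data from the Strauss-Kahn network.

```python
def activity(snapshots):
    """snapshots: chronological list of (label, set_of_nodes).

    Returns a list of (label, additions + deletions), where each value
    compares a snapshot with the one immediately before it.
    """
    result = []
    for (_, previous), (label, current) in zip(snapshots, snapshots[1:]):
        added = current - previous      # nodes that newly appeared
        removed = previous - current    # nodes that dropped out
        result.append((label, len(added) + len(removed)))
    return result

# Placeholder weekly snapshots with made-up node names.
weeks = [
    ("week 1", {"A", "B", "C"}),
    ("week 2", {"A", "B", "C"}),
    ("week 3", {"A", "C", "D", "E", "F"}),
]
print(activity(weeks))  # [('week 2', 0), ('week 3', 4)]
```

A spike like the one around the 14th of May would show up as a week whose value is far above its neighbours.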

Although we think this first visualization is already pretty cool, the results did not really surprise anyone. The data that was initially used was very static. We simply picked (seemingly related) categories and selected the pages that had the highest indegree values. Pages that would be “close” or relevant but not members of the selected categories would never show up in the graph.

To mitigate these shortcomings we decided to change how we collect the pages that are considered candidates for the graph. The most promising idea was, and still is, a combination of weighted components, possibly applied in multiple iterations. Or, as we call it, a Filtered Breadth First Search.

Effective Filtering is Key
One of the challenges of working with the Wikipedia graph is its size. An optimal algorithm should therefore balance the trade-off between keeping the sub-graph small and still returning meaningful results. A naively executed BFS would quickly lead to an explosion of articles that would have to be considered. To prevent this we only follow edges (links) that are considered interesting or relevant. The decision whether to follow a link during the execution of the search is based on a weighted mix of the following metrics:
● Local Indegree
● Global Indegree
● Number of recent page edits
● Reciprocal Links to source page
● Shortest Path Distance to Source Page
● Wikipedia Full-text search results
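The following is a minimal sketch of what such a filtered breadth-first search could look like. It is an illustration under assumptions, not the Wikimaps code: the function names, the `threshold` and `max_depth` parameters, the metric keys and the weights are all hypothetical, and the metrics are assumed to be normalized to comparable ranges before they are mixed.

```python
from collections import deque

def weighted_score(metrics, weights):
    """Linear mix of metric values; metrics missing from the dict count as 0."""
    return sum(weights[name] * metrics.get(name, 0.0) for name in weights)

def filtered_bfs(source, neighbours, score, threshold, max_depth=2):
    """Breadth-first search that only follows links scoring >= threshold.

    neighbours(page) -> iterable of linked pages
    score(src, dst, depth) -> float, e.g. built on weighted_score()
    Returns {page: depth at which it was reached}.
    """
    visited = {source: 0}
    queue = deque([source])
    while queue:
        page = queue.popleft()
        depth = visited[page]
        if depth >= max_depth:
            continue                      # do not expand beyond the depth limit
        for target in neighbours(page):
            if target in visited:
                continue
            if score(page, target, depth + 1) < threshold:
                continue                  # link judged not interesting enough
            visited[target] = depth + 1
            queue.append(target)
    return visited

# Toy link graph and made-up relevance values.
links = {"S": ["A", "Hub"], "A": ["B"], "Hub": ["X", "Y"]}
relevance = {"A": 0.9, "Hub": 0.1, "B": 0.8, "X": 0.9}
found = filtered_bfs(
    "S",
    lambda page: links.get(page, []),
    lambda src, dst, depth: relevance.get(dst, 0.0),
    threshold=0.5,
)
print(found)  # {'S': 0, 'A': 1, 'B': 2}
```

Because the low-scoring "Hub" link is never followed, the pages behind it are never even considered, which is what keeps the sub-graph small.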

Naive Degree-Based Filtering leads to “boring” results
It would be a lot easier to simply include pages based on a single metric, namely the one that is the least expensive and seemingly very meaningful: the (local) indegree, the number of pages that link to a certain page. The problem is that this metric strongly favors so-called hub pages: pages that are linked to a lot although they are not directly related semantically. Typical examples are pages for certain dates or countries. There were ideas to filter these pages using blacklists or to work with an indegree band (as opposed to a lower limit).

These two ideas, however, turned out to be very tedious and error-prone. We further believe that the most relevant results can only be found by a cleverly tuned combination of many factors.

Outlook
There is another network on Wikipedia besides the one that is based on articles and links. It’s the network of the Wikipedia authors and their collaborations. We anticipate that incorporating this information will further improve the relevance of the nodes in a Wikimap network. Read this previous blog post for an explanation of the basic idea.

posted by Reto Kleeb

Jun 30

Wikimaps: Dynamic Maps of Knowledge

Wikipedia does not only provide the digital world with a vast amount of high quality information, it also opens up new opportunities to investigate the processes that lie behind the creation of the content as well as the relations between knowledge domains.

In their daily work Wikipedia editors make sure to keep articles updated: Natural disasters, shiny new pop icons and scandals are reflected in new articles or in links between them. But how do these pages and their links evolve over time? Can we visually track how ties between subject-areas grow stronger, is there a way to notice that an article becomes more influential?

Our first attempt to come up with an answer to these questions was the development of a visualization that renders pages as nodes of a graph. If there is a link between two pages, the corresponding links are represented as an edge. Each graph represents a snapshot of the articles at a specific date, the slider and the video controls on the left allow you to navigate back and forth in time.

http://bit.ly/Wiki-Map-Project
Try it out: Scroll to zoom in and out, use the video controls to start and pause the animation, or drag the slider to any point in time.

Selection of the Nodes
There are currently 3.6 million articles in the English Wikipedia, and displaying nodes for all of them at the same time barely makes sense. For our first prototype we decided to display a subset of the 50 most important nodes out of a given data-set.

How do we define importance? We decided to select the top nodes by using their indegree value - the number of links that point to an article, a trivial way to measure basic influence and relevance. The data-sets that are used are based on related categories on Wikipedia. For example, to look at modern musical groups we look at all the members of the categories “Musical groups established in 1990”, “Musical groups established in 1991” and so forth.
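The selection step can be sketched in a few lines. This assumes the links are available as (source, target) pairs and that the category members have already been collected into a set; the function name and the toy data are illustrative, and whether the prototype counted links from outside the data-set is not stated here, so this version simply counts every link it is given.

```python
from collections import Counter

def top_by_indegree(links, candidates, k=50):
    """Rank candidate pages by the number of links pointing to them.

    links: iterable of (source, target) pairs
    candidates: set of page titles in the data-set
    Returns the k best (title, indegree) pairs; ties are broken by title.
    """
    indegree = Counter(
        target
        for source, target in links
        if target in candidates and source != target  # ignore self-links
    )
    ranked = sorted(candidates, key=lambda page: (-indegree[page], page))
    return [(page, indegree[page]) for page in ranked[:k]]

# Toy data with placeholder page names.
toy_links = [("A", "B"), ("C", "B"), ("A", "C"), ("B", "B"), ("A", "D")]
print(top_by_indegree(toy_links, {"A", "B", "C"}, k=2))  # [('B', 2), ('C', 1)]
```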

Collecting the necessary data is a time-consuming process. The usual approach for doing network analysis on Wikipedia is to use complete database dumps that are provided by the Wikimedia Foundation. The problem with these dumps is that they are either very large (complete dump that contains all historical data: 5 TB) or do not provide a high enough date resolution to accurately track the development of current events. To get around these issues we developed a data fetcher that uses the HTTP API. It continuously collects and stores the minimal amount of information that we need to build link-networks for a selected list of articles with the desired date resolution.
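As a rough idea of what talking to the HTTP API involves, the sketch below builds a MediaWiki `action=query` request for the outgoing links of one article and parses the JSON reply. It is not the fetcher described above: the helper names are made up, the sample reply is hand-written, and the sketch only covers an article's current link list, not dated snapshots.

```python
import json
from urllib.parse import urlencode

API_URL = "https://en.wikipedia.org/w/api.php"

def links_query_url(title, continuation=None):
    """Build the request URL for the article-namespace links of one page."""
    params = {
        "action": "query",
        "prop": "links",
        "titles": title,
        "plnamespace": 0,      # only links to regular articles
        "pllimit": "max",
        "format": "json",
    }
    if continuation:
        params.update(continuation)  # pass the API's "continue" values back
    return API_URL + "?" + urlencode(params)

def parse_links(response_text):
    """Return (list of link targets, continuation dict or None)."""
    data = json.loads(response_text)
    targets = []
    for page in data.get("query", {}).get("pages", {}).values():
        for link in page.get("links", []):
            targets.append(link["title"])
    return targets, data.get("continue")

# Hand-written sample reply in the shape the API uses; not real data.
sample = json.dumps({
    "continue": {"plcontinue": "1|0|X", "continue": "||"},
    "query": {"pages": {"1": {"pageid": 1, "ns": 0, "title": "Example",
                              "links": [{"ns": 0, "title": "Alpha"},
                                        {"ns": 0, "title": "Beta"}]}}},
})
targets, cont = parse_links(sample)
print(targets)                          # ['Alpha', 'Beta']
print(links_query_url("Example", cont))
```

To actually fetch data, the URL can be opened with `urllib.request.urlopen`, repeating the request with the returned continuation until none is left; a descriptive User-Agent header is expected by Wikimedia for automated clients.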

Future Work
Looking at the changes in the graph over time, it becomes clear that the simple indegree criterion does suffer from some shortcomings. It does not work to discover (fast) rising subjects. Or speaking figuratively: Despite the attention they currently receive, Lady Gaga and Justin Bieber do not stand a chance against Madonna or Eric Clapton. While one might claim that this situation is perfectly justified and reflects their artistic contributions, it would still be interesting to develop a set of metrics to select and rank nodes based on short term spikes in interest or relevance.
posted by Reto Kleeb

Apr 17

The US – a Loophole Society – or a Society of Trust?

My immersion into the loophole society concept took place in 2007 when I was bringing used computers to Ghana, to be donated to schools. While the total value of the computers was about $1200, getting them through Ghanaian customs took two weeks and cost me another $1200. I had to hire an agent, who was a relative of the headmaster at the receiving school, who expected to be paid $200 to shepherd me through the myriad customs clearance offices. This customs process, designed to plug customs loopholes for importers, doubled the costs of the goods. However when I had delivered the computers I found out that I could have bought the same computers for about $1200 on the public Makola market in Accra – so it seems clever people always find ways to exploit the loopholes.

It is my perception that the loophole society concept is not restricted to African countries. Even the US has become more and more a society where people exploiting loopholes are rewarded and admired. Last week we learned that, by clever exploitation of tax loopholes, GE had $10.8 billion of profits, but a tax bill of $0 for 2009. The loophole phenomenon however is by no means restricted to big companies, but trickles down to individuals looking for loopholes to get a little break in dealing with others.

For me, the culture of loopholes, as compared to a culture of trust, is based on small worlds, or more precisely, the lack of small worlds. In a society with a small world structure where everybody knows everybody, loopholes have little chance. Exploiting loopholes is replaced by a culture of trust. The smaller the “world”, the more people value their reputation and their social capital and therefore don’t dare exploiting loopholes.

I learned about the differences between “small worlds” – engendering trust, and the “big world” encouraging exploitation of loopholes recently when I was attending a meeting of the Swiss-American Chamber of Commerce. A frustrated Swiss businessman – coming from a very small world – bitterly complained about the 500-page contract that the lawyers of his US business partners wanted him to sign. As he said, in Switzerland business contracts are still one or two pages, containing the key points of the business deals, and not 500 pages of provisions trying to plug every possible loophole. Because, as he said, if something goes wrong, instead of trying to resolve the issue, lawyers from both parties will start poring over the 500 pages, and try to find the loopholes in their favor. This is great news for the lawyers, as it keeps them happily employed. It is not such great news for the Swiss business owner, because he will have to spend most of his profits, and then some, on the fees of his American lawyers.

Doing sponsored research in both the US and Switzerland gives another opportunity to compare the loophole society with the trust society. Research dollars spent at a top US university carry an overhead of 70%. This compares to an overhead rate of 15% in Switzerland. This means, that out of every US research dollar, 70 cents are spent on internal university administration, whose main task it is to make sure that the other 30 cents are not squandered. Compare this to the overhead at the Swiss university, where 15 cents on every Swiss Franc are spent on oversight and administration, and the remaining 85 cents on the researchers.

While the last two examples are somewhat oversimplified, they nevertheless illustrate a larger trend. The point really is that we should be moving towards a society of trust, and not a society of exploiting loopholes. This means that we should try to create localized small worlds based on self-organization and trust, where individuals are trusted to do the “right thing”, but are also held accountable for their own actions.

Apr 16

Might growing health care costs be a good thing?

Everybody is complaining about the ever-rising costs of health care. But could it be that this is actually a good thing, because it means we can afford to spend an ever-rising share of our disposable income on our health?
While there is undoubtedly some misuse of our healthcare dollars, and money is wasted on unnecessary beauty operations, or even worse, on lawyers filing malpractice suits, I think that the overall fraction of disposable income a society can afford to spend on healthcare is a good benchmark for gross national happiness.
There are many variables influencing happiness, such as income, being married, and age, but being in good health has been found to be one of the most reliable predictors of happiness, as has been shown by many researchers. Countries which are able to spend a large amount of their income on healthcare should therefore be happier.

Do national happiness and healthcare spending indeed correlate? Because I could not find statistics, I did a quick calculation myself. I looked up mean health care spending per head in PPP$ (purchasing-power-parity-adjusted dollars) of the OECD countries in 2001. I then compared these numbers to the gross national happiness index as listed on the World Database of Happiness. As a control variable in my model I took country size, looking up the population numbers on Wikipedia. Below are the actual numbers, showing that the US and Switzerland are the record spenders on healthcare per head, but are also fairly happy, although small countries like Denmark, Iceland, and Luxembourg are even happier, while spending less money on healthcare.

When I ran a multivariate linear regression with these numbers, using health spending per head and country size as independent variables, and happiness as the dependent variable, I found an adjusted R squared of 0.58, with significant standardized coefficients of 0.83** for health spending per head, and -0.38** for population size. To put this in simple language, this means that 58% of the variation in happiness between countries is explained by health care spending and country size. The more a country spends on individual health care, and the smaller the country is, the happier its inhabitants tend to be.
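For readers who want to repeat this kind of calculation, here is a small sketch of a two-predictor regression that reports standardized coefficients and adjusted R squared. It uses the closed-form solution in terms of pairwise correlations, which holds for exactly two predictors; the function names are mine, the input numbers are arbitrary toy values rather than the OECD figures, and significance tests are left out.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def two_predictor_regression(x1, x2, y):
    """Return (beta1, beta2, r_squared, adjusted_r_squared).

    beta1 and beta2 are standardized coefficients, i.e. the slopes one
    gets after z-scoring all three variables.
    """
    n = len(y)
    r1y, r2y, r12 = pearson(x1, y), pearson(x2, y), pearson(x1, x2)
    denom = 1 - r12 ** 2               # zero if the predictors are collinear
    beta1 = (r1y - r12 * r2y) / denom
    beta2 = (r2y - r12 * r1y) / denom
    r_squared = beta1 * r1y + beta2 * r2y
    # adjust for the number of predictors (p = 2)
    adjusted = 1 - (1 - r_squared) * (n - 1) / (n - 2 - 1)
    return beta1, beta2, r_squared, adjusted

# Arbitrary toy values where y = 2 * x1 - x2 exactly.
x1 = [1, 2, 3, 4, 5]
x2 = [2, 1, 4, 3, 6]
y = [0, 3, 2, 5, 4]
print(two_predictor_regression(x1, x2, y))
```

With real data, the first two returned values correspond to coefficients like the 0.83 and -0.38 quoted above, and the last one to the adjusted R squared.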

What’s the conclusion for the US? Well, this means investing money in health care actually might not be such a bad thing, but please, allow for local autonomy, giving subgroups of the population a say on how the money is being spent.

Mar 05

Prediction Market predicted Oscars correctly 11 out of 12 times

I just stumbled on this interesting blog post, which examined how well the Intrade prediction market predicted this and last year’s Oscars. It seems the market picked the winner correctly 11 out of 12 times.
Also interesting is the comment by BarTaxCa on the post, noting that depending on which prediction market one picks (HSX, Intrade, Inkling market) prediction differs. So it seems there is still a role to play for analyzing the wisdom of swarms through their Web buzz on IMDB and Rottentomatoes. In fact, what we found is that throwing the two together (prediction market + Web buzz) leads to the best results.

Jan 17

Facebook Pages, and why we know that you probably like Lady Gaga.

The Idea
Ever since Facebook rolled out pages in 2007, it has become very easy for users to show their interest in music, film, books, artists and other entities in various categories by clicking the “like” button on a specific Facebook page. Most of the time, the information about your personal “likes” is not protected automatically and can therefore be accessed by everyone, even if not logged in.
We know that Mark Zuckerberg likes the Yankees and is a fan of Jay-Z, but that might just be of interest to his friends or People magazine. But there is much more information that we can infer from the social graph. Can Barack Obama know about the preferred beverages or favorite books of his fans? He can! …but he probably doesn’t care. With the information provided by Facebook’s social graph it is easy to identify connections between books, films or brands - without conducting a survey.

The Data
Building a network by linking two pages, depending on the frequency of their occurrence on the same user profile produces graphs like the following.

The fact that people are providing this rich information creates different opportunities for analysis. Surely Facebook is already taking advantage of its data, but the data could also be used to analyse user behaviour in social science and marketing. Certainly the advertising industry could benefit from, and would pay money for, such demographic information.
Demo Prototype
This web application illustrates a potential use of the data, which is based on 20 000 public Facebook profiles from different countries. An underlying bipartite “user to page” relation is used as a data source.
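A page-to-page network of the kind described under “The Data” can be derived from such a bipartite relation by counting how often two pages appear on the same profile. The sketch below assumes the relation is held as a dictionary from user to the set of liked pages; the function name, the `min_count` cut-off and the placeholder users and pages are illustrative only.

```python
from collections import Counter
from itertools import combinations

def cooccurrence_edges(likes, min_count=2):
    """likes: {user: set_of_page_names}.

    Returns {(page_a, page_b): number of profiles listing both}, keeping
    only pairs that co-occur on at least min_count profiles. Each pair is
    stored once, with the two names in alphabetical order.
    """
    counts = Counter()
    for pages in likes.values():
        for a, b in combinations(sorted(pages), 2):
            counts[(a, b)] += 1
    return {pair: c for pair, c in counts.items() if c >= min_count}

# Placeholder profiles and pages.
profiles = {
    "u1": {"P1", "P2"},
    "u2": {"P1", "P2", "P3"},
    "u3": {"P2", "P3"},
}
print(cooccurrence_edges(profiles))  # {('P1', 'P2'): 2, ('P2', 'P3'): 2}
```

The counts can be used directly as edge weights, so that pages liked together by many users end up strongly connected.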



You can navigate through the TagCloud by clicking on a random entity. Different colors indicate categories (film, books, music, interests, other). The average of other pages listed in categories for the current page can be seen in the middle graph. The last graph shows the relative percentage of users liking this page in different countries.
It gives you a broad idea of the structure, though the current data is not representative of all Facebook users as the data was crawled from just 8 countries.

The key findings from this visualization:
So, if you want to stand out among those 500 million Facebook users, just don’t like Lady Gaga, Michael Jackson or Barack Obama.

Facebook Pages - Categories

The chart below shows how categories of Facebook pages are used in different countries.
On average, users from Great Britain and the United States list twice as many pages in their profiles as users from Brazil. Furthermore, differences in certain categories can be identified. Listing books or activities seems to be very unpopular, in contrast to pages in the music or TV categories.