Taming the Web
By Shawn Fuller
To understand what is happening right now, we need to understand how the World Wide Web has changed since Tim Berners-Lee invented it in 1989. The web we see now is often called Web 2.0 to mark this evolution. The transition was made possible by the pervasive use of computer algorithms. Before these algorithms arrived, we explored the web like some new land. Now, in a very real way, the web has begun to explore us.
The web burgeoned in the 1990s because it cost almost nothing to publish content with existing tools for word processing, image editing, and digital video. During this period, often called Web 1.0, users had to actively search for content. There were plenty of options for active discussion and participation, especially for the web-savvy. But if you wanted to be self-supporting as a blogger or seller, it was very hard to build a large audience. Very few were lucky enough to attract many people to a site that was just one of millions of others out there. The difficulty of attracting visitors was one of the factors leading to the dot-com bust of the early 2000s. Start-ups burned through their investment capital on newspaper, television, and radio ads to drive people to their sites, so that they could make money by presenting even more ads.
YouTube solved this problem by making it easier for video makers and viewers to find each other. They did this by applying different classes of algorithms to increase the likelihood that visitors to YouTube would find videos that would interest them. This engagement model kept people on the site so that more ads could be presented to them. These algorithms also changed the flow of information. Web 1.0 was becoming Web 2.0. We no longer had to go out and search for content. Content was searching for us and arriving in front of us in a way that felt more like the familiar passive media consumption of the past but tailored to our personal interests.
What is an algorithm?
In its simplest form an algorithm is any set of steps that, if followed, will accomplish a goal. For instance, a recipe is an algorithm that a human follows to make a meal.
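The recipe analogy can be made literal in a few lines of code. The steps and the `make_tea` function below are purely illustrative, not anything a real system uses:

```python
# A recipe written as a tiny algorithm: an ordered list of steps that,
# followed in sequence, accomplishes the goal (here, a cup of tea).
def make_tea():
    steps = [
        "boil water",
        "place tea bag in cup",
        "pour water over tea bag",
        "steep for three minutes",
        "remove tea bag",
    ]
    for step in steps:
        print(step)  # a person (or machine) performs each step in order
    return steps

make_tea()
```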
The traditional computer algorithm should provide the same answer or do the same thing each time it is given the same input. Increasingly, tech companies use machine learning algorithms for many of their systems. Algorithms developed by machine learning don’t follow an ordered set of steps. Instead, they work more like the perceptual systems of the human brain, which process visual input through successive layers of neurons, each layer specialized to extract different kinds of information. The software that supports machine learning mimics the interconnected neurons, whose behaviour changes with learning. The algorithms of social media take in information about our behaviours, make decisions based on their own hidden learning, and feed back to us the results.
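As a rough sketch of what "successive layers" means, here is a toy two-layer transformation in plain Python. All the numbers are made up, and the weights are hand-written here only for illustration; in a real system they are learned, not programmed:

```python
# Minimal sketch of "layers" in a learned model: each layer transforms
# its input, and the weights determine the behaviour.
def layer(inputs, weights):
    # Each output neuron is a weighted sum of the inputs, passed
    # through a simple nonlinearity (ReLU).
    return [max(0.0, sum(w * x for w, x in zip(row, inputs)))
            for row in weights]

signal = [0.5, -1.0, 0.25]            # raw "sensory" input (invented)
hidden = layer(signal, [[1.0, 0.5, 0.0],
                        [0.0, -1.0, 2.0]])
output = layer(hidden, [[1.0, 1.0]])
print(output)  # → [1.5]
```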
Hannah Fry summarizes the four major categories of algorithms used by Google, YouTube, and Facebook in her book Hello World: Being Human in the Age of Algorithms. They are prioritization, association, classification, and filtering.
Prioritization algorithms rank things according to criteria such as popularity or ratings. When you search YouTube for videos of hurricanes, cats, TV bloopers, or how to repair your dishwasher, it uses prioritization algorithms to bring you the most popular videos, ranked according to the number of times a video has been viewed by other people. It also displays videos that you may like to watch, based on their association with other videos that people have viewed.
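At its simplest, such a prioritization step is just a sort. The titles and view counts below are invented for illustration:

```python
# Toy prioritization algorithm: rank videos by view count.
videos = [
    {"title": "Cat knocks over vase", "views": 1_200_000},
    {"title": "Fixing a dishwasher drain", "views": 85_000},
    {"title": "Hurricane footage", "views": 430_000},
]

# Sort in descending order of views, so the most popular appears first.
ranked = sorted(videos, key=lambda v: v["views"], reverse=True)
print([v["title"] for v in ranked])
```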
Association algorithms find connections between things. They are what Amazon uses to display other books and products that may interest you, based on what you just searched for. It is the association algorithms that can create a radicalization pipeline, in which the recommendation engine offers increasingly extreme videos. The media scholar Zeynep Tufekci observed that after she watched a number of videos of Donald Trump rallies on YouTube, the site began to autoplay videos featuring white supremacist rants and Holocaust denials. When she started watching videos of Hillary Clinton and Bernie Sanders, YouTube began recommending left-wing conspiratorial videos filled with allegations of secret government agencies and 9/11 cover-ups. In response to public outcry and media attention over ISIS and extremist videos, YouTube began removing videos that directly preached hate or incited violence. It also adjusted its ranking algorithms to de-amplify videos that promote conspiracy theories and pseudoscience. Now, if you look up chloroquine and COVID-19, for example, YouTube will use its association algorithms to connect you to videos by medical researchers, who provide a more nuanced view of the topic.
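A crude association algorithm can be sketched by counting which items tend to be viewed by the same people. Real recommendation engines are far more sophisticated, and the watch histories and video names below are entirely hypothetical:

```python
from collections import Counter
from itertools import combinations

# Each inner list is one viewer's watch history (invented data).
histories = [
    ["rally_speech", "border_speech", "conspiracy_doc"],
    ["rally_speech", "conspiracy_doc"],
    ["cat_video", "dog_video"],
]

# Count how often each pair of videos is watched by the same person.
pair_counts = Counter()
for history in histories:
    for a, b in combinations(sorted(set(history)), 2):
        pair_counts[(a, b)] += 1

def recommend(video):
    """Return videos most often co-watched with `video`, best first."""
    scores = Counter()
    for (a, b), n in pair_counts.items():
        if a == video:
            scores[b] += n
        elif b == video:
            scores[a] += n
    return [v for v, _ in scores.most_common()]

print(recommend("rally_speech"))
```

Note how the output drifts: the most strongly associated recommendation is whatever co-occurs most often, regardless of whether it is accurate or extreme.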
Classification algorithms attempt to place you in various categories. The massive data harvesting that social media applications and data broker companies engage in is aimed at placing you in demographic and behavioural categories in order to target ads for products that might interest you. It may comfort you to know that these algorithmic guesses are not entirely accurate (yet). Based on my signed-in activity, Google thinks I am not a parent. Meanwhile, my daughter’s Google account thinks she is interested in cars and basketball (neither is true).
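A bare-bones classification sketch might score a user against each category and pick the best match. The categories and keywords here are invented, and the simplicity of the scoring hints at why real profiles get things wrong:

```python
# Hypothetical categories and the keywords that signal them.
CATEGORY_KEYWORDS = {
    "parent": {"strollers", "daycare", "school lunches"},
    "sports fan": {"basketball", "playoffs", "trades"},
    "car enthusiast": {"engines", "detailing", "road trips"},
}

def classify(liked_topics):
    # Score each category by how many of its keywords the user liked.
    scores = {
        category: len(keywords & set(liked_topics))
        for category, keywords in CATEGORY_KEYWORDS.items()
    }
    # Pick the best-matching category; with thin data, the guess can
    # easily be wrong, as with Google's profile of my daughter.
    return max(scores, key=scores.get)

print(classify({"basketball", "playoffs", "engines"}))  # → "sports fan"
```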
Filtering algorithms remove or exclude information that is considered noise or not of interest. Siri and Alexa need to filter out background noise in order to recognize your voice commands. Similarly, your phone filters out background noise while you are speaking to someone. This sometimes leads to the perception that your call has been disconnected during a pause in the conversation. (“Hello? Are you still there?” “Yes, I’m still here.”) The social media apps use filtering algorithms to include only the stories, memes, and videos that match your known interests.
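A minimal filtering sketch, with hypothetical interests and feed items, keeps only the items that overlap what the user is known to like and silently discards the rest:

```python
# Invented profile of a user's known interests.
user_interests = {"cats", "cooking", "hiking"}

feed = [
    {"title": "10 cat memes", "tags": {"cats", "humour"}},
    {"title": "Crypto price update", "tags": {"finance"}},
    {"title": "Easy weeknight pasta", "tags": {"cooking"}},
]

# Keep an item only if its tags intersect the user's interests;
# everything else is treated as noise and never shown.
filtered = [item for item in feed if item["tags"] & user_interests]
print([item["title"] for item in filtered])
```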
Algorithms and Content Aggregation
Web 1.0 was a static network of websites, search engines, the pre-web Usenet groups, and AOL conferencing. It resembled a vast library where you found things of interest through a search engine and retrieved the best content from the search results. Web 2.0 companies stayed away from generating any content, virtual reality or otherwise, and focussed their attention on the network itself. The user experiences of Facebook, Twitter, and YouTube were pared down to simple scrolling web pages. All content came from the users or the advertisers. All the money, the millions of dollars in investment capital, was spent on algorithms that enhanced social networking.
In his Wired article, Steven Levy draws on a 17-page chunk of the (mostly destroyed) journals of Mark Zuckerberg to examine a key point in the evolution of Facebook. One of the preoccupations in this journal was a product Zuckerberg called Feed (later named News Feed). It signalled a dramatic change to the Facebook concept, one that would later have real-life impacts on global events in war and in politics. At the beginning, Facebook required more active effort on the part of users to connect with friends. You had to go to your friends’ profiles to see what they were up to and post your updates on their “walls”. This was very much in line with the active participation that defined the Web 1.0 world. The news feed changed all of that.
Now your friends’ updates were pushed to you without any effort on your part. The news feed would make it easy for people to see what was important among the friends with whom they were connected. Steven Levy notes that Zuckerberg used the word “interesting-ness” to cover the kinds of stories that should appear in a news feed. They needed to be “centered around your social circle.” There was to be an emphasis on changes in personal relationships and life events. The news feed was to favour these over interesting events or other information outside your social circle.
When it launched in 2006, the news feed appeared disastrous: it had major privacy flaws, and people protested heavily against it. But it was also clear, once Facebook employees looked at the data, that people were spending more time on Facebook. It was the news feed itself that helped the protests go viral. Facebook fixed the more serious privacy issues, the protests died down, and Facebook continued to grow exponentially.
The Advertising Model
The social media apps emphasized personal expression and personal connection. They developed classification and association algorithms to find friends and people for users to connect to. The rougher and less polished the content you posted, the more authentic you appeared to friends and followers. The recommendation engines brought you the videos, memes, stories, and comments that would most appeal to you. The highly social nature of the apps and their ease of use attracted millions of users, and from these millions of users flowed massive amounts of content. Algorithms were needed to manage this content.
Facebook constructs a picture of you, starting with the biographical information that you supply: where you live, work, and play. Even more importantly, Facebook has a record of your social network graph. Who do you know? What is your social milieu? Is it primarily middle class or poor, urban or rural? Are you conservative or liberal? What do you believe in? All of this information can be gathered from what you post, share, and like. Advertisers use this data to target ads to likely buyers. The same data is fed into filtering and prioritization algorithms so that they surface the stories from your friend and follower networks that are most likely to keep you liking and sharing. The longer you stay glued to your phone, the more ads you will encounter.
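Putting the categories together, a toy feed ranker (all names and numbers invented, and far simpler than anything Facebook actually runs) might first filter to posts from friends and then prioritize by a crude engagement score:

```python
posts = [
    {"author": "alice", "likes": 40, "shares": 5},
    {"author": "brand_page", "likes": 900, "shares": 80},
    {"author": "bob", "likes": 12, "shares": 9},
]
friends = {"alice", "bob"}

def engagement(post):
    # Weight shares more heavily than likes, a crude stand-in for
    # predicting what will keep you liking and sharing.
    return post["likes"] + 3 * post["shares"]

feed = sorted(
    (p for p in posts if p["author"] in friends),  # filtering step
    key=engagement, reverse=True,                  # prioritization step
)
print([p["author"] for p in feed])  # → ['alice', 'bob']
```

Note that the hugely popular `brand_page` post never appears: the filtering step keeps the feed “centered around your social circle,” and the prioritization step decides the order within it.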
First Draft
The mission of First Draft is to protect communities from harmful misinformation. Through their CrossCheck program, they work with a global network of journalists to investigate and verify emerging news stories. The site offers research articles, educational resources, and guidelines on misinformation and infodemics.
Data & Society
Data & Society studies the social implications of data-centric technologies & automation. It has a wealth of information and articles on social media and other important topics of the digital age.
Stanford Internet Observatory
The Stanford Internet Observatory is a cross-disciplinary program of research, teaching and policy engagement for the study of abuse in current information technologies, with a focus on social media.
Sinan Aral
Sinan Aral is the David Austin Professor of Management, IT, Marketing and Data Science at MIT, Director of the MIT Initiative on the Digital Economy (IDE), and a founding partner at Manifest Capital. He has done extensive research on the social and economic impacts of the digital economy, artificial intelligence, machine learning, natural language processing, and social technologies such as digital social networks.
Renée DiResta
Renée DiResta is the technical research manager at the Stanford Internet Observatory, a cross-disciplinary program of research, teaching, and policy engagement for the study of abuse in current information technologies. She investigates the spread of malign narratives across social networks and assists policymakers in devising responses to the problem. She has studied influence operations and computational propaganda in the context of pseudoscience conspiracies, terrorist activity, and state-sponsored information warfare, and has advised Congress, the State Department, and other academic, civil society, and business organizations on the topic. At the behest of the SSCI, she led one of the two research teams that produced comprehensive assessments of the Internet Research Agency’s and the GRU’s influence operations targeting the U.S. from 2014 to 2018.
The Internet’s Original Sin
Renée DiResta shows how the business models of the internet companies led to platforms that were designed for propaganda.
“Computational Propaganda: If You Make It Trend, You Make It True”
The Yale Review
Zeynep Tufekci
Zeynep Tufekci is an associate professor in the School of Information and Library Science at the University of North Carolina, Chapel Hill, a contributing opinion writer at The New York Times, and a faculty associate at the Berkman Klein Center for Internet & Society at Harvard University. Her first book, Twitter and Tear Gas: The Power and Fragility of Networked Protest, provided a firsthand account of modern protest fueled by social movements on the internet.
She writes regularly for The New York Times and The New Yorker.