Fluidity is the hardest thing to read

A couple of weeks ago I participated in a charming event organized by Hard to Read, an L.A.-based lit series that recently held a satellite NYC event.


Affectionately titled Lovelace (a cheeky reference to both Ada and Linda), the NYC convening was about "celebrating the past, present, and future of women and the Internet in conjunction with the release of Claire L. Evans new book Broad Band: The Untold Story Of The Women Who Made The Internet." Claire read from her new book, and throughout the night an eclectic group of other artists did readings that revolved around the general theme.

In lieu of a reading, I decided to tell a story about my time in Nigeria this past December. While in Imo State and Lagos, I spent a bit of time at local marketplaces. During one of those visits, I was struck by the realization that the marketplace, located in the center of town and packed all day with buyers and sellers, represented one of the fairest social algorithms that I've yet encountered. The reason for this? In a word: fluidity.

Most of my work is about classification and categorization. As the world is increasingly mediated by software, it becomes more important for all of us to become data points. And to become data, you must be fixed, categorized, and rendered discrete.

One stall at an Imo State market, in all its pepper-filled glory.

But what I learned from the market is that the problem with categorization isn't really about categorization. In fact, it's not even necessarily about miscategorization. The problem with automated systems that categorize us and make decisions based on those categories is that they don't allow us to be recategorized. I can't help but wonder: what would it look like to have systems that could allow for us to be not just seen, but seen differently, depending on our ever-shifting contexts?

Anyway, if you've got seven minutes, you can listen to the audio of my recording, which goes a bit more in depth on the idea. As a bonus: when you click the link you get to see a hella unflattering photo of me from the reading. Don't say I never give y'all anything.

Link here: https://soundcloud.com/hard-to-read/mimi-onuoha-on-algorithms

Missing Datasets

These days, I’m a fellow over at the Data & Society Research Institute. I’m working on a couple of different projects, but the primary one has to do with missing datasets.

Calling something “missing” automatically implies that it should exist, and that’s sort of the point of my project. We’re living in a time of unprecedented levels of data collection. This isn’t a revolutionary insight; it’s just a rote fact. We are systematically tracked, recorded, and documented in ways that are more thorough and expansive than ever before. Though people have different relationships and attitudes to the tactics, methods, and vehicles of this data collection (attitudes that range from hopefulness around perceived benefits to desperate techno-pessimism about potential abuses), no one is exempt.

But at the exact same time that this massive overcollection is unfolding, there are blank spots in the data ecosystem. That is, within contexts that seem to have nearly every possible metric quantified and recorded, there exist spaces that are curiously devoid of data.

Here are some pretty familiar examples that explain what I mean: Despite the fact that the workplace is heavily studied by sociologists and companies have obvious incentives for collecting data on employees, before ProPublica’s 2013 initiative there was no data on unpaid internships. There was no set of data that anyone could point to that gave any idea of how many students were working unpaid internships, or how many companies were offering them. It was a missing dataset.

An even better-known (and much more political) example has to do with civilians and the police. It wasn’t until quite recently, thanks to initiatives like D. Brian Burghart’s Fatal Encounters website and The Guardian’s The Counted campaign, that we as a public started to have an idea of the number of civilians killed in interactions with law enforcement agencies. Prior to their work, that was a missing dataset.

In the article The Collection and the Cloud, Amelia Abreu points out that "...the Internet Archive isn’t the Internet Archive, but an Internet Archive, very much built and collected from a certain standpoint and position of power". Abreu's point -- that there’s always a reason why certain things get saved and others don’t -- applies to data as well. There’s a reason why certain data becomes a dataset, and that reason is as much personal and institutional as it is technological. There’s not much incentive for a company to collect data on why it isn’t paying employees, just like there isn’t much incentive for the police to talk about how many unarmed civilians are killed each year, or for tech companies to release abysmal diversity statistics. It’s not that organizations are maliciously trying to hide information so much as there’s just no reason for them to go out of their way to collect, let alone publish, that data.

But of course, there is reason for other people to have that data, and in a time where data is collected about nearly everything, it wouldn't be surprising for many to feel as though not having data means that something doesn’t exist. For every dataset where there’s an impetus for someone not to collect, there’s a group of people who would benefit from its presence. More data doesn’t always mean better answers, but in cases where data is used as the end-all tool of proof or a definitive measure for change, then it’s clear that lacking it can be a serious structural disadvantage.

And here’s where my project comes in. I’m interested in finding and helping those who are directly affected by the issues in question fill other missing datasets. Is there a way to both provide access to previously unattainable datasets and give those people who have a stake in information the ability to affect it?

That’s the high-level overview of some of the work I’ll be doing this year. I’m just at the beginning of the process, but if you’re interested in any of these questions or have relevant datasets of your own, please do reach out.

The Personal Data Conundrum

For a long time, if you had asked me what one thing I would change about most people’s relationships to their data, my answer would have been awareness.

What I meant by that was simple: I wanted people to know more about the roles data plays in their lives and in the world.

In my mind, that meant loads of things: understanding that digital footprints have memories and lifelines, and that access to them brings the ability to infer things about people and relationships. It included knowing that companies like Facebook can make money off your data even while you still technically own all your content; understanding that data isn’t truth; knowing that photos can be and are photoshopped (even by institutions like governments); believing that the Department of Homeland Security has tracked and does track people who haven’t done anything wrong (though now it has greater means and a longer reach to do so).

It also meant knowing that personal data is the valuable currency that many corporations deal in. It involved accepting the fact that the things you knowingly and unknowingly produce have the ability to reach dizzying levels of dissemination, and the potential to live longer lives than you.

Banksy's Dismaland included a gigantic chart of surveillance in the UK that was both impressive and overwhelming. 

That’s just a (small) sample of the sorts of things that I think everyone should know. My thought process was that if everyone knew and understood the data landscape (not just data literacy, but the whole picture), the potential for our data to be used against us would be significantly reduced.

Obviously, that was a rosy, idealized point of view. And lately I haven’t been feeling the same way.

Lately I’ve been getting stuck on something I like to call the Personal Data Conundrum. Now that you know that you leave data trails everywhere, what are you to do with that information? So you know that companies make money off of you. Now what? Your data is valuable in conjunction with others’; for most of us, it carries less value when considered individually. So opting out as one person doesn’t necessarily change the system itself. And a company like AT&T may be handing over data to the US government, but it’s not like switching to T-Mobile solves the problem. Either way, your data is still out of your hands (and now your cell phone service is even spottier).

Lots of activists, artists, and privacy enthusiasts have one answer: that you should protect yourself. Use Tor! Download Telegram! Get a VPN! Use masked email addresses! But those are all individual responses to a systemic issue, and in situations with an unequally distributed playing field, it’s inevitable that only those with the time, resources, and interest will adopt those measures. And even if that wasn’t the case, the larger problems remain. Should you quit using a cell phone? Stop accessing public wifi? Refuse to use Google Maps, because even though it gives you directions it also combines your location data with the other hundreds of datapoints Google has on you? Our very ways of communicating are so ingrained in these compromised systems. For a lot of people, opting out of being a data machine means opting out of feeling like you’re fully participating in society.

So now my sense is that though awareness is important, demanding (or even hoping for) everyone to be completely knowledgeable about and connected to their personal data is shortsighted and overly optimistic. Ignorance and indifference are completely reasonable responses to the realization of your lack of agency. Some people may think that if there’s nothing they can do about a situation, they’d rather not engage at all, and I can’t even blame them for that line of thought.

So there’s the dilemma. I think it’s important for people to understand and think about their data, but once they do that, they end up directly confronting problems of efficacy that I don’t have practical answers for.

How To Get Your Mobile Data

A quick announcement before I jump in: I just launched Pathways, the output of my 2014-15 Fulbright-National Geographic Digital Storytelling Fellowship. It’s a site that shows the stories derived from collecting a month’s worth of mobile data from Londoners. I did the design, UX, and development for the whole thing, as well as the research, data collection and data analysis, so it’s quite exciting to have it finally live. Check it out here.

In conjunction with Pathways, below is a guide on how you can get the same locative, social media, and metadata that I collected from my participants. Quick note: this isn’t a guide for developers, programmers, or people who identify as very technical. If you fit in that crowd, then you’ll immediately realize that there are many more efficient ways of doing these things. Because the goal of Pathways is to be relevant to people who don’t have any passion or interest in data, this guide is meant to provide easy, not-very-technical hacks for getting your own data.


The main work I did with Pathways was in collecting location data. To be honest, there aren’t great options right now. I asked my participants to install OpenPaths, an open source mobile app that allows you to securely access your location information. OpenPaths is the best option you have in terms of security, but it’s not the most accurate and isn’t being actively developed anymore, which makes me reluctant to really recommend it.

On the other hand, I’m not any more excited about the other options. Moves is the best option in terms of level of sophistication around tracking, but I can’t mention Moves without needing to say in the same breath that it’s owned by one of our favorite not-historically-great-with-privacy corporations, Facebook.

So you’ve got to choose what’s more important to you—security or effectiveness? An ugly choice, I know.


You can easily send yourself WhatsApp data from within the app itself. Click over on the chats tab, and then click the following: WhatsApp → Menu button → Settings → Chat Settings → Backup conversations → email conversation without media


Open up Skype (on your computer). Choose a conversation. Right click, and you should see the option to “jump back”. Jump back all the way to the beginning, then hit Command + A (or Ctrl + A on a PC) to highlight all of your messages, then Command + C (Ctrl + C) to copy all the messages. Open up a text file and paste.


Really similar to WhatsApp. First go to Messages, then open Viber. Click the following sequence: [messages] → open Viber → more options … → Settings → calls and messages → email message history


Google provides a service called Google Takeout where you can easily export your data. On the “Download Your Data” page, you’ll want to choose only “Hangouts”, otherwise it’s going to take hours to get all of your different types of Google Data. I suggest choosing a .zip format as it’ll be easy for you to extract. Google will alert you once your files are ready to download.

Once you get them, you can use this lovely, free resource provided by Jay, a system administrator who is making your life easier. Just follow the instructions on this site: hangoutparser.jay2k1.com


Facebook is notorious for eating all of your data and then not providing that data in a really easy format. Getting your data, though, is easy enough:

  1. Go to top right of any Facebook page and select Settings
  2. Click "Download a copy of your Facebook data" below your General Account Settings
  3. Click Start My Archive

Other Options

Those are all pretty simple ways of doing things. They’ll give you access to some of your messages, and you can save everything into a text file. A slightly more technical option if you have an iPhone is to download a program like iPhone Backup Extractor or iBackupViewer; those will give you access to the actual databases that your messages for these apps are saved in locally on your phone.

If you have an Android phone, there are equivalent programs like Android File Transfer, which I believe you can use even if you haven’t rooted your phone (if you don’t know what that means, don’t worry: it means you haven’t done it).
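If you do go the slightly more technical route and end up with one of those database files, here is a rough sketch of how you might peek inside it. It assumes the extracted file is an SQLite database (common for phone apps, but not guaranteed for every app), and it only lists table names and row counts, since the actual table layout differs from app to app. The function name `list_tables` is made up for this example.

```python
import sqlite3
from pathlib import Path


def list_tables(db_path):
    """Return {table name: row count} for an SQLite file.

    Assumption: db_path points at an SQLite database pulled out of a
    phone backup. Nothing here depends on any particular app's schema.
    """
    # Open the file read-only so the backup is never modified.
    uri = Path(db_path).resolve().as_uri() + "?mode=ro"
    conn = sqlite3.connect(uri, uri=True)
    try:
        names = [
            row[0]
            for row in conn.execute(
                "SELECT name FROM sqlite_master "
                "WHERE type = 'table' ORDER BY name"
            )
        ]
        counts = {}
        for name in names:
            # Quote the table name in case it contains odd characters.
            quoted = '"' + name.replace('"', '""') + '"'
            counts[name] = conn.execute(
                f"SELECT COUNT(*) FROM {quoted}"
            ).fetchone()[0]
        return counts
    finally:
        conn.close()
```

Calling `list_tables("sms.db")` (a hypothetical file name) would give you a quick sense of what the file holds before you decide whether to dig any further.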

The Things You Don't Want To Know

A little while ago, one of my friends emailed me a link to Prism, an application that allows you to see a streamgraph visualization of your texting history over time.

In the email, my friend provided one small caveat: "My sources say it feels a bit creepy to see contacts appear and fade over time. Definitely a case of private data, methinks."

Image taken from Prism website. 

Let me just point out that I spend a lot of time talking about data literacy, privacy, data ownership, and what you can learn about yourself through data. Most of the work I do revolves around data collection and analysis, in some way, shape, or form(at).

In other words, you would think that I would be the target audience for something like Prism. But I couldn't bring myself to use it. Why? Because I really didn't want to see what it was going to show. I know the basics of my texting history. I know how it's changed as I've moved in and out of cities, countries, and relationships. After all, I lived through those experiences. And given that I know exactly how bittersweet some of them were, the last thing that I want to see is a cheery data viz reminding me of just which people have popped in and then slowly (or even worse, abruptly) faded out of my life. I already feel that particular shade of wistfulness when I stumble over similar information in other people's lives; how much worse will it be to see it in my own?

Maybe that's something that we should talk more about. Just because we have access to all sorts of data about the world and ourselves doesn't necessarily mean that we need to see all of it. To be clear, I'm all for data analysis, empowerment, journalism and the things that you can do through all three. But surely we can acknowledge that not everything is suited to routine and saccharine representation through shapes, lines, and maps. Do you want to know how few of your friends will be alive for your 95th birthday? Do you want to know how many times you cried after your last breakup? And those are just the trivial examples!

Perhaps there are things in this world--messy, difficult things--whose very nature demands that we consider them apart from the sense of order, categorization, and understanding that data visualizations trade in. Maybe some things mean less, not more, once categorized and put into metrics.

Or maybe I'm wrong. After all, I could just be squeamish. So you tell me: is there always something to be gained by relentless quantification, or are there things that gain more power by resisting it? I'd love to hear from others (and not just because I'm staring at a CSV file of my old iMessages, wondering whether or not to open it).

Pinterest meets CCTV Surveillance

I feel like one day I woke up and was on Pinterest. I didn't remember signing up for it, I barely knew what the site was for, but all of a sudden I was regularly seeing cheery emails in my inbox proclaiming "X person started following you on Pinterest" and "Happy Pinning!" Pro tip: ignoring them will not make those emails go away.

Today, though, all those Pinterest emails are no longer for naught. Since I moved to the UK I've been fascinated by the fact that the city has over 7000 CCTV surveillance cameras. I even wrote a post about it over at National Geographic.

But even more captivating to me than the cameras themselves are the signs alerting the public to their presence. Some of the signs are curt and straightforward in tone, others lighthearted, still more apologetically explanatory.

But I shouldn't just have to explain it to you--you should get the chance to see them for yourself. So I created a Pinterest board where I'll be uploading the photos that I take of CCTV signs. Up until this point I've been haphazard in my dedication to documenting them, but starting from now I'll be religious in taking a photo of every one of these signs that I see, and then uploading those pictures to the board.

(So, yes, this is a post letting you know about the pictures I'm taking of the signs that let us know about the pictures that the British state is taking of us. Got that straight?)

Check out the board here.

We Don't Remember Where We Go Online

Last week, I conducted a browser experiment with students at the Royal College of Art. The idea was to continue some of the investigations I've been doing on conceptions of different types of spaces--except this time, instead of looking at how people think of physical space, I wanted to examine how they think about non-space (aka their browser histories).
