Missing Datasets

These days, I’m a fellow over at the Data & Society Research Institute. I’m working on a couple of different projects, but the primary one has to do with missing datasets.

Calling something “missing” automatically implies that it should exist, and that’s sort of the point of my project. We’re living in a time of unprecedented levels of data collection. This isn’t a revolutionary insight; it’s just a rote fact. We are systematically tracked, recorded, and documented in ways that are more thorough and expansive than ever before. Though people have different relationships and attitudes to the tactics, methods, and vehicles of this data collection (attitudes that range from hopefulness around perceived benefits to desperate techno-pessimism about potential abuses), no one is exempt.

But at the exact same time that this massive overcollection is unfolding, there are blank spots in the data ecosystem. That is, within contexts that seem to have nearly every possible metric quantified and recorded, there exist spaces that are curiously devoid of data.

Here are some pretty familiar examples that explain what I mean: Despite the fact that the workplace is heavily studied by sociologists and companies have obvious incentives for collecting data on employees, before ProPublica’s 2013 initiative there was no data on unpaid internships. There was no set of data that anyone could point to that gave any idea of how many students were working unpaid internships, or how many companies were offering them. It was a missing dataset.

An even better-known (and much more political) example has to do with civilians and the police. It wasn’t until quite recently, thanks to initiatives like D. Brian Burghart’s Fatal Encounters website and The Guardian’s The Counted campaign, that we as a public started to have an idea of the number of civilians killed in interactions with law enforcement agencies. Prior to their work, that was a missing dataset.

In the article The Collection and the Cloud, Amelia Abreu points out that "...the Internet Archive isn’t the Internet Archive, but an Internet Archive, very much built and collected from a certain standpoint and position of power". Abreu's point -- that there’s always a reason why certain things get saved and others don’t -- applies to data as well. There’s a reason why certain data becomes a dataset, and that reason is as much personal and institutional as it is technological. There’s not much incentive for a company to collect data on why it isn’t paying employees, just like there isn’t much incentive for the police to talk about how many unarmed civilians are killed each year, or for tech companies to release abysmal diversity statistics. It’s not that organizations are maliciously trying to hide information so much as there’s just no reason for them to go out of their way to collect, let alone publish, that data.

But of course, there is reason for other people to have that data, and at a time when data is collected about nearly everything, it wouldn't be surprising for many to feel as though not having data means that something doesn’t exist. For every dataset where there’s an impetus for someone not to collect, there’s a group of people who would benefit from its presence. More data doesn’t always mean better answers, but in cases where data is used as the end-all tool of proof or a definitive measure for change, it’s clear that lacking it can be a serious structural disadvantage.

And here’s where my project comes in. I’m interested in finding and helping those who are directly affected by the issues in question fill other missing datasets. Is there a way to both provide access to previously unattainable datasets and give those people who have a stake in information the ability to affect it?

That’s the high-level overview of some of the work I’ll be doing this year. I’m just at the beginning of the process, but if you’re interested in any of these questions or have relevant datasets of your own, please do reach out.

The Personal Data Conundrum

For a long time, if you had asked me what one thing I would change about most people’s relationships to their data, my answer would have been awareness.

What I meant by that was simple: I wanted people to know more about the roles data plays in their lives and in the world.

In my mind, that meant loads of things: understanding that digital footprints have memories and lifelines, and that access to them brings the ability to infer things about people and relationships. It included knowing that companies like Facebook can make money off your data even while you still technically own all your content; understanding that data isn’t truth; knowing that photos can be and are photoshopped (even by institutions like governments); believing that the Department of Homeland Security has tracked and does track people who haven’t done anything wrong (though now it has greater means and a longer reach to do so).

It also meant knowing that personal data is the valuable currency that many corporations deal in. It involved accepting the fact that the things you knowingly and unknowingly produce have the ability to reach dizzying levels of dissemination, and the potential to live longer lives than you.

Banksy's Dismaland included a gigantic chart of surveillance in the UK that was both impressive and overwhelming. 

That’s just a (small) sample of the sorts of things that I think everyone should know. My thought process was that if everyone knew and understood the data landscape (not just data literacy, but the whole picture), the potential for our data to be used against us would be significantly minimized.

Obviously, that was a rosy, idealized point of view. And lately I haven’t been feeling the same way.

Lately I’ve been getting stuck on something I like to call the Personal Data Conundrum. Now that you know that you leave data trails everywhere, what are you to do with that information? So you know that companies make money off of you. Now what? Your data is valuable in conjunction with others’; for most of us, it carries less value when considered individually. So opting out as one person doesn’t necessarily change the system itself. And a company like AT&T may be handing over data to the US government, but it’s not like switching to T-Mobile solves the problem. Either way, your data is still out of your hands (and now your cell phone service is even spottier).

Lots of activists, artists, and privacy enthusiasts have one answer: that you should protect yourself. Use Tor! Download Telegram! Get a VPN! Use masked email addresses! But those are all individual responses to a systemic issue, and in situations with an unequally distributed playing field, it’s inevitable that only those with the time, resources, and interest will adopt those measures. And even if that wasn’t the case, the larger problems remain. Should you quit using a cell phone? Stop accessing public wifi? Refuse to use Google Maps, because even though it gives you directions it also combines your location data with the other hundreds of datapoints Google has on you? Our very ways of communicating are so ingrained in these compromised systems. For a lot of people, opting out of being a data machine means opting out of feeling like you’re fully participating in society.

So now my sense is that though awareness is important, demanding (or even hoping for) everyone to be completely knowledgeable about and connected to their personal data is shortsighted and overly optimistic. Ignorance and indifference are completely reasonable responses to the realization of your lack of agency. Some people may think that if there’s nothing they can do about a situation, they’d rather not engage at all, and I can’t even blame them for that line of thought.

So there’s the dilemma. I think it’s important for people to understand and think about their data, but once they do that, they end up directly confronting problems in efficacy that I don’t have practical answers for solving.

How To Get Your Mobile Data

A quick announcement before I jump in: I just launched Pathways, the output of my 2014-15 Fulbright-National Geographic Digital Storytelling Fellowship. It’s a site that shows the stories derived from collecting a month’s worth of mobile data from Londoners. I did the design, UX, and development for the whole thing, as well as the research, data collection and data analysis, so it’s quite exciting to have it finally live. Check it out here.

In conjunction with Pathways, below is a guide on how you can get the same locative, social media, and metadata that I collected from my participants. Quick note: this isn’t a guide for developers, programmers, or people who identify as very technical. If you fit in that crowd, then you’ll immediately realize that there are many more efficient ways of doing these things. Because the goal of Pathways is to be relevant to people who don’t have any passion or interest in data, this guide is meant to provide easy, not-very-technical hacks for getting your own data.


The main work I did with Pathways was in collecting location data. To be honest, there aren’t great options right now. I asked my participants to install OpenPaths, an open source mobile app that allows you to securely access your location information. OpenPaths is the best option you have in terms of security, but it’s not the most accurate and isn’t being actively developed anymore, which makes me hesitant to really recommend it.

On the other hand, I’m not any more excited about the other options. Moves is the best option in terms of level of sophistication around tracking, but I can’t mention Moves without needing to say in the same breath that it’s owned by one of our favorite not-historically-great-with-privacy corporations, Facebook.

So you’ve got to choose what’s more important to you—security or effectiveness? An ugly choice, I know.


You can easily send yourself WhatsApp data from within the app itself. Click over on the chats tab, and then click the following: WhatsApp —> Menu button —> Settings —> Chat Settings —> Backup conversations —> email conversation without media


Open up Skype (on your computer). Choose a conversation. Right click, and you should see the option to “jump back”. Jump back all the way to the beginning, then hit Command + A (or Ctrl + A on a PC) to highlight all of your messages, then Command + C (Ctrl + C) to copy all the messages. Open up a text file and paste.


Really similar to WhatsApp. First go to Messages, then open Viber. Click the following sequence: [messages] —> open Viber —> more option … —> Settings —> calls and messages —> email message history


Google provides a service called Google Takeout where you can easily export your data. On the “Download Your Data” page, you’ll want to choose only “Hangouts”, otherwise it’s going to take hours to get all of your different types of Google Data. I suggest choosing a .zip format as it’ll be easy for you to extract. Google will alert you once your files are ready to download.

Once you get them, you can use this lovely, free resource provided by Jay, a system administrator who is making your life easier. Just follow the instructions on this site: hangoutparser.jay2k1.com


Facebook is notorious for eating all of your data and then not providing that data in a really easy format. Getting your data, though, is easy enough:

  1. Go to top right of any Facebook page and select Settings
  2. Click "Download a copy of your Facebook data" below your General Account Settings
  3. Click Start My Archive

Other Options

Those are all pretty simple ways of doing things. They’ll give you access to some of your messages, and you can save everything into a text file. A slightly more technical option if you have an iPhone is to download a program like iPhone Backup Extractor or iBackupViewer; those will give you access to the actual databases that your messages for these apps are saved in locally on your phone.

If you have an Android phone, there are equivalent programs like Android File Transfer, which I believe you can use even if you haven’t rooted your phone (if you don’t know what that means, don’t worry: it means you haven’t done it).
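If you do go the slightly more technical route and end up with one of those database files on your computer, here is a minimal sketch of how you might peek inside it with Python. It assumes the file is a SQLite database (common for messaging apps, but not guaranteed for any particular one), and the file name `messages.db` mentioned below is purely hypothetical. The two helper functions are my own illustration, not part of any of the tools mentioned above. The database is opened read-only so that your copy is never modified.

```python
import sqlite3
from pathlib import Path


def _connect_read_only(db_path):
    """Open a SQLite file read-only (assumes the file really is SQLite)."""
    uri = Path(db_path).resolve().as_uri() + "?mode=ro"
    return sqlite3.connect(uri, uri=True)


def list_tables(db_path):
    """Return the names of all tables in the database, sorted by name."""
    conn = _connect_read_only(db_path)
    try:
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    finally:
        conn.close()
    return [name for (name,) in rows]


def count_rows(db_path, table):
    """Return how many rows a table holds; the table must exist in the file."""
    if table not in list_tables(db_path):
        raise ValueError(f"no table named {table!r} in {db_path}")
    # Quote the table name, doubling any embedded double quotes.
    quoted = '"' + table.replace('"', '""') + '"'
    conn = _connect_read_only(db_path)
    try:
        (count,) = conn.execute(f"SELECT COUNT(*) FROM {quoted}").fetchone()
    finally:
        conn.close()
    return count
```

For example, `list_tables("messages.db")` would show you which tables exist, and `count_rows("messages.db", name)` would tell you how many rows one of them holds. What the tables are called and what they contain depends entirely on the app, so treat this as a starting point for exploring rather than a recipe.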

The Things You Don't Want To Know

A little while ago, one of my friends emailed me a link to Prism, an application that allows you to see a streamgraph visualization of your texting history over time.

In the email, my friend provided one small caveat: "My sources say it feels a bit creepy to see contacts appear and fade over time. Definitely a case of private data, methinks."

Image taken from Prism website. 

Let me just point out that I spend a lot of time talking about data literacy, privacy, data ownership, and what you can learn about yourself through data. Most of the work I do revolves around data collection and analysis, in some way, shape, or form(at).

In other words, you would think that I would be the target audience for something like Prism. But I couldn't bring myself to use it. Why? Because I really didn't want to see what it was going to show. I know the basics of my texting history. I know how it's changed as I've moved in and out of cities, countries, and relationships. After all, I lived through those experiences. And given that I know exactly how bittersweet some of them were, the last thing that I want to see is a cheery data viz reminding me of just which people have popped in and then slowly (or even worse, abruptly) faded out of my life. I already feel that particular shade of wistfulness when I stumble over similar information in other people's lives; how much worse will it be to see it in my own?

Maybe that's something that we should talk more about. Just because we have access to all sorts of data about the world and ourselves doesn't necessarily mean that we need to see all of it. To be clear, I'm all for data analysis, empowerment, journalism and the things that you can do through all three. But surely we can acknowledge that not everything is suited to routine and saccharine representation through shapes, lines, and maps. Do you want to know how few of your friends will be alive for your 95th birthday? Do you want to know how many times you cried after your last breakup? And those are just the trivial examples!

Perhaps there are things in this world (messy, difficult things) whose very nature demands that we consider them apart from the sense of order, categorization, and understanding that data visualizations trade in. Maybe some things mean less, not more, once categorized and put into metrics.

Or maybe I'm wrong. After all, I could just be squeamish. So you tell me: is there always something to be gained by relentless quantification, or are there things that gain more power by resisting it? I'd love to hear from others (and not just because I'm staring at a CSV file of my old iMessages, wondering whether or not to open it).

Pinterest meets CCTV Surveillance

I feel like one day I woke up and was on Pinterest. I didn't remember signing up for it, I barely knew what the site was for, but all of a sudden I was regularly seeing cheery emails in my inbox proclaiming "X person started following you on Pinterest" and "Happy Pinning!" Pro tip: ignoring them will not make those emails go away.

Today, though, all those Pinterest emails are no longer for naught. Since I moved to the UK I've been fascinated by the fact that the city has over 7000 CCTV surveillance cameras. I even wrote a post about it over at National Geographic.

But even more captivating to me than the cameras themselves are the signs alerting the public to their presence. Some of the signs are curt and straightforward in tone, others lighthearted, still more apologetically explanatory.

But I shouldn't just have to explain it to you--you should get the chance to see them for yourself. So I created a Pinterest board where I'll be uploading the photos that I take of CCTV signs. Up until this point I've been haphazard in my dedication to documenting them, but starting from now I'll be religious in taking a photo of every one of these signs that I see, and then uploading those pictures to the board.

(So, yes, this is a post letting you know about the pictures I'm taking of the signs that let us know about the pictures that the British state is taking of us. Got that straight?)

Check out the board here.

Online and Offline, Seen Differently

I want to use this post to say something that's more of an addendum to a previous post I wrote on the online/offline "divide" (or lack thereof) than a full thought.

We Don't Remember Where We Go Online

Last week, I conducted a browser experiment with students at the Royal College of Art. The idea was to continue some of the investigations I've been doing on conceptions of different types of spaces--except this time, instead of looking at how people think of physical space, I wanted to examine how they think about non-space (aka their browser histories).

The Real You, [Un]Filtered

On Monday 13 October, at 6am EST (likely by the time that most of you will be reading this), my first post for National Geographic will be live (and I'll forgive you if you leave now to go check it out). The post serves as an introduction to me and to the project I’m working on this year, but its more subtle purpose is to hook people by convincing them that what I’m doing is interesting and important (which it is, by the way, in case you still need convincing).

Tips for the Fulbright-National Geographic Digital Storytelling Fellowship

There’s a long answer and a short answer to the question How do you become a Fulbright-National Geographic Digital Storytelling Fellow? The short answer is, plainly and simply, that I don’t really know.

But no one reads a post for the short answer. So here’s the long answer, but first, a disclaimer: I can’t tell anyone how to get the fellowship. In fact, I officially began my tenure as a FB-NatGeo fellow two weeks ago. To even write this post feels a bit like hubris, because I’m giving advice on something I haven’t fully (or even halfway) experienced. 

But like I said, I've gotten a lot of emails with questions. So I’m going to use this post to very cautiously offer a few nuggets of information that I think might be helpful. These opinions are solely my own—I don’t speak for the Fulbright Program, or the IIE, or the ECA, or National Geographic, or the Department of State (those are some of the entities that will be looking at your applications, by the way). I’ll leave the comments on this post open for the time being, so if you have any questions feel free to leave a comment or send me a tweet (@thistimeitsmimi).

Onwards (be prepared, this is quite long):

So what’s this fellowship all about? 

The fellowship is a chance for Americans to leave the States for 9 months and tell a compelling story using new media/technology. The only requirements are that you propose something interesting, engage with the themes (see the website to learn more about them), come up with a project that takes place in up to three countries outside of the US, and have a college degree by the time you begin the fellowship. Obviously you’ll have to fill out an admittedly long and grueling application, but it’s still a great opportunity.


What do I get from it?

It's a chance to work, fully-funded, on a project that you care about for a full 9 months, which is not too shabby of a deal. You also get a materials stipend, as well as guidance/mentorship from National Geographic. And of course, you are posting content on the National Geographic website that can take a variety of forms. 


So the project belongs to National Geographic?

No, not at all. It’s your own project; you retain all rights to it. Once the fellowship is over, you can do whatever you please with it. National Geographic does have the rights over anything that you post on their site, and they can distribute that and use it across any of their platforms. But it’s up to you to choose what it is you're posting, and what aspects and perspectives of your project you’re sharing. The particulars on that are something I’ll be ironing out over the year, so tune in here and at my soon-to-be-functioning project page to see what that ends up looking like.

Nat Geo also gets the right of first refusal on anything that you write about the project, so you'll have to pitch your ideas to them before you pitch to any other media. Having said that, if they do refuse, you can distribute your content wherever you like. 


What’s the deal with the affiliations? 

As part of the application, you need to either submit a list of affiliations who have agreed to work with you or get an official letter of affiliation from an institution. Institution can be interpreted broadly—it can be a person, university, organization, non-profit, really anything or anyone that can provide you with some sort of support for the project. When I applied, I submitted a list of affiliations, and it was only after I was chosen that I had to provide an official letter of affiliation (for the curious, my affiliation is with the very cool IED department of the Royal College of Art).  


What should I put in my portfolio/Statement of Grant Purpose/Personal statement?

Okay, so this is how I thought of it when I was applying. The best project you can propose is one that is creative and explores a story while still being demonstrative of your particular skillset and interests. So when I was applying, I thought of the application as a chance to do these five things:

  1. Pitch a project

  2. Convey why the project you’re proposing is important

  3. Explain why all relevant parties should be interested in it 

  4. Show why YOU are the one who is best equipped to tell the story you want to tell

  5. Describe how you intend to carry it out effectively

Everything the application asks of you is really just supporting those five aims. The SoGP is about laying out the logistics and explaining why it’s important; the personal statement is about explaining your personal investment in the project and showing why you’re the right person; the work sample is about proving that you can do the things that you’re proposing by showing things you’ve done in the past.


No, but really, let's talk about that work sample. I don’t have a background in film/documentary/photography. Can I still apply? 

Here’s all I can say: this certainly isn’t a documentary or photography fellowship. Yes, one of the current fellows happens to be making a documentary for his project, but that doesn’t mean that you have to have made one before in order to do this. It also doesn’t mean that you shouldn’t propose a documentary if that’s your strength—it’s the application as a whole that matters, and if your project is best served by a documentary, then there you go.

But no, you don’t need to have been a photographer or filmmaker all your life. It certainly helps if you have some idea of and familiarity with digital tools and creative ways to tell your story, and if you don’t have any background in that, then you'll need to think a little bit more about the story and how you can tell it in a way that fits that nebulous phrase of "digital storytelling". But it’s not just about the technology; it’s about the whole package. Creativity and a compelling project are the most important things. And you do get some guidance from Nat Geo along the way on materials, tools, and how to tell your story, so there’s a bit of room for flexibility. 


What do you think of _______ project idea that I have? 

Hmm, I don’t know. Has it been done before? Can you do it? Does it fit into the themes? Is it interesting? If your idea answers those questions in the right ways, you’re probably off to a good start. I’m not one of the judges, so unfortunately I’m not well-equipped to tell anyone if their idea is good or bad. 


How come you can choose up to three countries but all of the finalists from this year are only going to one country?

As part of the Statement of Grant Purpose, you have to outline the logistics for your project. And as someone who was initially planning to go to three countries, I’ll point out that as difficult as it is to come up with a cogent project in one country and spin up the affiliations to support it, it’s three times as hard to do it for three countries. In addition, nine months is not a very long time, so if you’re going to be going to multiple places, you’ve really got to think about if you can realistically do all the things you want to do in the necessarily shorter amount of time you’ll have in each. 

Having said that, you should absolutely propose a project that takes place in more than one country if it makes sense for you. Don’t shy away from it if it’s what you want to do, it works well, and you can justify why you’re doing it. 


But Mimi, what’s YOUR project? And how can I learn more about it? 

Best question of the day! My project falls in the fields of digital ethnography, art, mapping, urban spaces and online/offline interactions. It’s all about where people in the city of London go, and how and whether those people get opportunities to interact with those who are different from them; I’m exploring these issues in the context of the city and of the web.

More specifically, I’ll be gathering together a demographically diverse group of Londoners and using their mobile/computer browsing history and personal geolocation information to create visualizations/maps that show where each of these individuals travels online and offline. With this information, I'll be able to make inquiries into whether the structural realities of urban offline spaces are replicated online. I'll be supplementing this with qualitative insights from my participants, and hopefully I'll be doing a few mini-projects along the way that tease out other interesting threads that emerge from the project.

If you want to be kept in the loop (and you should, because it's a fun loop), there are three things you can do: 

  1. Follow me on Twitter. Very easy way to stay connected to all the things I'll be doing. 

  2. Return happily and frequently to this very site, where I'll be writing longform entries on the same topics the project covers. 

  3. Visit my soon-to-be-functioning project tumblr. It's not much now, but soon it'll be the place where I throw up all types of research, findings, and progress on the project. 

In the future, the fourth very exciting thing you’ll be able to do is check out the page on National Geographic where all of us will be posting about our projects. Unfortunately, that space is not currently set up yet, but once it is I’ll both update this post and share the link on Twitter. 


Any final tips?

I'd tentatively suggest choosing countries that weren't covered this year, but not if doing so would mean changing a crazy good project that you've already got planned. And think of your audience--your two big sells are Fulbright and Nat Geo, so keep that in mind.

Okay, that’s all I’ve got. I’ll leave the comments open on this post, so like I said, leave a comment or shoot me a tweet if you’ve got questions. 

Good luck to you all!

The Mainstream Treatment of Digital Art

As of about a week and a half ago, I unceremoniously picked up my life in New York and dumped it into London. I had initially planned to move to London a solid two weeks later, but there were some definite perks to coming in earlier. One of those perks is that I managed to make it to London just in time for the closing weekend of the Barbican Centre’s Digital Revolution Show. The show, as has been well-documented, was meant to be a “celebration of art made with code.” I attended on literally the final day, at the last time slot that was available (in my defense, I’m Nigerian; we don’t believe in showing up to anything early).
