
Ecologies of trust and the problem of ethical data

Trust
Photo by Neal Sanche from Flickr.

This past weekend, I saw several thought-provoking presentations at the annual meeting of the Special Interest Group on Design of Communication (SIGDOC). There were talks on how to construct better documentation, approaches to studying expertise and authenticity, and two keynotes covering lessons learned from a lifetime of studying users and what cybersecurity looks like to people outside it. Across all of this, a central theme emerged in my own discussions with colleagues around the conference, one that has stayed with me even after coming back home and starting back into work. In a word: trust.

The theme arose in part from a discussion about two different app projects a colleague had seen, both of which relied heavily on user reporting. For both, the central feature was that users could report on what they had seen or felt, and then some backend would compile the collective information and show other users the results. Importantly, one dealt with health information and the other with local weather conditions. In my own consideration of the projects, however, the issue was not that users could report on things around them, but that, for the apps to work accurately as intended, users needed to be trusted. This became the central sticking point for how to think through the ethical ramifications of the data itself, and something that stayed with me across related conversations.

In my mind, the two interconnected issues became these:

  1. Can you know if any of the data is false?
  2. If you can’t know, does it matter?

Now, obviously, the best-case scenario would be if all data reported by users were accurate all the time. When submitting health information to communicate possible risk, having the best information is ideal and, just as obviously, what everyone wants if it is possible. However, and this became the first driving question for me, can you actually know that? It is one thing, as I discussed with others, to design for the communication of risk. There is excellent research, done and ongoing, on how best to communicate risk through things like maps and other visual metaphors. Yet, as I sought to invert the premise, I started to question the data itself. What if the data is lying? Or, put more clearly, what if the users who had added the data were lying, or were simply wrong without meaning to be? Can you know that?

There are ways to lower this risk, of course. Controls such as voting on information, or treating some users as more trusted than others, could be implemented. Key users could be vetted somehow and used as touchstones for whether locative data could be trusted, or as a metric to measure it against. Data could even be weighted on different spatial levels, trusting not a single user but a group of a certain number per some certain measure of space. The risk can’t be eliminated, but it can be mitigated in part.
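To make the last two ideas a little more concrete, here is a minimal sketch of my own, not something taken from either app. It assumes each report carries a user, a location, and a single numeric value, and that some per-user trust score between 0 and 1 already exists. The names `Report`, `aggregate_by_cell`, `cell_size`, `min_reporters`, and `default_trust` are all hypothetical. Reports are grouped into square grid cells, cells with too few distinct reporters are dropped, and the rest are averaged with each user's trust as the weight.

```python
from collections import defaultdict
from dataclasses import dataclass
import math


@dataclass(frozen=True)
class Report:
    """One user-submitted observation (all fields are hypothetical)."""
    user_id: str
    lat: float
    lon: float
    value: float  # e.g. a reported temperature


def aggregate_by_cell(reports, trust, cell_size=0.01, min_reporters=3,
                      default_trust=0.5):
    """Return a trust-weighted mean value per grid cell.

    `trust` maps user_id to an assumed score in [0, 1]; users missing
    from it get `default_trust`. Cells with fewer than `min_reporters`
    distinct users are left out rather than shown on one person's word.
    """
    # Bucket reports into square cells of `cell_size` degrees.
    cells = defaultdict(list)
    for r in reports:
        key = (math.floor(r.lat / cell_size), math.floor(r.lon / cell_size))
        cells[key].append(r)

    results = {}
    for key, group in cells.items():
        # Require a minimum number of distinct reporters per cell.
        if len({r.user_id for r in group}) < min_reporters:
            continue
        weights = [trust.get(r.user_id, default_trust) for r in group]
        total = sum(weights)
        if total == 0:
            continue  # nobody in this cell is trusted at all
        results[key] = sum(w * r.value for w, r in zip(weights, group)) / total
    return results
```

Even in this toy form, the sketch only moves the question rather than answering it: the trust scores have to come from somewhere, and a large enough group of trusted users reporting the same false value would pass straight through.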

Yet, the second question drove me when it came to the local weather conditions app. What if, I proposed, a group of users conspired to compromise the data? Could you know? What if, as a far more likely possibility, the data from certain locations became corrupted somehow, perhaps as it passed through a server? Should you trust it? What if, to take a wilder possibility, all of the data was false? If an entire database is ‘wrong,’ is it really wrong? What percentage of ‘wrong’ can be handled? Or, as I’ve pondered this last issue for the last day or so, does it actually matter?

When it comes to user-reported data, what exactly is the draw? Users want to trust each other. If the data is wrong, the data is wrong. However, if users are wrong, this becomes a major issue. In a trust community, it is not the information, strictly, that is important. Instead, it is the community, the users. Continuing the thought experiment of users purposely reporting false data, then, what was said would matter less than whether the relationship between users held. That is, the binding connections of the network are a measure of trust between the nodes, the users, and are not based solely on the value of the content itself.

In this context, the question of whether data was ‘false,’ or whether people should even be told if this was the case, resonated with other questions of expertise and authenticity. What if, as is a common problem in technical communication, a certain kind of text contains detailed information that needs to be translated for another audience? Where do, in echoing the work of Katz (1992), efficiency and ethical behavior collide here? What is ethical data? And what data sources can be trusted and why? What is the ethos of data, and how does it arise? How do I know I can trust the data from other users, or do I simply trust everything and seek to verify it myself? And when it comes to data needed to plan against risk, where do I put my trust?