Whose Data? UMD librarians weigh in on publication and data surveillance in academia

Article published in Research Information discusses the implications of the publications-as-data model

May 5, 2023


In a new article for Research Information, “Publication and Data Surveillance in Academia,” UMD librarians Jordan Sly and Joseph Koivisto discuss the complex issues at play in an increasingly data-driven world.

We live in a time of bountiful knowledge. The 21st century brought not only the expansion of internet access and usage but also growth in the number of scholars at research institutions. We are surrounded by an almost unquantifiable amount of information and findings, and more is published every day.

While we’re lucky to live amid such abundance, the sheer volume of available data means we need help processing it all and finding what we’re looking for. Librarians around the world work to make sure this information is organized, classified, and, most importantly, discoverable. But they can’t do it alone, which is why they rely on databases and third-party platforms to help sort, classify, and share it.

These databases and platforms let users easily find articles using keywords and search terms. They also apply metrics like findability, impact scores, and h-indexes, which indicate, for example, how frequently an article has been cited. However, there is cause for concern about overreliance on these metrics. When we lean on quantitative measures, rather than scholarly merit, to pick which articles to read or which research to cite, is something important lost?

In addition, the for-profit databases and platforms we rely on to find information are becoming fewer and fewer as companies consolidate and monopolize the industry. Where there used to be dozens of options libraries could turn to, now there are only a handful. With so little competition for their business, libraries have less leverage when negotiating with enterprise platforms.

On top of that, these companies have begun to monetize their internal usage data. For example, a company can track how many UMD students from the engineering department are searching for particular terms, information that could be valuable to the university. The company then charges the university to access this data about its own users.

This confluence of issues (overreliance on metrics, lack of competition among for-profit third-party vendors, and data profiteering) is the subject of a recent article by Joseph Koivisto, UMD Systems Librarian, and Jordan Sly, Head of Humanities and Social Science Librarians. Their paper, “Publication and Data Surveillance in Academia,” published in Research Information, begins to question these relationships, offering an important perspective on an issue affecting libraries across the country and beyond.

We sat down with Koivisto and Sly to ask them more about their article. The full paper is available in Research Information.

What is your paper about?

Jordan Sly: [In short,] the over-quantification of the library, turning everything into numbers; the repackaging of these numbers to prove impact for certain disciplines in ways that privilege certain departments over others; and the ways in which commercial enterprise vendors, the companies able to collect this data based on usage, resell it to universities. Vendors are recycling, reusing, and effectively reselling our own data back to us at high prices.

To put it in a social media context: what Facebook, Twitter, and others figured out after their products launched was that they could sell user data to advertisers for targeted advertising. The business model was no longer to connect people; it was to sell people to [other entities]. The same thing is happening with these publishers: they’re realizing the real money comes from what can be mediated between two parties.

So if universities need to buy high-impact journals in order to facilitate research, then [vendors] not only get information about who’s using what, when, and how, but can also sell that information [back to the university]. What becomes valuable isn’t the journal, but the access to that journal: the usage stats, the user profiles. Once [a publisher] knows that x number of people from the University of Maryland want to access a particular journal, that usage becomes more valuable than the journal itself. That’s something they can leverage and sell.

On its face, it isn’t nefarious. A company making money is just the way things work; that’s not the problem. The problem is the knock-on effects. And this is where we become concerned.

What are some knock-on effects?

Joseph Koivisto: As we rely on quantitative measures to guide our understanding of impact, those measures start to reinforce themselves and shape the pattern of scholarship. If a journal has a high impact factor and an article is cited a lot, that starts to drive interest in it, so researchers cite something not because of the value they found in the scholarship, but because it has a high number. That then informs their own scholarship: if they are reading materials that have high citation counts and appear in high-impact-factor journals, they want to emulate them. The result is a self-feeding cycle in which researchers replicate research based not on its scholarly merit but on its datafied value, reducing the value of the scholarship to a single numeric factor.

Where do academic libraries fit into this?

Koivisto: [As librarians,] we are pushed to buy materials with high impact factors, because those are the journals scholars get exposed to; they are viewed as the most important. So we are driven to buy subscriptions to these high-impact-factor journals. But we have limited resources. So if every voice is saying “buy these high-impact journals,” we end up winnowing down the spectrum of scholarship we can present in our collections.

Sly: [Libraries] are effectively undermining our own existence by playing into that cycle: working within the hype machine of these journals and giving researchers the tools to ignore so much other research. We use the term ouroboros, a snake that eats its own tail. That’s our central image for a lot of this.

Could you talk more about the impact this could be having on scholarship?

Koivisto: On a very basic level, we’re losing critical engagement with materials. By looking at data points as the measure of scholarly importance, we’re not engaging with the materials in a way that is informed by a human, critical reader.

The second thing is that these quantitative, datafied measures are poor representations of the actual value of scholarship. Mike Dougherty at the University of Maryland, along with a couple of other scholars, did an assessment comparing these metrics against the value of scholarship. One of the things they found is that relying on impact factors and citation counts actually creates detrimental habits within scholarship, because it reinforces existing stereotypes about the value of scholarship and negatively affects scholars who are women, people of color, or from smaller or underfunded institutions. These systems recapitulate existing inequalities and reduce the variety of perspectives in the scholarly record.

And, like I said earlier, it pushes us within libraries to buy the materials that score well on these metrics. We can only buy so many things. So if the scholarship we have to buy (because of the university, and because of demand from our staff, faculty, and students) is limited to materials that meet these datafied criteria for high-quality scholarship, we have to cut other things. And the things we end up cutting are smaller journals, journals that aren’t necessarily published by the big publishers and don’t have high impact factors. This decreases the diversity of scholarship and ultimately does a disservice to the scholarly community.

How are different folks in higher ed responding?

Koivisto: Schools and colleges are reevaluating their promotion and tenure practices, moving away from overreliance on these datafied scholarly measures and actually evaluating the work that somebody performs and publishes. If we disregard the impact factors, the h-indexes, or what have you, we have to actually engage with the scholarship. I mentioned Mike Dougherty. He pushed to have [the UMD Psychology Department’s] promotion and tenure policy changed so that it does not rely on those datafied metrics. That is encouraging to see.

Another way to make an impact is to throw our institutional weight behind openness and support approaches to scholarship that don’t have to go through these big publishers. At the University of Maryland, the PACT working group developed an equitable access policy that puts our institutional weight behind open access. We are trying to make it easier for scholars to publish their journal articles open access. Even if an article isn’t [in] a high-impact journal, we still want it to be available open access. This makes our scholarship accessible to people who can’t afford expensive journal subscriptions and allows these journals to exist outside the ecosystem of high-impact journals.

Lastly, you can’t say enough about consciousness raising. [The issue] is so big and so invisible that it’s very difficult for everybody to grapple with.

Sly: [It’s difficult, because] it’s easier to use the numbers; it’s easier to buy a system that does all this for you and gives you the answer. The problem is when we don’t question how that answer is derived. We’re creating and reinforcing a self-repeating cycle that ends up privileging certain areas of the university and certain types of research, and pushing the university in directions it may not realize it is heading.

This interview has been edited and condensed for clarity.
