Peer review of journal articles: how good is it really? A librarian evaluates an evaluation system, for scholarly information sources.

Peer review is a signifier of quality in the scholarly world: it’s what librarians (like me) teach students to look out for, when evaluating information sources. In this blog post, I explore some of the uses, criticisms and new developments in the arena of scholarly peer reviewing and filtering for quality. My evaluation of this evaluation system is fairly informal, but I’ve provided lots of useful links.

What is peer review?

It varies from one process to the next, but ideally, scholarly journal articles are chosen and polished for publication by a number of other scholars or peers in a process known as peer review, or sometimes called refereeing. Sometimes only two reviewers are used per article, sometimes three are used, plus of course the journal editor and editorial board have roles in shaping what sort of content is accepted in the journal.

Sometimes the process is “double-blind”, in that the reviewers don’t know who the author(s) are, nor do the authors know who the reviewers are, and sometimes it is only “blind”, in that the author(s) don’t know who the reviewers are. In this way, the reviewers can be critical without fearing that they might suffer negative career consequences.

However, one problem with peer review worth noting here (although not explored below) is that peer reviewers’ criticisms can often be brutal because they are made under the protection of anonymity. I also think that time pressures mean that peer reviewers don’t phrase their thoughts “nicely”, because doing so simply takes too long and they don’t have that time to invest.

Double-blind reviewing is not always possible: it can be difficult to disguise authors’ identity since the research described in the paper might be known to peers, for example when only one or two labs have the specialist equipment used.

There’s more information on peer review over on the PhD Life blog, which explains what reviewers might be looking for and the possible outcomes of peer review. It also explains some of the other quality-related processes associated with scholarly journal publishing, such as corrections and retractions.

Peer review happens in other contexts too, such as the UK’s REF, which has been heavily criticised as not the “gold standard” that it should be, because outputs were reviewed only by British scholars and a paper might be read by only one reviewer in this process.

Another frequent peer review process is when research funding bids are reviewed and grants are awarded: panels are often made up of peers. I’ve done this and it’s a valuable experience that helps you to hit the right note in your own future funding applications, but it is also hard work, to read all the bids and try to do them all justice.

It sounds good, so why ask how good it is?

Journal publishing is always growing, and peer review is under pressure. A recent scam involving authors peer reviewing their own papers, and its discovery, is described by the Ottawa Citizen. Every year I read about papers that have been published in spite of journals’ quality filters. The Retraction Watch website highlights stories of published scholarly articles that journals have retracted, i.e. the research findings described are not reliable.

Here are some of the flaws of the peer review process, in relation to journal articles.

1) It takes a very long time

I sense frustration about long journal turnaround times, and peer review takes up quite a lot of that turnaround time. When you think about how much pressure there is on academics to write and to publish, and how they get little recognition and no financial compensation for participating in the peer review process, it is no surprise to me that review times are not so fast. And when you think about how important it is to be seen to be the first to publish on something, and how scholarly work can be built upon sooner when it is published more quickly, the frustration is no surprise either.

2) It’s not efficient

If you submit to one journal and are peer reviewed and then rejected, you can then submit to another journal which might also put your article forward for peer review. Some people might call this redundant reviewing (since the work has already been done!) and it does add to the time-lag before research can be published and shared. As a response, there have been attempts to share reviewed papers, such as when your paper is rejected from one journal but it is suggested that you submit to another journal title by the same publisher instead.

3) Peers themselves get no credit or compensation for their work

There is a service called Rubriq that tries to address this criticism, and all of my points above. They offer authors a service of having their papers independently reviewed, for a fee. They track the reviewers’ work in a way that allows reviewers to demonstrate their contribution to the field through reviewing, and they also pay a fee to the reviewers, although this can be waived by reviewers who can’t earn this way, and it is not thought to be the full value of the input supplied by reviewers.

Authors often suggest appropriate reviewers anyway, so if they supply an already reviewed paper to a journal, perhaps the editor might accept the process from this independent company. Rubriq have a network of journals that they work with.

4) Some articles don’t even reach peer review

A recent piece in Nature News summarises findings of research indicating that whilst journals are good at filtering out poor quality articles through peer review, the journals themselves were not so good at identifying the long-term highest cited papers. 12 out of the 15 most cited papers involved in the study were rejected at first, before finally making it to publication. Perhaps this is because, after rejection by peer review, articles were improved and re-submitted, so the system is working, although I think that the peer reviewers in such instances deserve credit for their contribution. However, this is to assume that the higher cited articles are in fact higher quality, which is not necessarily the case. (See below for a brief consideration of citations and bibliometrics.)

Rejection after peer review is one scenario. The other is often called “desk rejection”, where an editor chooses which articles are rejected straight away, and which are sent to peer review. Editors might be basing their decisions on criteria like relevance to the journal’s readership, or compliance with the journal’s guidelines, and not always on the quality of the research.

The message that I take from this is that authors whose papers are rejected can take heart, and keep improving their paper, and keep trying to get accepted for publication, but in trying to please editors and peer reviewers, we are potentially reinforcing biases.

5) Negative results are not published and not shared

This is another case of biases being perpetuated. There are concerns about the loss to scientific knowledge of negative findings, when a hypothesis was tested but not found to be proven. Such findings rarely make it into publication, because what journal editors and peer reviewers seek to publish is research which makes a high impact on scientific knowledge. And yet, if negative results are not reported then there is a risk that other researchers will explore in the same way and thus waste resources. Also, if research is replicated but not proven, this is potentially valuable to science because it could be that the already published work needs correcting. But the odds are stacked in favour of the original publication (it was already peer reviewed and accepted, after all), such that the replication might not be published. Science needs to be able to accommodate corrections, as the article I’ve linked to explains, and one response has been the emergence of journals of negative results.

What are the alternatives to traditional peer review?

I don’t suppose that my list is comprehensive, but it highlights things that I’ve come across recently and frequently, in this context.

John Ioannidis has written that most published research findings are false, and one answer could be replication. A measure based on replication could be useful to indicate the quality of research. But who wants to reproduce others’ research when all the glory (citations, research funding, stable employment) is in making new discoveries? And it’s not simple to replicate others’ studies: we’re often talking about years of work and investigation, using expensive and sophisticated machinery, and quite often there will be different variables involved, so for some research it can never be quite an exact replication.

Post-publication peer review is another possible way to mark research out as high quality. I really like what F1000 are doing, and they explain more about the different ways that articles can be peer reviewed after having been published. I’m not sure that I want to rely on anonymous comments fields, although of course they can bring concerns to light and this is only one kind of “peer review”. I use quotation marks, because if the comments are anonymous, how do you know that they are from peers? But if the peer reviewers and their work are attributed, then I find this to be a really interesting way forward, because one of the pressures on peer review is the lack of acknowledgement, and the removal of anonymity is one way to address this.

I like the concept of articles being recommended into the F1000Prime collection: this is almost like creating a library, except that it’s not a librarian who is a filter but a scholarly community. In fact, many librarians’ selections come from suggestions by scholars anyway, so this is part way to a digital library. (Although I believe quite firmly that it is not a library, not least because access to the recommendations is restricted to paying members.) Anyway, a recommendation from a trusted source is another way to filter for quality. The issue then becomes, which sources do you trust? I blogged recently about recommendation systems that are used in more commercial settings.

I have to mention metrics! I’ll start with bibliometrics, which is usually measuring or scoring that relates to citations between journal articles or papers. For many, this is a controversial measure because there are many reasons why a paper might be cited, and not all of those reasons mean that the paper itself is of high quality. And indeed, there are many high quality papers which might not be highly cited, because their time has not yet come or because their contribution is to a field in which article publication and citation are not such common practice. The enormous growth in scholarly publication has meant that citation indices might also be criticised for too narrow a coverage.

In general, in the lead up to REF2014, researchers in the UK were keen not to be measured by bibliometrics, preferring to trust in peer review panels as a better way to evaluate their research. Yet citation indices allow you to order your search results by “most highly cited”. Would they do this if there was no interest in it as a measure of quality? Carol Tenopir has done some really interesting work in this area.

If you think that bibliometrics are controversial then altmetrics have provided some of the juiciest criticisms of all, being described as attention metrics. Yes, altmetrics as a “score” can be easily gamed. No, I don’t think that we should take the number of Facebook “likes” (or worse, a score based upon those and/or other such measures which is calculated in a mysterious way) to be an indicator of the quality of someone’s research. But, I think that reactions and responses to a published research article, as tracked by altmetric tools, can be enormously useful to the authors themselves. I’ve written about this already. Altmetrics require appropriate human interpretation: pay the scores too much attention and you will miss the real treasures that other people have also missed.

So how good is peer review, really?

It is a gold standard. It is what publishers do when time and resources allow. But it is not perfect and it is under pressure, and I’m really intrigued and impressed by all the innovative ways to ensure and indicate quality that are being explored. Of all the alternatives that I’ve discussed here, I’m most keen on the notion of open peer review, where it is not anonymous but accredited. This might be post publication or pre publication, but I’m keen that we should be able to follow peer reviewers’ and editors’ work.

A lot of these changes to scholarly publishing in the digital era seem to me to mean that the librarian’s role as a filter of information is pretty much at an end. But our role as a guide to sources and instructor of information literacy is ever more important. I would still teach budding researchers to consider peer reviewed works to be more likely to be high quality, but I would also say that they should apply their subject knowledge when reading the paper, and they should look out for other signs of quality or lack thereof. Peer review (and how rigorous it is) is one of a number of clues, and in that sense, nothing much has changed for librarians teaching information literacy, but we do have some interesting new clues to tell our students to watch out for.

How do you assess the quality of recommendations?

I wrote here last year about the marvellous Fishscale of academicness, as a great way to teach students information literacy skills by starting with how to evaluate what they’ve found. I’m currently teaching information ethics to Masters students at Humboldt Uni, and this week’s theme is “Trust”: it touches on all sorts of interesting topics in this area, including recommendation systems, also known as recommendation engines.

An example of such a recommendation system in action would be the customer star ratings for products on Amazon, which are averaged out and may be used as a way to suggest further purchases to customers, amongst other information. Or reviews for hotels/cafes on Tripadvisor, film suggestions on Netflix, etc. Recommendations are everywhere these days: Facebook recommends apps you might like, and will suggest “people you may know”; LinkedIn and Twitter work in similar ways.

For me, these recommendations raise certain questions, which also turn up in debates about privacy and about altmetrics, such as:

How much information do you have to give them about yourself, do you trust them with it, and how good are their recommendations anyway? Are you happy to be influenced by what others have done/said online?

Recommendation systems use “relevance” algorithms, which are similar to those used when you perform a search. They might combine a number of factors, including:

  • Items you’ve already interacted with (i.e. suggesting similar items, called an item-to-item approach)
  • User-to-user: it finds people who are similar to you, eg they have displayed similar choices to you already, and suggests things based on their choices
  • Popularity of items (eg Facebook recommends apps to you depending on how much use they’ve had). Note that this may have to be balanced against novelty: new items will necessarily not have achieved high popularity.
  • Ratings from other users/customers (here, they might weight certain users’ scores more heavily, or average star ratings, or just preference items with a review)
  • Information that they already have about you, against a profile of what such a person might like (eg information gleaned from tracking you online through your browser or on your user profile on their site, or that you have given them in some way)
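To make the idea of combining factors a little more concrete, here is a minimal sketch in Python. It is not how Amazon, Facebook or any real service works: the data, the function names (`users_by_item`, `jaccard`, `recommend`) and the weights are all invented for illustration. It blends three of the factors listed above: item-to-item similarity (measured here, as one possible choice, by how much two items' audiences overlap), popularity, and average star rating.

```python
from collections import defaultdict

# Hypothetical toy data: which users interacted with which items,
# and the star ratings (1-5) left for each item. All values are invented.
interactions = {
    "ann": {"book_a", "book_b"},
    "ben": {"book_a", "book_b", "book_c"},
    "cat": {"book_b", "book_c"},
    "dan": {"book_d"},
}
ratings = {
    "book_a": [5, 4],
    "book_b": [4, 4, 3],
    "book_c": [5, 5],
    "book_d": [2],
}


def users_by_item(interactions):
    """Invert the user -> items mapping into item -> users."""
    result = defaultdict(set)
    for user, items in interactions.items():
        for item in items:
            result[item].add(user)
    return result


def jaccard(a, b):
    """Overlap between two sets, from 0.0 (disjoint) to 1.0 (identical)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def recommend(user, interactions, ratings, weights=(0.6, 0.2, 0.2)):
    """Rank unseen items by a weighted blend of three factors.

    The weights (similarity, popularity, rating) are arbitrary assumptions.
    """
    w_sim, w_pop, w_rate = weights
    item_users = users_by_item(interactions)
    seen = interactions[user]
    total_users = len(interactions)
    scores = {}
    for item, users in item_users.items():
        if item in seen:
            continue
        # Item-to-item: how similar is this item's audience to the
        # audience of the closest item the user already knows?
        similarity = max(jaccard(users, item_users[s]) for s in seen) if seen else 0.0
        # Popularity: share of all users who interacted with the item.
        popularity = len(users) / total_users
        # Ratings: average stars, scaled to the range 0-1.
        stars = ratings.get(item, [])
        avg_rating = (sum(stars) / len(stars)) / 5 if stars else 0.0
        scores[item] = w_sim * similarity + w_pop * popularity + w_rate * avg_rating
    return sorted(scores, key=scores.get, reverse=True)


print(recommend("ann", interactions, ratings))
```

With this toy data, "book_c" is ranked above "book_d" for "ann", because the people who chose her books also chose "book_c", and it is both more popular and better rated. Changing the weights shifts the balance between relevance, popularity and ratings, which is exactly the kind of design decision that is invisible to us as users.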

The sophistication of the algorithm used and the size of the data pool drawn on (or lack thereof) might also depend on the need for speed of the system.

Naturally, those working on recommendation engines have given quite a bit of consideration to how they might evaluate the recommendations given, as this paper from Microsoft discusses, in a relatively accessible way. It introduces many relevant concepts, such as the notion that recommending things that it knows you’ve already seen will increase your trust in the recommendations, although it is very difficult to measure trust in a test situation.

We see that human evaluation of these recommendation systems is important as “click through rate (CTR)” is so easily manipulated and inadequate as a measure of the usefulness of recommendations, as described and illustrated in this blog post by Edwin Chen.
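As a rough illustration of why click-through rate alone can mislead, here is a small Python sketch. It is my own toy example, not taken from Edwin Chen's post: the titles, the numbers and the `rated_useful` field (standing in for some form of human judgement) are all invented assumptions.

```python
def click_through_rate(clicks, impressions):
    """CTR: the fraction of times a shown recommendation was clicked."""
    return clicks / impressions if impressions else 0.0


def useful_per_click(rated_useful, clicks):
    """Fraction of clicks that a human later judged to be useful."""
    return rated_useful / clicks if clicks else 0.0


# Hypothetical recommendations with invented counts.
shown = [
    {"title": "You won't believe this one trick",
     "impressions": 1000, "clicks": 200, "rated_useful": 10},
    {"title": "A careful review of the evidence",
     "impressions": 1000, "clicks": 50, "rated_useful": 40},
]

# Rank the same recommendations in two ways.
by_ctr = sorted(
    shown,
    key=lambda r: click_through_rate(r["clicks"], r["impressions"]),
    reverse=True,
)
by_usefulness = sorted(
    shown,
    key=lambda r: useful_per_click(r["rated_useful"], r["clicks"]),
    reverse=True,
)

print("Top by CTR:", by_ctr[0]["title"])
print("Top by human judgement:", by_usefulness[0]["title"])
```

In this made-up case the attention-grabbing title wins on CTR, while the recommendation that people actually found useful wins on human judgement: a catchy title is enough to push the clicks up.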

Which recommendations do you value, and why? I also came across a review of movie recommendation sites from 2009, which explains why certain sites were preferred, which gives plenty of food for thought. From my reading and experience, I’d start my list of the kind of things that I’d like from recommendation systems with:

  • It doesn’t take information about me without asking me first (lots of sites now have to tell you about cookies, as the Cookie collective explain)
  • It uses a minimal amount of information that I’ve given it (and doesn’t link with other sites/services I’ve used, to either pull in or push out data about me, unless I tell it that it can!)
  • Suggestions are relevant to my original interest, but with the odd curveball thrown in, to support a more serendipitous discovery and to help me break out of the “filter bubble”
  • Suggestions feature a review that was written by a person (in a language that I speak), so more than just a star rating
  • Suggestions are linked in a way that allows me to surf and explore further, eg filtering for items that match one particular characteristic that I like from the recommendation
  • I don’t want the suggestions to be too creepily accurate: I like to think I’ve made a discovery for myself, and I doubt the trustworthiness of a company that knows too much about me!

I’m sure there’s more, but I’m equally sure that we all want something slightly different from recommendation systems! My correspondence with Alke Groeppel-Wegener suggests that her students are very keen on relevance and not so interested in serendipity. For me, if that relevance comes at the expense of my privacy, so that I have to give the system lots of information about myself, then I definitely don’t want it. What about you?

What use is social media to a researcher? Find out at a Google Hangout event

I’m very pleased to be taking part as a panellist in an online Q&A session called “How to be a successful digital academic to boost your career.” It takes place on 27th Jan at 12 noon, GMT and is hosted by none other than the Thesis Whisperer, Dr Inger Mewburn!

We’ll be exploring the theme of social media and its usefulness to academics. Do you think social media is useful, or do you wonder how you could possibly make use of it, as a researcher? I’m sure that the expert panel will have some ideas of interest to you! Themes of online engagement through blogs, as well as writing for online audiences are bound to emerge, in addition to digital networking.

I was invited in my capacity as editor of the Piirus blog, and I’m sure I’ll explain a little bit about how Piirus differs from other online tools. It’s more of an online dating or introductions agent, and it’s extremely light touch. Its purpose is to help researchers make connections beyond their disciplines and beyond national borders. It also comes from the academic community itself, and is based at the University of Warwick, alongside the hosts of the Google Hangout event.

If you’ve never attended such an event online before, well they are something like a webinar, and something like a live conference panel session. You get to type in questions to the host, who will pass them on to the panellists. You can even send in questions in advance. During the event, you can sign in and then see and hear the panellists discussing the questions. If you can’t attend the event live, well no worries: it will be recorded so that you can watch it later.

There is a lot more information about it, over on the event page on Google+. I hope you find it valuable!