Research Trends article “Predicting citation counts” investigates what we have learnt in the last decade

@researchtrendy, or Research Trends, has published an article discussing some significant papers in this field. My own take on what it presents, boiled down to one piece of advice, is:

The most important thing is who you are publishing with: they might bring you citation ‘pull’, they can raise the quality of your article, and they might have a good idea of the right venues to approach.

My favourite quotations & notes on significant parts are presented here, in a summary that I believe is accessible to practitioner librarians.

“citation counts will remain first among equals because of their intimate connection with the text of the article” 

“features around the Author and Venue were the most predictive. If we set the power of the Author features to 1.0, the relative power of the Venue and Content features would be about .63 and .25, respectively.” Content features include things like original vs review article, industry funding, structured abstract, word count of abstract, etc: see the article for a full list.


“large studies funded by industry, and with industry-favoring results, were statistically significant predictors of higher rates of citation in the future.” Note that this was in the “medical therapeutic space”, where new treatments are likely to become more widely available & investigated after such studies are published.

“cardiovascular and oncology articles were more likely to be cited than those on other topics such as anesthesiology, dermatology, endocrinology, gastroenterology, etc.” The topic is significant: one where there are lots of patients and lots of interest will be a topic with higher citation rates.

“articles which provided therapeutic information were more cited, as were those which provided original research as opposed to review articles.” The latter part surprised me, as the received wisdom has always been that review articles are more highly cited, but the only evidence for this that I’ve come across has been that review journals seem to have the highest impact factors. Not that I’ve done proper research, and I’m quite sure that there are disciplinary differences.

“also found that longer articles were cited fewer times, in a weak but statistically significant way.” Strange, as you would have thought that an article with more research in it would present more that is worthy of citation. I want to know if the longer articles were full of more research, or just long-winded! (No doubt I should go to the original article… one day.) Another article discussed “…found a weak effect that the more topics an article covered the higher the number of citations it received.” Logical!


“…looked at whether the author byline indicated group authorship. This was found to be the most significant prediction feature in their study!” Yes, I’ve often seen claims that co-authorship leads to higher citation. Makes sense, because all of those authors might self-cite and promote the article appropriately, but also their combined inputs will make the work richer and their collaboration will have polished the work to a higher standard, in theory.

“being a very highly cited author is predictive of future citation counts.” So co-authoring with a highly cited author is not a bad move: not only will you learn from an experienced co-author and gain reputation by association, but you’ll also benefit from his/her citation ‘pull’. Exactly the tactic you can play with Twitter & Klout, if you’re involved in marketing by social media…


“If we know the journal the article will be published in, we can make more confident predictions about its eventual citation count” This effect must apply when dealing with large numbers, because we all know the apocryphal tales of journals with high impact factors thanks to one or two star articles which are hauling in the citations. It reminds me of those scores that tell you that your chosen method of contraception is 97% effective: it doesn’t mean anything if you’re one of the individuals in the 3%! You take your chances, and journal impact factors do matter.

When looking for other measures of a journal, the “strongest are the number of databases that index the journal, and the proportion of articles from the journal which are abstracted within two months by Abstracting and Indexing services and synoptic journals.” I’ve long advocated to researchers that publishing in journals that are indexed in the sources where they search for research is a good idea. (NB I had to think about “synoptic”, and it means that it’s a journal publishing synopses, or summaries!)

The author discusses how there is room for more research into the topic of the “Venue”, including when it comes to altmetrics. I believe that experienced authors must have their own informal lists of journals to approach, ranked by their own perception of the quality of the journal, even when they do not have explicit lists: it is often very subjective, and difficult to measure, but I wonder if they can articulate how they assess the quality of a journal?

The studies discussed in the article have looked at a variety of interesting features. To this list, I might add: OA journal, hybrid OA, or no OA at all; rejection rate; time to rejection/acceptance; time to publication from acceptance; professional journal editor/academic editor (or some other feature(s) of the editorial make-up); does the journal tweet (y/n)? I imagine it would be very difficult if not impossible to gather accurate data on all such features, but they are things that I would advise authors to investigate.


“For a single article, the number of times it is abstracted is also a statistically significant predictor”

“secondary publication sources have a predictive effect”

There is a great deal of potential for altmetrics, when it comes to researching individual, newly published articles.

“By the time an article is a few months old, we can make good predictions of its likelihood of future citations – especially for those articles which end up being highly cited.”

“…only about 20% of the papers which ended up being highly cited were not predicted to be that way.” If you compare that to my analogy with contraceptives earlier, this percentage seems rather unimpressive, but then how sure do you need to be, in this circumstance? Well, that depends on how you want to use the data. The author points out that “these measures are not well-suited for an editorial board to choose articles, since the Venue would be constant and they could not look at the author’s publication rank” (because the authorship would not be revealed).

Note that the prediction rate claimed is only for articles within the data-set being analysed, and that the author of this article says “it is not safe to predict the accuracy any new study might achieve”, but also that the trends seem clear.

It is those trends that I would point out to researchers who are choosing how & where to publish. 

