Snowy Stockholm and Nordic Librarians!

Picture from Twitter @Micha2508

Last week I attended Elsevier’s Nordic Library Connect event in Stockholm, Sweden. I presented the metrics poster/card and slide set that I had previously researched for Elsevier. It’s a great poster, but the entire set of metrics takes some digesting, and presenting them all as slides in around 30 minutes was not my best idea, even for an audience of librarians! The poster itself was popular though, as it is useful to keep on the wall somewhere to refer to when you need to refresh your knowledge of certain metrics:

https://libraryconnect.elsevier.com/sites/default/files/ELS_LC_metrics_poster_V2.0_researcher_2016.pdf

I reflected after my talk that I should probably have chosen a few of the metrics to present, and then added more information and context, such as screen captures of where to find these metrics in the wild. It was a very useful experience, not least because it gave me this idea, but also because I got to meet some lovely folks who work in libraries in the Scandinavian countries.

UPDATE 23 Nov 2016: now you can watch a video of my talk (or one of the others) online.

I met these guys… but also real people!

I particularly valued a presentation from fellow speaker Oliver Renn of ETH Zurich. He has obviously built up a fantastic relationship with the departments that his library serves, and I thought that the menus he offered were inspired. These are explained in the magazine that he also produces for his departments: see p8 of this 2015 edition.

See tweets from the event by clicking on the hashtag in this tweet:


Reflections and a simple round-up of Peer Review Week 2016

It has been Peer Review Week this week: I’ve been watching the hashtag on Twitter with interest (and linked to it in a blogpost for piirus.ac.uk), and on Monday I attended a webinar called “Recognising Review – New and Future Approaches for Acknowledging the Peer Review Process”.

I do like webinars, as I’ve blogged before: professional development/horizon scanning from my very own desktop! This week’s one featured talks from Paperhive and Publons, amongst others, both of which have been explored on this blog in the past. I was particularly interested to hear that Publons plans to record not only peer review effort but also editorial contributions. (Right at the end of the week this year, there were suggestions that editorial work should be the focus of next year’s Peer Review Week, so it seems to me that we’ve come full circle.) A question from the audience raised the prospect of a new researcher metric based on peer review tracking. I guess that’s an interesting space to watch!

I wondered where Peer Review Week came from: it seems to be a publisher initiative, if Twitter is anything to go by, since the hashtag is dominated by their contributions. On Twitter at least, it attracted some criticism of publishers: if you deliberately look at ways to recognise peer review, then some academics are going to ask whether it is right for publishers to profit so hugely from their free work. Some criticisms were painful to read and some were also highly amusing:

There were plenty of links to useful videos, webpages and infographics about how to carry out peer review, both for those new to it and for those already experienced, such as:

(On this topic, I thought that an infographic from Elsevier about reasons why reviewers refused to peer review was intriguing.)

Advice was also offered on how / how not to respond to peer reviews. My favourite:

And there were glimpses of what happens at the publisher or editor level:

There wasn’t much discussion of the issue of open vs blind or double-blind peer review, which I found interesting because, to me at least, recognition implies openness. There was also some interesting research reported in the THE earlier this month about eliminating gender bias through double-blind review, so openness in the context of peer review is an issue that I feel torn about. Discussion on Twitter seemed to focus mostly on incentives for peer review, and I suppose recognition facilitates that too.

Peer Review Week has also seen one of the juiciest stories in scholarly communication: fake peer reviews! We’ve been able to identify so much dodgy practice in the digital age, from fake papers and fake authors to fake email addresses (so that you can be your own peer reviewer) and citation rings. Some of this is, on one level, highly amusing: papers by Maggie Simpson, or a co-author who is, in fact, your cat. But on another level it is deeply concerning, and so it’s a space that will continue to fascinate me, because it definitely looks like a broken system: how do we stick it all together?

Explaining the g-index: trying to keep it simple

For many years now, I’ve had a good grip on what the h-index is all about: if you would like to follow this blogpost all about the g-index, then please make sure that you already understand the h-index. I’ve recently had a story published with Library Connect which elaborates on my user-friendly description of the h-index. There are now many similar measures to the h-index, some of which are simple to understand, like the i10-index, which is just the number of papers you have published that have had 10 or more citations. Others are more difficult to understand, because they attempt to do something more sophisticated, and perhaps they actually do a better job than the h-index alone: it is probably wise to use a few of them in combination, depending on your purpose and your understanding of the metrics. If you enjoy getting to grips with all of these measures then there’s a paper reviewing 108 author-level bibliometric indicators which will be right up your street!
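
(In case a code-style definition helps: here is a minimal sketch, in Python, of how the h-index and the i10-index can be calculated from a list of citation counts per paper. The citation counts are entirely made up for illustration.)

    def h_index(citations):
        """Largest h such that h papers each have at least h citations."""
        ranked = sorted(citations, reverse=True)
        h = 0
        for rank, cites in enumerate(ranked, start=1):
            if cites >= rank:
                h = rank
            else:
                break
        return h

    def i10_index(citations):
        """Number of papers with 10 or more citations."""
        return sum(1 for cites in citations if cites >= 10)

    # Made-up example: six papers with these citation counts
    papers = [50, 20, 10, 7, 4, 2]
    print(h_index(papers))    # 4 (four papers have at least 4 citations each)
    print(i10_index(papers))  # 3 (three papers have 10 or more citations)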

If you don’t enjoy these metrics so much but feel that you should try to understand them better, and you’re struggling, then perhaps this blogpost is for you! I won’t even think about looking at the algorithms behind Google PageRank-inspired metrics, but the g-index is one metric that even professionals who are not mathematically minded can understand. For me, understanding the g-index began with the excellent Publish or Perish website and book, but even this left me frowning. Wikipedia’s entry was completely unhelpful to me, I might add.

In preparation for a recent webinar on metrics, I redoubled my efforts to get the g-index into a manageable explanation. On the advice of my co-presenter from the webinar, Andrew Plume, I went back to the original paper which proposed the g-index: Egghe, L., “Theory and practice of the G-index”. Scientometrics, vol. 69, no. 1, (2006), pp. 131–152

Sadly, I could not find an open access version, and when I did read the paper, I found it peppered with precisely the sort of formulae that make librarians like me want to run a mile in the opposite direction! However, I found a way to present the g-index at that webinar, which built nicely on my explanation of the h-index. Or so I thought! Follow-up questions from the webinar showed where I had left gaps in my explanation, and so this blogpost is my second attempt to explain the g-index in a way that leaves no room for puzzlement.

I’ll begin with my slide from the webinar:

g-index


I read out the description at the top of the table, which seemed to make sense to me. I explained that I needed the four columns to calculate the g-index, reading off the titles of each column. I explained that in this instance the g-index would be 6… but I neglected to say why: because 6 is the last row in my table where the total number of citations (my right-hand column) is greater than or equal to the square of g.

Why did I not say this? Because I was so busy trying to explain that we can forget about the documents that have had no citations… oh dear! (More on those “zero cites” papers later.) In my defence, this is exactly the same as saying that the citations received altogether must be at least g squared, but when presenting something that is meant to be de-mystifying, the more descriptions, the better! So, again: the g-index in my table above is the document number (g) at which the total number of citations is greater than or equal to the square of g (also known as g squared).
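
If a code-flavoured description helps, here is a minimal Python sketch of the calculation as I have just described it, i.e. counting only papers that have been cited at least once. The citation counts are hypothetical, chosen only to mimic the shape of my Professor X table (they are not the exact figures from my slide):

    def g_index(citations):
        """g-index: the largest g such that the top g papers together
        have received at least g squared citations.
        Only ever-cited papers are counted in this version."""
        ranked = sorted((c for c in citations if c > 0), reverse=True)
        total = 0   # running total of citations (the right-hand column)
        g = 0
        for rank, cites in enumerate(ranked, start=1):
            total += cites
            if total >= rank * rank:   # compare with the square of g
                g = rank
        return g

    # Hypothetical Professor X: six cited papers plus four uncited ones
    professor_x = [50, 20, 10, 7, 4, 2, 0, 0, 0, 0]
    print(g_index(professor_x))  # 6: the six cited papers total 93 >= 36 (6 squared)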

On reflection, for the rows where there were “0 cites” I should also have written “does not count” instead of “93” in the “Total number of citations” column, as people naturally asked afterwards why the g-index of my Professor X was not 9. In my presentation I had tried to explain what would happen if the documents with 0 citations had actually had a citation each, which would have yielded a g-index of 9, but I was not clear enough. I should have had a second slide to show this:

extra g-index

Here we can see that the g-index would be 9, because in the 9th row the total number of citations is still greater than or equal to g squared, but in the 10th row the total number of citations is less than g squared.

My “0 cites” rows were something of a complication and a red herring, and yet they also illustrate a crucial concept: there are many, many papers out there with 0 citations, and so there will be many researchers with papers that have never been cited.

I also found, when I went back to that original paper by Egghe, that it has a “Note added in proof” which describes a variant where papers with zero citations, or indeed fictitious papers, are included in the calculation, in order to provide a higher g-index score. However, I have not used that variant. In the original paper Egghe refers to “T”, which is the total number of documents, or as he described it, “the total number of ever cited papers”. Documents that have never been cited cannot be part of “T”, and that’s why my explanation of the g-index excludes documents with 0 citations. I believe that Egghe kept this as a feature of the h-index which he valued, i.e. representing the most highly cited papers in a single number, and that is why I did not use the variant.

However, others have used the variant in their descriptions of the g-index and in the way they have calculated it, especially in more recent papers that I’ve come across, and this confuses our understanding of exactly what the g-index is. Perhaps that’s why the Wikipedia entry talks about an “average”, because the inclusion of fictitious papers does seem to me more like calculating an average. No wonder it took me such a long time to feel that I understood this metric satisfactorily!

My advice is: whenever you read about a g-index in future, be sure that you understand what is included in “T”, i.e. which documents qualify to be included in the calculation. There are at least three possibilities:

  1. Documents that have been cited.
  2. Documents that have been published but may or may not have been cited.
  3. Entirely fictitious documents that have never been published and act as a kind of “filler” for rows in our table to help us see which “g squared” is closest to the total number of citations!

I say “at least” because of course these documents are the ones in the data set that you are using, and there will also be variability there: from one data set to another and over time, as data sets get updated. In many ways, this is no different from other bibliometric measures: understanding which documents and citations are counted is crucial to understanding the measure.

Do I think that we should use the variant or not? In Egghe’s Note, he pointed out that it made no difference to the key finding of his paper, which explored the works of prestigious authors. I think that in my example, if we want to do Professor X justice for the relatively highly cited article with 50 cites, then we would spread the total of citations out across the documents with zero citations and allow him a g-index of 9. That is also what the g-index was invented to do: to allow more credit for highly cited articles. However, I’m not a fan of counting fictitious documents. So I would prefer that we stick to a g-index where “T” is “all documents that have been published and which exist in the data set, whether or not they have been cited”. That is not my possibility no. 1, which is how I actually described the g-index, and not my possibility no. 3, which is how I think Wikipedia describes it. This is just my opinion, though… and I’m a librarian rather than a bibliometrician, so I can only go back to the literature and keep reading.
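
To show how much the choice of “T” matters, here is another small Python sketch that compares the possibilities above, reusing the same hypothetical citation counts as before (again, made-up numbers rather than my actual slide data):

    def g_index_over(rows):
        """g-index calculated over an explicit list of rows (citation counts),
        highest first, whatever we have decided counts as a 'document'."""
        total, g = 0, 0
        for rank, cites in enumerate(rows, start=1):
            total += cites
            if total >= rank * rank:
                g = rank
        return g

    cited = [50, 20, 10, 7, 4, 2]        # possibility 1: ever-cited papers only
    published = cited + [0, 0, 0, 0]     # possibility 2: all published papers
    padded = published + [0] * 10        # possibility 3: topped up with fictitious rows

    print(g_index_over(cited))      # 6  (93 citations >= 36, and there is no 7th row)
    print(g_index_over(published))  # 9  (93 >= 81 at row 9, but 93 < 100 at row 10)
    print(g_index_over(padded))     # 9  (the fictitious rows add nothing extra here)

    # The fictitious padding only changes the result for authors with few papers
    # but very many citations, for example:
    blockbusters = [200, 150]
    print(g_index_over(blockbusters))             # 2  under possibilities 1 and 2
    print(g_index_over(blockbusters + [0] * 20))  # 18 once fictitious rows are allowed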

One final thought: why do librarians need to understand the g-index anyway? It’s not all that well used, so perhaps it’s not necessary to understand it. And yet, knowledge and understanding of some of the alternatives to the h-index and what they are hoping to reflect will help to ensure that you and the people who you advise, be they researchers or university administrators, will all use the h-index appropriately – i.e. not on its own!

Note: the slides have been corrected since this blogpost was first published. Thanks to the reader who helped me out by spotting my typo for the square of 9!

12 reasons scholars might cite: citation motivations

I’m sure I once read something similar to this list, but couldn’t find it again lately… so here is my quick list of reasons why researchers might cite. It includes “good” and “bad” motivations, and might be useful when considering bibliometric indicators. Feel free to comment on this post and suggest more possible motivations. Or indeed any good sources!

  1. Set own work in context
  2. Pay homage to experts
  3. Give credit to peers
  4. Criticise/correct previous work (own or others)
  5. Signpost under-noticed work
  6. Provide further background reading
  7. Lend weight to own claims
  8. Self citations to boost own bibliometric scores and/or signpost own work
  9. Boost citations of others as part of an agreement
  10. Gain favour with journal editor or possible peer reviewers by citing their work
  11. Gain favour by citing other papers in the journal of choice for publication
  12. Demonstrate own wide reading/knowledge

Is this research article any good? Clues when crossing disciplines and asking new contacts.

As a reader, you know whether a journal article is good or not by any number of signs. Within your own field of expertise, you know quality research when you see it: you know, because you have done research yourself and you have read & learnt lots about others’ research. But what about when it’s not in your field of expertise?

Perhaps the most reliable marker of quality is a recommendation from an expert in the field. But if you find something intriguing for yourself that is outside of your usual discipline, how do you know if it’s any good? It’s a good idea to ask someone for advice, and if you already know someone then great, but if not then there’s a lot you can do for yourself, before you reach out for help, to ensure that you make a good impression on a new contact.

Librarians teach information skills and we might suggest that you look for such clues as:

  1. relevance: skim the article: is it something that meets your need? – WHAT
  2. the author(s): do you know the name: is it someone whose work you value? If not, what can you quickly find out about them, eg other publications in their name or who funds their work: is there a likely bias to watch out for? – WHO & WHY 
  3. the journal title/publisher: do you already know that they usually publish high quality work? Is it peer reviewed and if so, how rigorously? What about the editorial board: any known names here? Does the journal have an impact factor? Where is it indexed: is it in the place(s) that you perform searches yourself? – WHERE 
  4. date of publication: is it something timely to your need? – WHEN
  5. references/citations: follow some: are they accurate and appropriate? When you skim read the item, is work from others properly attributed & referenced? – WHAT
  6. quality of presentation: is it well written/illustrated? Of course, absolute rubbish can be eloquently presented, and quality research badly written up. But if the creators deemed the output of high enough value for a polished effort, then maybe that’s a clue. – HOW
  7. metrics: has it been cited by an expert? Or by many people? Are many reading & downloading it? Have many tweeted or written about it (altmetrics tools can tell you this)? But you don’t always follow the crowd, do you? If you do, then you might miss a real gem, and isn’t your research a unique contribution?! – WHO

I usually quote Rudyard Kipling at this point:

I keep six honest serving-men
(They taught me all I knew);
Their names are What and Why and When
And How and Where and Who.

So far, so library school 101. But how do you know if the research within is truly of high quality? If most published research findings are false, as John Ioannidis describes, then how do you separate the good research from the bad?

An understanding of the discipline would undoubtedly help, and speed up your evaluation. But you can help yourself further, partly in the way you read the paper. There are some great pieces out there about how to read a scientific paper, eg from Natalia Rodriguez.

As I read something for the first time, I look at whether the article sets itself in the context of existing literature and research: Can you track and understand the connections? The second thing I would look at is the methodology/methods: have the right ones been used? Now this may be especially hard to tell if you’re not an expert in the field, so you have to get familiar with the methodology used in the study, and to think about how it applies to the problem being researched. Maybe coming from outside of the discipline will give you a fresh perspective. You could also consider the other methodologies that might have applied (a part of peer review, for many journals). I like the recommendation from Phil Davis in the Scholarly Kitchen that the methodology chosen for the study should be appropriate or persuasive.

If the chosen methodology just doesn’t make sense to you, then this is a good time to seek out someone with expertise in the discipline, for a further explanation. By now you will have an intelligent question to ask such a contact, and you will be able to demonstrate the depth of your own interest. How do you find a new contact in another discipline? I’ll plug Piirus here, whose blog I manage: it is designed to quickly help researchers find collaborators, so you could seek contacts & reading recommendations through Piirus. And just maybe, one day your fresh perspective and their expertise could lead to a really fruitful collaboration!

Keeping up to date with bibliometrics: the latest functions on Journal Citation Reports (InCites)

I recently registered for a free, live, online training session on the latest functions of Journal Citation Reports (JCR) on InCites, from Thomson Reuters (TR). I got called away during the session, but the great thing is that they e-mail you a copy so you can catch up later. You can’t ask questions, but at least you don’t miss out entirely! If you want to take part in a session yourself, then take a look at the Web of Science training page. Or just read on here to find out what I picked up and reflected on.

At the very end of the session, we learnt that 39 journal titles have been suppressed in the latest edition. I mention it first because I think it is fascinating to see how journals go in and out of the JCR collection, since having a JCR impact factor at all is sometimes seen as a sign of quality. These suppressed titles are suspended and their editors are told why: it is apparently because of either a high self-citation rate, or something called “stacking”, whereby two journals are found to be citing each other in such a way that they significantly influence the latest impact factor calculations. Journals can come out of suspension, and indeed new journals are also added to JCR from year to year. Here are the details of the JCR selection process.

The training session began with a look at Web of Science: they’ve made it easier to see JCR data when you’re looking at the results of a Web of Science search, by clicking on the journal title: it’s good to see this link between TR products.

Within JCR, I like the visualisation that you get when you choose a subject category to explore: this tells you how many journals are in that category and you can tell the high impact factor journals because they have larger circles on the visualisation. What I particularly like though, is the lines joining the journals: the thicker the line, the stronger the citing relationship between the journals joined by that line.

It is the librarian in me that likes to see that visualisation: you can see how you might get demand for journals that cite each other, and thus get clues about how to manage your collection. The journal profile data that you can explore in detail for an individual journal (or compare journal titles) must also be interesting to anyone managing a journal, or indeed to authors considering submitting to a journal. You can look at a journal’s performance over time and ask yourself “is it on the way up?” You can get similar graphs on SJR, of course, based on Elsevier’s Scopus data and available for free, but there are not quite so many different scores on SJR as on JCR.

On JCR, for each journal there are new “indicators”, or measures/scores/metrics that you can explore. I counted 13 different types of scores. You can also explore more of the data behind the indicators presented than you used to be able to on JCR.

One of the new indicators is the “JIF percentile”. This has apparently been introduced because quartile information is not granular or meaningful enough: there could be lots of journals in the same quartile for a given subject category. I liked the normalised Eigenfactor score in the sense that the number has meaning at first glance: higher than 1 means higher than average, which is more meaningful than a standard impact factor (IF). (The Eigenfactor is based on JCR data but not calculated by TR. You can find out more about it at Eigenfactor.org, where you can also explore slightly older data and different scores, for free.)
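
For the curious, my understanding (an assumption on my part rather than anything spelled out in the training session) is that a JIF percentile is derived from a journal’s impact factor rank within its subject category, roughly as in this little Python sketch. It shows why a percentile is more granular than a quartile: two journals in the same quartile of a 200-journal category can sit at quite different percentiles.

    def jif_percentile(rank, journals_in_category):
        """Approximate JIF percentile for a journal ranked `rank` (1 = highest
        impact factor) out of `journals_in_category` journals.
        Assumed formula: (N - R + 0.5) / N, expressed as a percentage --
        my reading of how JCR calculates it, not an official definition."""
        n, r = journals_in_category, rank
        return 100 * (n - r + 0.5) / n

    # Both of these journals are in the top quartile (rank 50 or better of 200),
    # but their percentiles are clearly different:
    print(jif_percentile(5, 200))   # 97.75
    print(jif_percentile(45, 200))  # 77.75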

If you want to explore more about JCR without signing up for a training session, then you could explore their short video tutorials and you can read more about the updates in the JCR Help file.

Alerts are really helpful and (alt)metrics are interesting but academic communities are key to building new knowledge.

Some time ago, I set Google Scholar to alert me if anyone cited one of the papers I’ve authored. I recommend that academic authors should do this on Scopus and Web of Science too. I forgot all about it until yesterday, when an alert duly popped into my e-mail.

It is gratifying to see that someone has cited you (& perhaps an occasional reminder to update the h-index on your CV), but more importantly, it alerts you to papers in your area of interest. This is the paper I was alerted to:

José Luis Ortega (2015) Relationship between altmetric and bibliometric indicators across academic social sites: The case of CSIC’s members. Journal of Informetrics, Volume 9, Issue 1, Pages 39-49 doi: 10.1016/j.joi.2014.11.004

I don’t have a subscription to ScienceDirect so I couldn’t read it in full there, but it does tell me the research highlights:

  • Usage and social indicators depend on their own social sites.
  • Bibliometric indices are independent and therefore more stable across services.
  • Correlations between social and usage metrics regarding bibliometric ones are poor.
  • Altmetrics could not be a proxy for research evaluation, but for social impact of science.

and of course, the abstract:

This study explores the connections between social and usage metrics (altmetrics) and bibliometric indicators at the author level. It studies to what extent these indicators, gained from academic sites, can provide a proxy for research impact. Close to 10,000 author profiles belonging to the Spanish National Research Council were extracted from the principal scholarly social sites: ResearchGate, Academia.edu and Mendeley and academic search engines: Microsoft Academic Search and Google Scholar Citations. Results describe little overlapping between sites because most of the researchers only manage one profile (72%). Correlations point out that there is scant relationship between altmetric and bibliometric indicators at author level. This is due to the almetric ones are site-dependent, while the bibliometric ones are more stable across web sites. It is concluded that altmetrics could reflect an alternative dimension of the research performance, close, perhaps, to science popularization and networking abilities, but far from citation impact.

I found a fuller version of the paper on Academia.edu and it is indeed an interesting read. I’ve read other papers that look specifically at altmetric and bibliometric scores for one particular journal’s articles, or articles from within one discipline. I like the larger scale of this study, and the conclusions make sense to me.

And my paper that it cites? A co-authored one that Brian Kelly presented at the Open Repositories 2012 conference.

Kelly, B., & Delasalle, J. (2012). Can LinkedIn and Academia.edu Enhance Access to Open Repositories? In: OR2012: the 7th International Conference on Open Repositories, Edinburgh, Scotland.

It is also a paper that is on Academia.edu. I wonder if that’s partly why it was discovered and cited? The alt- and biblio-metrics for that paper are not likely to be high (I think of it as embryonic work, for others to build on), but participation in an online community is still a way to spread the word about what you’ve investigated and found, just like attending a conference.

Hence the title of this blog post. I find the alert useful to keep my knowledge up to date, and the citation gives me a sense of being part of the academic community, which is why I find metrics so interesting. What they tell the authors themselves is of value, beyond any performance measurement or quality signalling aspects.

Attention metrics for academic articles: are they any use?

Why do bibliometrics and altmetrics matter? They are sometimes considered to be measures of attention (see a great post on the Scholarly Kitchen about this), and they attract plenty of attention themselves in the academic world, especially amongst scholarly publishers and academic libraries.

Bibliometrics are mostly about tracking and measuring citations between journal articles or scholarly publications, so they are essentially all about attention from the academic community. There are things that an author can do in order to attract more attention and citations: not just “gaming the system” (see a paper on arXiv about such possibilities), but reaching as many people as possible, in a way that speaks to them as being relevant to their research and thus worthy of a citation.

Citation, research, writing and publishing practices are evolving: journal articles seem to cite more papers these days (well, according to a Nature news item, that’s the way to get cited more: it’s a cycle), and researchers are publishing more journal articles (Wikipedia has collated some stats) and engaging in more collaborative projects (see this Chronicle of Higher Ed article). If researchers want to stay in their “business” then they will need to adapt to current practices, or to shape them. That’s not easy when it comes to metrics about scholarly outputs, because the ground is shifting beneath their feet. What are the spaces to watch?

How many outputs a researcher produces, and in which journal titles or venues, matters in the UK because of the RAE and REF exercises and the way university research is funded there.

Bibliometrics matter to universities because of university rankings. Perhaps such rankings should not matter, but they do, and the IoE London blog has an excellent article on the topic. So researchers need either to court each other’s attention and citations, or else to create authoritative rankings that don’t use bibliometrics.

Altmetrics represent new ways of measuring attention, but they are like shape-shifting clouds in comparison with bibliometrics. We’re yet to ascertain which measures of which kinds of attention, in which kinds of objects, can tell us what exactly. My own take on altmetrics is that context is the key to using them. Many people are working to understand altmetrics as measures and what they can tell us.

Attention is not a signifier of quality (as researchers well know: Carol Tenopir has done a lot of research on researchers’ reading choices and habits). Work can merit attention for good or bad reasons, and attention can come from many different sources and mean different things: by measuring attention exchanges, we can take account of trends within different disciplines and timeframes, and of the effect of any “gaming” practices.

Attention from outside of the academic community has potential as “impact”. Of course, context is important again, and for research to achieve “impact” you’ll need to define exactly what kind of impact you intend to achieve. If you want to reach millions of people for two seconds, or to engage with just one person whose life will be hugely enriched or who will have influence over others’ lives, then what you do to achieve impact, and how you measure your success, will be different. But social media and the media can play a part in some definitions of impact, and so altmetrics can help to demonstrate success, since they track attention for your article from these channels.

Next week I’ll be sharing two simple, effective stories of twitter use and reporting on its use.