Explaining the g-index: trying to keep it simple

For many years now, I’ve had a good grip on what the h-index is all about: if you would like to follow this blogpost all about the g-index, then please make sure that you already understand the h-index. I’ve recently had a story published with Library Connect, which elaborates on my user-friendly description of the h-index. There are now many similar measures to the h-index, some of which are simple to understand like the i10-index, which is just the number of papers you have published which have had 10 or more citations. Others are more difficult to understand, because they attempt to something more sophisticated, and perhaps they actually do a better job than the h-index alone: it is probably wise to use a few of them in combination, depending on your purpose and your understanding of the metrics. If you enjoy getting to grips with all of these measures then there’s a paper reviewing 108 author-level bibliometric indicators which will be right up your street!

If you don’t enjoy these metrics so much but feel that you should try to understand them better, and you’re struggling, then perhaps this blogpost is for you! I won’t even think about looking at the algorithms behind Google PageRank inspired metrics, but the g-index is one metric that even professionals who are not mathematically minded can understand. For me, understanding the g-index began with the excellent Publish or Perish website and book, but even this left me frowning. Wikipedia’s entry was completely unhelpful to me, I might add.

In preparation for a recent webinar on metrics, I redoubled my efforts to get the g-index into a manageable explanation. On the advice of my co-presenter from the webinar, Andrew Plume, I went back to the original paper which proposed the g-index: Egghe, L., “Theory and practice of the G-index”. Scientometrics, vol. 69, no. 1, (2006), pp. 131–152

Sadly, I could not find an open access version, and even when I read this paper, it is peppered with precisely the sort of formulae that make librarians like me want to run a mile in the opposite direction! However, I found a way to present the g-index at that webinar, which built nicely on my explanation of the h-index. Or so I thought! Follow-up questions from the webinar showed where I had left gaps in my explanation and so this blogpost is my second attempt to explain the g-index in a way that leaves no room for puzzlement.

I’ll begin with my slide from the webinar:

g-index

 

I read out the description at the top of the table, which seems to make sense to me. I explained that I needed the four columns to calculate the g-index, reading off the titles of each column. I explained that in this instance, the g-index would be 6… but I neglected to say that this is because this is the last row on my table where the total number of citations (my right hand column) is higher than or equal to the square of g.

Why did I not say this? Because I was so busy trying to explain that we can forget about the documents that have had no citations… oh dear! (More on those “zero cites” papers later.) In my defence, this is exactly the same as saying that the citations received altogether must be at least g squared, but when presenting something that is meant to be de-mystifying, the more descriptions, the better! So, again: the g-index in my table above is the document number (g) where the total number of citations is greater than or equal to the square of g (also known as g squared).

Also on reflection, for the rows where there were “0 cites” I should also have written “does not count” instead of “93” in the “Total number of citations” column, as people naturally asked afterwards why the g-index of my Professor X was not 9. In my presentation I had tried to explain what would happen if the documents with 0 citations had actually had a citation each, which would have yielded a g-index of 9, but I was not clear enough. I should have had a second slide to show this:

extra g-index

Here we can see that the g-index would be 9 because the 9th row has the total number of citations as higher than g squared, but in the 10th row the total number of citations are less than g squared.

My “0 cites” was something of a complication and a red herring, and yet it is also a crucial concept. Because there are many, many papers out there with 0 citations, and so there will be many researchers with papers that have 0 citations.

I also found, when I went back to that original paper by Egghe, that it has a “Note added in proof” which describes a variant where papers with zero citations, or indeed fictitious papers are included in the calculation, in order to provide a higher g-index score. However I have not used the variant. In the original paper Egghe refers to “T” which is the total number of documents, or as he described it “the total number of ever cited papers”. Documents that have never been cited cannot be part of “T” and that’s why my explanation of the g-index excludes those documents with 0 citations. I believe that Egghe used this as a feature of the h-index which he valued, i.e. representing the most highly cited papers in the single number, which is why I did not use the variant.

However, others have used the variant in their descriptions of the g-index and the way they have calculated it in their papers, especially in more recent papers that I’ve come across, so this confuses our understanding of exactly what the g-index is. Perhaps that’s why the Wikipedia entry talks about an “average” because the inclusion of fictitious papers does seem to me more like calculating an average. No wonder it took me such a long time to feel that I understood this metric satisfactorily!

My advice is: whenever you read about a g-index in future, be sure that you understand what is included in “T“, i.e. which documents qualify to be included in the calculation. There are at least three possibilities:

  1. Documents that have been cited.
  2. Documents that have been published but may or may not have been cited.
  3. Entirely fictitious documents that have never been published and act as a kind of “filler” for rows in our table to help us see which “g squared” is closest to the total number of citations!

I say “at least” because of course these documents are the ones in the data set that you are using, and there will also be variability there: from one data set to another and over time, as data sets get updated. In many ways, this is no different from other bibliometric measures: understanding which documents and citations are counted is crucial to understanding the measure.

Do I think that we should use the variant or not? In Egghe’s Note, he pointed out that it made no difference to the key finding of his paper which explored the works of prestigious authors. I think that in my example, if we want to do Professor X justice for the relatively highly cited article with 50 cites, then we would spread the total of citations out across the documents with zero citations and allow him a g-index of 9. That is also what the g-index was invented to do, to allow more credit for highly cited articles. However, I’m not a fan of counting fictitious documents. So I would prefer that we stick to a g-index where “T” is “all documents that have been published and which exist in the data set, whether or not they have been cited.” So not my possibility no. 1 which is how I actually described the g-index, and not my possibility no. 3 which is how I think Wikipedia is describing it. This is just my opinion, though… and I’m a librarian rather than a bibliometrician, so I can only go back to the literature and keep reading.

One final thought: why do librarians need to understand the g-index anyway? It’s not all that well used, so perhaps it’s not necessary to understand it. And yet, knowledge and understanding of some of the alternatives to the h-index and what they are hoping to reflect will help to ensure that you and the people who you advise, be they researchers or university administrators, will all use the h-index appropriately – i.e. not on its own!

Note: the slides have been corrected since this blogpost was first published. Thanks to the reader who helped me out by spotting my typo for the square of 9!

Advertisements

3 thoughts on “Explaining the g-index: trying to keep it simple

  1. Satish May 26, 2016 / 5:34 am

    Hi Jenny!
    Good to see the blog post explaining more on g-index from your previous presentation. I hope audience will understand these indexes in a better way when both h-index and g- index are compared and explained on a single screen. A paper by Costas and Borden appeared in Scientometrics, Vol. 77, No. 2 (2008) 267–288 (Table 1) gives a clear picture. Hope, you have gone through this paper. Though they are independent indexes they are interdependent to some extent. First interest of any one would be H-index and then comes what is g-index with more curiosity or for better understanding of the performance. g-index is an effort to adjust and to improve the h-index. But, in India hardly it is (g-index) being asked by any scientist.

    • jennydelasalle May 26, 2016 / 8:19 am

      Hi Satish, thanks for your comment! The paper you mention is one of those which uses the variant of the g-index where fictitious papers are included in ‘T’. They explain that they are doing this and their table is a clear explanation of both indexes in one view, as you say. I don’t think that the g-index is widely used because it doesn’t appear in the authors’ profiles on the big citation databases, so people (such as researchers & university administrators) are not as aware of its existence. Since the h-index is readily available and you don’t have to calculate it, even if you know about the g-index then you might still prefer to use the h-index because of its availability. But it’s one of the roles that research support librarians can do, to advise on bibliometric measures that are appropriate to the purpose, which is why it is so important for us to understand them! As you say, the g-index helps to give a better understanding of performance, for those who are measuring it. Meanwhile, a researcher could display the usual h-index in their CV or online profile but also a g-index, perhaps even in the form of a table like the one in the paper that you recommend, so that readers can understand what it means.

  2. Areeg March 6, 2017 / 3:40 pm

    You are wonderful. You made it easy. thank you

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s