Principles for the responsible use of research metrics at the University of Bradford

Introduction

The University of Bradford recognises that the use of individual research metrics (such as citation rates, h-indices, Altmetrics and Journal Impact Factors) does not alone provide a reliable indicator of research quality or impact, can be discriminatory and is not easily subject to robust analysis. We are committed to the responsible assessment of research which includes, but does not solely rely on, the responsible use of research metrics.
In accordance with the second recommendation of The Metric Tide, the University of Bradford has developed this statement of principles on its approach to research management and assessment, including the use of quantitative indicators.
We are signatories of the San Francisco Declaration on Research Assessment (DORA) and support the principles laid out in the Leiden Manifesto for Research Evaluation.
The University of Bradford recognises that new research metrics are constantly emerging and that older metrics continue to develop. Our position on all research metrics, including newly emerging metrics, is that we should understand exactly what they measure, the context in which they were developed, how they can be used responsibly and the consequences of irresponsible use.

Principles

1) Research indicators should not supplant expert peer review

(Leiden Manifesto, Principle 1)

Peer review remains the method of choice for assessing research quality and individual researchers’ contributions, and for informing hiring, regrading or promotion decisions. Whilst some additional contextualised quantitative indicators may be useful, these should not take precedence over expert peer assessment.

2) Research indicators used in hiring and promotion decisions should be transparent

(DORA Institutional Commitments 4 & 5, Leiden Principle 5, The Metric Tide Recommendation 4)

Where quantitative indicators have been used as part of making decisions on hiring, tenure and promotion, the criteria and context must be explicit. Peer assessment of the content of a paper is much more important than publication metrics or the identity of the journal in which it was published. Other output types, such as software and datasets, should be considered as part of the assessment of individual researchers. Those who are being evaluated should be able to check that their outputs and associated indicators have been correctly identified.

3) Journal Impact Factor (JIF) or similar journal level metrics should not be used as a proxy measure for the quality of individual research outputs

(DORA General Commitment 1, The Metric Tide Recommendations 5 & 8)

Do not use journal level metrics, such as Journal Impact Factors or the Academic Journal Guide, in isolation as a surrogate measure of the quality of individual research articles. These are indicators of journal performance (which can be, and may have been, manipulated), not of the research quality of individual outputs.

4) Account for variation in practices in different academic disciplines

(Leiden Principle 6, The Metric Tide Recommendation 5)

Citation rates vary by field: top-ranked journals in mathematics have impact factors of around 3, whereas top-ranked journals in cell biology have impact factors of about 30. Normalised indicators are therefore required, and the most robust normalisation method is based on percentiles: each paper is weighted on the basis of the percentile to which it belongs in the citation distribution of its field (the top 1%, 10% or 20%, for example). Best practice is to choose a number of relevant indicators with the appropriate contextual data, with the awareness that publication norms vary between disciplines. For example, peer-reviewed conference papers are very common in computer science, whilst historians tend to write books and monographs that are not indexed in bibliometric databases.
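The percentile approach described above can be illustrated with a minimal sketch. The citation counts below are invented for illustration only, the function name is hypothetical, and the convention used (the share of papers in the field cited less often than the paper in question) is one of several possible percentile definitions; it is an assumption, not the method prescribed by the Leiden Manifesto or by any bibliometric database.

```python
def citation_percentile(paper_citations, field_citations):
    """Return the percentage of papers in the field cited less often than this paper.

    Assumed convention: strict "fewer than" comparison. Other conventions
    (for example, counting ties as half) give slightly different values.
    """
    if not field_citations:
        raise ValueError("field_citations must not be empty")
    cited_less = sum(1 for c in field_citations if c < paper_citations)
    return 100.0 * cited_less / len(field_citations)


# Hypothetical citation counts for ten papers in each field (illustrative only).
maths_field = [0, 0, 1, 1, 2, 2, 3, 4, 6, 10]
cell_biology_field = [2, 5, 8, 12, 15, 20, 30, 45, 60, 120]

# A mathematics paper with 6 citations and a cell biology paper with 20 citations.
maths_percentile = citation_percentile(6, maths_field)
cell_biology_percentile = citation_percentile(20, cell_biology_field)

print(maths_percentile, cell_biology_percentile)
```

With these invented figures, the mathematics paper sits at a higher percentile within its field than the cell biology paper does within its own, despite having fewer citations in absolute terms. This is the kind of cross-field distortion that normalisation is intended to correct.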

5) Base assessment of individual researchers on a qualitative judgement of their portfolio

(Leiden Principles 6 & 7)

Quantitative indicators should not be used in place of peer review of an individual’s research portfolio. For example, the h-index (Hirsch 2005), a quantitative indicator commonly used to gauge the productivity and impact of an individual’s research outputs, is problematic. The h-index tends to rise with the length of a researcher’s career, even in the absence of new papers, which makes its use problematic when assessing early career researchers. The h-index varies considerably between disciplines, as it relies on databases containing peer-reviewed journal articles. Disciplines that tend to publish books or monographs will have significantly lower h-indices, as their outputs are not included in the calculation. The h-index is also database dependent, meaning that one can have a different h-index in Google Scholar compared to Scopus, for example. The h-index has also been found to reflect structural inequalities in the academy with regard to gender (e.g. Geraci et al. 2015, Carter et al. 2017), race (Hopkins et al. 2012) and other protected characteristics. One should exercise caution and only use the h-index with broader contextualised data, other metrics and alongside expert peer review. The coronavirus pandemic of 2020 has had a disproportionate effect on the article submission rates of men versus women academics, with women submitting proportionally fewer articles than men (Squazzoni et al. 2020). This is likely to have an impact on the metrics of women and other individual researchers for years to come.
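To make the database dependence of the h-index concrete, the following minimal sketch computes it from a list of per-paper citation counts, using the standard definition (the largest h such that h papers each have at least h citations). The two lists of citation counts are invented for illustration and do not come from Google Scholar, Scopus or any real researcher; the function name is also hypothetical.

```python
def h_index(citations):
    """Return the largest h such that h papers have at least h citations each."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, count in enumerate(ranked, start=1):
        if count >= rank:
            h = rank
        else:
            break
    return h


# Hypothetical citation counts for the same researcher in two databases.
# The second database indexes one more output and records more citations.
narrower_database = [25, 8, 5, 3, 3, 1]
broader_database = [31, 12, 9, 6, 5, 2, 1]

print(h_index(narrower_database), h_index(broader_database))
```

In this invented example the same researcher has an h-index of 3 in one database and 5 in the other, purely because of differences in coverage. A researcher whose main outputs are books or monographs that neither database indexes would score lower still, regardless of the quality of that work.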

6) No single metric should be used in isolation

(Leiden Principles 8 & 9, The Metric Tide Recommendation 5)

Best practice is to use a number of different metrics to supplement peer review. One single metric should not be relied upon, and the limitations and caveats of the use of metrics should be appreciated by those developing methods of research assessment. A single metric will not provide the nuance required for robust evidence-based decision making. Indicators change the system through the incentives they establish. These effects should be anticipated. Using a range of appropriate indicators should help avoid gaming or manipulation of a single indicator.

7) Measure performance against the research missions of the institution, group or researcher

(Leiden Principle 2)

Institutional, faculty and school research aims should be clearly stated, and the indicators used to evaluate performance should relate clearly to those stated aims. The choice of indicators, and the ways in which they are used, should take into account wider discipline and sector contexts. Scholarship, research and innovation aims can differ between disciplines; activity that advances the frontiers of academic knowledge differs from that which is focused on delivering solutions to societal problems. Indicators may be based on merits or impact relevant to policy, industry or the public rather than on academic ideas of excellence. No single performance indicator applies to all contexts.

8) Recognise excellence in locally relevant research

(Leiden Principle 3)

The largest available databases of bibliographic data are biased towards Western, English language publications. It is imperative to acknowledge this bias and to recognise that publications in languages other than English may be of high quality and have significant impact. This should be taken into account when assessing these types of research outputs.

9) Scrutinize indicators regularly and update them

(Leiden Principle 10)

Research priorities and the goals of assessment shift and the research system itself evolves. Indicator systems have to be reviewed and perhaps modified in line with best practice.