NSERC has recently adopted a new evaluation system for its Discovery Grants. It relies on a “binning system” based on a fragmented decision-making process. The system has been widely criticized for its volatility, its lack of uniformity, and its negative impact on early-career scientists, as well as on universities with no or minimal graduate programs. Just in case the binning system is here to stay, we offer a few easily implementable suggestions to improve the process. A slightly more involved non-linear formula may be necessary to mitigate some of the flaws described below.
Here is a brief description of NSERC’s binning system, which was introduced in the last DG competition in response to recommendations from two independent reviews. We do not know who developed the parameters of the new system, or whether it was the work of external consultants or of a committee of Canadian scholars. In either case, there is no sign that a competent mathematician or statistician was involved in the process. They would have known better!
Here is how the current system works. Applicants are rated on three criteria:
EoR = Excellence of researcher
MoP = Merit of Proposal
HQP = Training of highly qualified personnel.
There is a 6-point rating scale:
Exceptional (6 points), Outstanding (5), Very Strong (4), Strong (3), Moderate (2), Insufficient (1).
The expectation is that most researchers will get ‘S’ (for “Strong”). An ‘I’ on any criterion is likely to mean the grant is not funded, as does an ‘M’ on EoR. Once these applications have been removed, the three scores are added, giving a total between 3 and 18 points. Each possible total score defines a funding bin, labeled A (i.e., ‘EEE’) to P (‘III’). Everyone in the same bin gets the same grant. (Well… nearly: there are minor perturbations due to ‘cost of research’, and slightly different rules for first-time applicants.) Under the new system, the panel decides the ratings. The group chairs then recommend a ‘bin-to-funding’ map, but the final decision on this is made by NSERC. You can summarize the above with the following formula for a grant allocation:
Grant = L x (EoR + MoP + HQP)
where L denotes a number determined by the budget of the discipline in question. This number is supposed to be revised (by NSERC staff) after the submission of a recently announced review undertaken by the Council of Canadian Academies (CCA).
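For concreteness, the arithmetic described above can be sketched in a few lines of Python. The names, and the modeling of screened-out applications as an automatic zero (the text only says they are “likely” not funded), are our own simplifications, not NSERC’s actual procedure:

```python
# Illustrative sketch only: names and the hard screening rule are our
# simplifications of the described procedure, not NSERC's actual code.
RATING_POINTS = {"E": 6, "O": 5, "V": 4, "S": 3, "M": 2, "I": 1}

def bin_label(total):
    """Map a total score (3..18) onto funding bins A ('EEE', 18) .. P ('III', 3)."""
    return chr(ord("A") + (18 - total))

def current_grant(eor, mop, hqp, L):
    """Grant = L x (EoR + MoP + HQP), with screened-out applications
    (an 'I' anywhere, or an 'M' on EoR) modeled as receiving zero."""
    if "I" in (eor, mop, hqp) or eor == "M":
        return 0
    return L * (RATING_POINTS[eor] + RATING_POINTS[mop] + RATING_POINTS[hqp])

# An all-"Strong" applicant ('S', 'S', 'S') totals 9 points, i.e., bin J.
```

Note how coarse the map is: one changed letter on one criterion moves an applicant a whole bin, which is the volatility discussed below.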
Here are our recommendations. They have the advantage of retaining the binning system, while alleviating some of the flaws in its current implementation.
First, we suggest, as detailed below, adding two more criteria, for example:
General Impact of Researcher (GIR)
Researcher Potential and/or Record (RPR)
Then, inspired by a topologist, one can change the formula to:
Grant = L x (EoR + MoP + GIR + RPR) x (1 + f(HQP))
1. Need to deal with the issue of HQP
The use of HQP as a factor for denying applicants (let alone the question of how to count trainees) may be the worst flaw of the new system, in particular for early-career researchers (no HQP, thus no grant, thus no HQP…) and for smaller universities (no graduate students, thus no grant, thus no research career).
Our suggestion is to use HQP to modulate the funding, as opposed to its do-or-die role in the current “binning” system. The formula above dampens this effect while still giving HQP the importance it deserves. The function $f$ is to be chosen carefully, and may be a combination of functions that put different weights on undergraduate, graduate, and postgraduate training. The new formula ensures that an applicant can still get funded even if his/her HQP numbers are negligible.
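As an illustration only, here is one hypothetical choice of $f$: a weighted count of trainees, saturated so that HQP can boost a grant but never zero it out. The per-trainee weights and the saturation scale are invented for this sketch:

```python
from math import tanh

def f(undergrads, grads, postdocs):
    """Hypothetical dampening function: invented weights per trainee type,
    saturated by tanh so the value stays in [0, 1)."""
    return tanh((0.5 * undergrads + 1.0 * grads + 1.5 * postdocs) / 10.0)

def proposed_grant(eor, mop, gir, rpr, L, hqp_counts):
    """Grant = L x (EoR + MoP + GIR + RPR) x (1 + f(HQP)): with this f,
    HQP can at most double the base amount, and zero HQP no longer
    zeroes out the grant."""
    return L * (eor + mop + gir + rpr) * (1 + f(*hqp_counts))
```

With this choice, an applicant with no trainees still receives the full base amount L x (EoR + MoP + GIR + RPR), while strong training records raise it by up to a factor of two.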
2. Need to minimize the randomness and volatility of the process
- The simple fact that only three criteria are used to “bin” a scientist creates an unacceptable level of volatility. One way to mitigate risk is to diversify (i.e., increase the number of criteria) so that an isolated flawed decision on one criterion does not have a dramatic effect on the outcome. We therefore recommend adding two new criteria:
(a) “General impact of the researcher (GIR)”, which covers invited lectureships, conference organization, special schools, journal editing, book writing, and any other activity that supports research in general. After all, this also requires some research funding.
(b) “Researcher potential and/or record (RPR)”, which for an early-career applicant could force a discussion of his/her potential as well as the depth of his/her research direction. For a senior researcher, it could reflect career impact (and not just the last six years) as well as his/her record of delivering on past proposals. It may also alleviate the new system’s lack of built-in memory for preceding granting levels.
- Increase the size of each sub-group to at least 7 panelists.
- Make sure that there is enough expertise and experience in each sub-group. The freedom to choose non-Canadian panelists should be used to recruit stellar members.
- Rethink the voting system. An applicant who receives a vote of EESSS on every one of the criteria should not end up in the SSS bin. There is also a correlation effect between the criteria that should be taken into account.
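We do not know exactly how a sub-group’s five votes are aggregated; but assuming, hypothetically, a majority/median rule, the EESSS example above can be made concrete:

```python
from statistics import mean, median

VOTE_POINTS = {"E": 6, "O": 5, "V": 4, "S": 3, "M": 2, "I": 1}

def aggregate(votes, rule):
    """Collapse a sub-group's votes on one criterion into a single score."""
    return rule([VOTE_POINTS[v] for v in votes])

# A median (majority-style) rule reads EESSS as a plain 'S' (3 points),
# whereas the mean (4.2) gives the two 'E' votes some weight.
```

Any rule that discards the minority votes outright erases information the panel actually produced; a mean-like rule, or any aggregation that keeps the spread visible, would soften this cliff.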
3. Maximize uniformity in the standards by mitigating the effect of the fragmentation of decision-making
One needs to address an inherent flaw in the “conference model”: different subsets of five panelists could apply different standards in their evaluations. This is exacerbated when panelists come from different disciplines with totally different cultures. As a check on uniformity, we recommend that the full panel review the collection of applications before and after they are assigned to a sub-group. The initial review can be used to harmonize standards. The second general review can happen after the binning, and before the dollar amounts are assigned. A fragmented decision-making process can only create unpleasant surprises, even for the panelists, once the final ranking induced by the binning is posted.
A panelist who is quite familiar with applicants X and Y, and who is surprised and alarmed to see that X has been ranked below Y (by different sub-groups), should have the right to flag such cases.
4. Minimize the possibility of gaming the system
- Improve evaluation of the HQP by tightening the choice of the function $f$ above.
- Ensure fair allocation of funds between the various disciplines. This depends on the outcome of the review by the CCA.
Members from different Evaluation Groups (EG) are brought together as needed to review applications whose research proposals cover more than one discipline. Such joint reviews involve the specific member(s) from other EGs who have the relevant expertise and can appreciate the appropriate performance indicators in that specific topic.
Before the competition sessions begin, EG members participate in discussions on the interpretation of the criteria and the rating grid (see section 6.13 of NSERC’s Peer Review Manual) that result in a common understanding of, for example, the meaning of a “strong” vs. a “very strong” or “moderate” rating. Members perform this calibration exercise with actual applications from prior competitions (with the applicants’ permission) and/or with applications that are up for review in the current competition. This is a key step in ensuring consistency within and between EGs.
The “Excellence of Researcher” selection criterion (see NSERC’s Peer Review Manual) incorporates elements equivalent to the “General Impact of Researcher” and “Researcher Potential and/or Record” categories proposed above.