Of the many recommendations that NSERC received in 2007 from both the internal and international reviews, it picked the one asking it “to separate the process of assessing scientific or engineering merit from assigning funding”.
NSERC’s staff concluded, “For doing so, two principles were fundamental:
- First, that the level of a grant should be commensurate with scientific or engineering merit,
- Second, that within a given discipline group, proposals with similar scientific merit should have similar grant levels regardless of the applicant’s granting history with NSERC.”
They went ahead and devised a new evaluation system, the “binning” system, which was supposed to ensure that these principles were applied.
Here is a brief description, at least of the version applied in the February 2010 competition. Applicants are rated under 3 criteria:
EoR = Excellence of researcher
MoP = Merit of Proposal
HQP = Training of highly qualified personnel (HQP).
There is a 6-point rating scale:
Exceptional (6 points)
Outstanding (5)
Very Strong (4)
Strong (3)
Moderate (2)
Insufficient (1)
The expectation is that most researchers will get ‘S’. An ‘I’ on any criterion is likely to mean the grant is not funded, as does an ‘M’ on EoR. Once these applications have been removed, the 3 scores are added, giving a total between 3 and 18 points.
Each possible total score is called a funding bin, labeled A (for EEE, 18 points), B (for EEO), C (for EOO), etc… all the way down to P (for III, 3 points), 16 bins in all. Everyone in the same bin gets the same grant. (Well… there are minor perturbations due to ‘cost of research’ and also slightly different rules for first time applicants).
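The binning arithmetic just described can be sketched in a few lines. The point values for O, S, M and I are assumptions on my part (the description above only fixes Exceptional at 6 and Very Strong at 4), but they are the only values consistent with totals running from 18 (EEE, bin A) down to 3 (III, bin P):

```python
# Sketch of the "binning" arithmetic: map the three ratings (EoR, MoP, HQP)
# to a funding-bin letter. Point values for O, S, M, I are assumptions
# consistent with totals running from 18 (EEE) down to 3 (III).
POINTS = {"E": 6, "O": 5, "V": 4, "S": 3, "M": 2, "I": 1}

def bin_letter(eor, mop, hqp):
    """Bin 'A' is 18 points (EEE), 'B' is 17 (EEO), ..., 'P' is 3 (III)."""
    total = POINTS[eor] + POINTS[mop] + POINTS[hqp]
    return chr(ord("A") + 18 - total)

print(bin_letter("E", "E", "E"))  # A
print(bin_letter("E", "E", "O"))  # B
print(bin_letter("I", "I", "I"))  # P
```

Note how coarse the map is: a one-point change on a single criterion always moves the applicant a full bin.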
Under the new system the panel decides the ratings. The group chairs then recommend a ‘bin to funding map’, but the final decision is made by NSERC.
A. Let me first address the issue of: “within a given discipline group, proposals with similar scientific merit should have similar grant levels…”
Fair enough, but why stop at “within a given discipline group” now that we have a binning system? Why not “within disciplines with comparable funding needs”? Why do we see an EEE bin in some disciplines awarded more than 100K more than their colleagues in the same bin in other “poorer” disciplines?
The first rule of fairness in devising a system (even the stock market) is to prevent people from “gaming the system”. Yet the current NSERC system gives people incentives to apply through the “cash richer” disciplines. How many mathematicians have gone (or are trying to go) through Computer Science, Mechanical Engineering, the life sciences, or elsewhere, where the average grants are higher?
NSERC has been saying for the last 2 years that they are planning to ask the Council of Canadian Academies to review the funding envelopes of the various disciplines. This is welcome news, but why is it taking so long to start such a review?
B. Let me address the fundamental principle that “the level of a grant should be commensurate with scientific or engineering merit.”
The new system is anything but capable of ensuring that such a principle holds. As scientists know, many outstanding researchers cannot write a decent proposal, and therefore fail in the “Merit of Proposal” category. Many Fields medalists have no students whatsoever, which means they would fail (getting an I) on the HQP category, and could kiss any funding goodbye. As many people have argued, Isaac Newton could not get a discovery grant under the new system. But think of the hundreds and thousands of researchers that follow their lead and their breakthroughs. Is this mentoring or what? The required system is one where the global scientific merit of a researcher is taken into consideration, without it being irreversibly penalized by a particular shortcoming in one of 3 almost randomly chosen categories.
C. I say that a fair review process is one that:
- minimizes randomness
- maximizes uniformity in the standards of a given competition
- minimizes the possibility of gaming the system
The new system breaks each one of these rules. How?
1. Increased randomness:
- The simple fact that only 3 categories are used to “bin” a scientist already creates an unacceptable level of volatility. One way to mitigate risk is to diversify (i.e., increase the number of categories) so that one flawed decision in one category doesn’t have a dramatic effect on the outcome. Many panelists told me that they were surprised by this or that decision, once the “binning” was complete and the funding for each bin was announced.
- Another factor of randomness is the small size of the voting sub-committee. A panel of 5 does not have the same expertise as a panel of 12. Well aware of this, and for added expertise, fairness and accountability, I implemented –a while ago when I was a chair of a GSC– a process whereby, for any returning grant above 35K (quite large for math), all panelists on both GSC336 and 337 (a total of 20 people) had to get together and decide collectively. This is needed even more now, considering the exponential rate at which disciplines are growing and branching out. Plus, once you eliminate the conflicted panelists (e.g., co-authors and university colleagues), the level of expertise becomes very thin. I have 30 people on the BIRS scientific panel, yet we often find ourselves with a blind spot vis-à-vis various proposals. The system then becomes very sensitive to the level of expertise on the panel, a level that cannot be guaranteed, in view of the current practices of NSERC for selecting panelists.
- The randomness is exacerbated by the voting system. While 5 people vote, only the median vote counts. In other words, a vote of EESSS is the same as SSSII. Moreover, one person’s vote can move an applicant from one bin to another (e.g., making a 4K difference in Math) or from fundable to unfunded (making the difference between no grant and the minimum grant, not to mention the dramatic effect on someone’s career). By fragmenting the decision making, and by dissociating the funding from the merit, panelists end up not knowing the consequence of their own votes, whether on the “binning” or on the funding.
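The median mechanics above are easy to verify with a short sketch. As before, the point values other than Exceptional (6) and Very Strong (4) are my assumptions:

```python
import statistics

# Assumed point values (only Exceptional = 6 and Very Strong = 4 are given).
POINTS = {"E": 6, "O": 5, "V": 4, "S": 3, "M": 2, "I": 1}

def panel_rating(votes):
    """Only the median of the five votes counts, as described above."""
    return statistics.median([POINTS[v] for v in votes])

# EESSS and SSSII both median to S (3 points):
print(panel_rating("EESSS"), panel_rating("SSSII"))  # 3 3
# A single changed vote (one M becomes an S) shifts the median rating,
# and hence the applicant's bin:
print(panel_rating("MMMSS"), panel_rating("MMSSS"))  # 2 3
```

So two extreme E votes and two extreme I votes cancel out entirely, while a single swing vote around the middle moves the bin.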
2. Lack of uniformity
It should be obvious that the primary effect of the “conference model” is that different subsets of 5 panelists could use different standards in their evaluations. This is exacerbated when panelists come from different GSCs that have totally different cultures.
This was for example evident from the discrepancy between the numbers of appeals among applied mathematicians vs. pure mathematicians, where the panelists are more uniform in their scientific culture.
In the old system, all members of a GSC read, evaluated and voted on all the files. Moreover, once finished, one often felt the need to come back to the files which were evaluated early in the week, to see whether the standards had changed as the week progressed. There is no similar option in the current system.
Most telling is the case of a husband and wife (in a nameless discipline), where the wife was awarded more than double the amount of her husband, even though most of their papers were joint, and with a slight advantage to the husband –at least in the last 6 years– in terms of recognition (awards, invited lectureships, etc…). When the husband made this comparison the basis of his appeal, the external reviewer admitted that the only discrepancy must have originated from a possibility that the wife was going through a maternity leave. The husband’s appeal was denied, even after attesting (he should know) that his wife hasn’t been pregnant in 20 years.
I say that the correct reason for this discrepancy (which the external reviewer should have picked up on) is that the 2 applicants were evaluated by different subsets of panelists, hence the lack of uniformity in the standards.
Another trigger for the lack of uniformity is of course due to certain instructions originating with NSERC’s staff. Instructions that are neither definable nor implementable in a uniform way. When a recommendation states, “proposals with similar scientific merit should have similar grant levels regardless of the applicant’s granting history with NSERC”, this does not mean that the applicant’s scientific history should be totally ignored.
Nor does it mean that panelists should use only the information contained in the applicant’s application, ignoring other widely available scientific facts about the applicant. This is simply a practice that is impossible to implement, and again it creates more lack of uniformity, since some panelists may apply it religiously and some not at all.
3. Gaming the system and HQP
I mentioned above an example of gaming the system based on the differences in funding levels among similar GSCs. Another way was of course the HQP category. Should one be surprised to see departments where the norm seems to be that each post-doc is essentially “shared” by 6 faculty members? Isn’t it the easiest way to improve one’s chances in the HQP category?
Which brings us to the question: why is HQP one of the 3 pillars of the binning system? As I said before, several outstanding researchers do not have many students. Luckily some GSCs tried and managed to get around this hurdle, sparing themselves and NSERC public embarrassment. On the other hand, this again created a lack of uniformity. After all, why should someone with 22 HQPs, supervising 6 PhD students (2 of them ending up as Szego assistant professors at Stanford, and others at Minnesota, Toronto, etc…), get the same grade as someone with less than half of that?
This was again exacerbated by the confusion on how to count. Should post-docs count less (because of gaming)? Should undergrads count seriously so as to keep smaller institutions in the game? Should one then train in every one of the 4 categories within HQP in order to get an E?
Again, another trigger for the lack of uniformity. But this is mostly due to the fact that HQP was considered as one of the 3 major categories, as opposed to being one of many more categories (such as national impact, international impact, quality of publications, etc…). Another suggestion was to use HQP to modulate the funding, as opposed to its role in the “binning”. As we said before, many Fields medalists have no students. Neither did Isaac Newton.
I wonder about the large discrepancy in the funding needs of various disciplines. NSERC is proud of the fact that more than 50% of the Discovery Grant funding is spent on HQP. Presumably the cost to support a graduate student or a post-doc is roughly similar across all of NSERC’s disciplines. Yet we have very large differences in the funding provided to different GSCs.
When I did an analysis of the data (from 2007 and 2008), it showed that the average Math and Stats grant was 18K while the average of all the other GSCs was 30K. Since at least 15K of that 30K was used for HQP, I conclude that something is wrong. I presume (hope) that things have changed in this respect since 2008, but I doubt they have changed much.
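A back-of-envelope check of these figures (the 15K per-trainee cost is the estimate used above):

```python
# Figures from the paragraph above, in thousands of dollars.
avg_math_grant = 18   # average Math and Stats grant (2007-2008 data)
avg_other_grant = 30  # average across the other GSCs
hqp_cost = 15         # rough cost of one student/post-doc (estimate above)

# What remains for everything else after funding a single trainee:
print(avg_math_grant - hqp_cost)   # 3K left in Math and Stats
print(avg_other_grant - hqp_cost)  # 15K left elsewhere
```

If trainees cost roughly the same everywhere, a mathematician funding one student is left with a fifth of what colleagues in other GSCs retain.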
NSERC requests that applicants choose the Evaluation Group whose expertise most closely matches the research topic. For most applications the expertise exists within one Evaluation Group. For applications that cross the boundaries of traditional disciplines the Chairs of the two or more relevant Evaluation Groups will identify members whose expertise will contribute best to an effective review. This should be done on the basis of the scientific or engineering objectives.
Funding levels for each Evaluation Group generally reflect the historical costs associated with specific academic disciplines. NSERC also specifies a discipline-specific minimum Discovery Grant amount to ensure that any funded researcher can support at least one graduate student, or, in the case of institutions without graduate programs, two undergraduate students.
NSERC’s Peer Review Manual provides Evaluation Group members with a complete set of guidelines for assessing applications fairly and objectively. This includes directing reviewers to consider the applicant’s track record for the previous six years, to use only the information provided in the application, and to take into account all levels of training when assessing the quality and impact of training HQP (not simply the number of HQP trained). Where necessary, Section Chairs and Group Chairs of each Evaluation Group remind members of these requirements during the competition. For the vast majority of applications, Evaluation Group members reach a consensus on an application’s rating prior to voting. Therefore, the types of extreme cases being suggested (SSSII or SSSEE) effectively amount to hypothetical scenarios.
NSERC’s request for the Council of Canadian Academies to conduct an assessment on performance indicators for basic research was approved by the Minister of Industry in the fall of 2010. The assessment will examine approaches used to evaluate research performance and indicators that enable comparisons across areas of research. NSERC plans to use these indicators, both quantitative and qualitative, to develop a new method of allocating funds among Evaluation Groups that could be implemented in 2013.