Experiments in evaluating teachers
Districts and charters breaking new ground
Teachers of Lucia Mar Unified, near Pismo Beach, are a crack in the wall of resistance to overhauling how teachers are evaluated and rewarded for their performance. This year, teachers in seven of the district’s 17 schools voted overwhelmingly to begin using an evaluation system that combines individual and school test scores and multiple classroom evaluations by teachers and administrators, along with regular discussions about teacher effectiveness.
The system they’ll be using, TAP: The System for Teacher and Student Advancement, was developed in Santa Monica a dozen years ago by Lowell Milken and the Milken Family Foundation* and has been expanded in a number of states and districts nationwide through funding from the federal Teacher Incentive Fund. But it had never been tried in California until now. What turned the heads of Lucia Mar teachers, Assistant Superintendent Michelle Ellis said, was visiting schools in Texas and Louisiana where the system is in place and seeing that their apprehensions (animosity over unequal bonuses and unfair judgments by unqualified observers) didn’t materialize.
In Sacramento, bills to clear away obstacles to identifying effective teachers and dealing with ineffective ones are facing uncertainty this year. Republican Sen. Bob Huff’s SB 355 failed to get more than two votes in the Senate Education Committee last month. It would have given districts the option of evaluating teachers and principals by multiple measures, as long as standardized test results made up at least 30 percent of the evaluation. It also would have allowed districts to conduct layoffs based on performance, not seniority.
Democratic Assemblymember Felipe Fuentes’ AB 5 would repeal the current system, under the Stull Act, and require evaluating teachers using evidence of effectiveness based on professional standards, test results (not necessarily California Standards Tests alone), and classroom observations. It’s been merged with AB 48, by Assembly Speaker John Perez, which would delay more frequent state-mandated evaluations until billions owed to the schools under Prop 98 are paid off.
Regardless of what happens at the state level, interest is stirring in a handful of districts, and there are signs that some teachers, if not union leaders, are ready for change.
- More than 900 teachers in 91 Los Angeles Unified schools have volunteered to pilot the evaluation method developed by the district’s Teacher Effectiveness Task Force. The evaluations would include feedback based on classroom observations and a teacher’s impact on student test scores (value-added measures rechristened for better branding as Academic Growth over Time data). But United Teachers Los Angeles is seeking an injunction to stop the program even though results of the pilot next year wouldn’t count.
- As I reported yesterday, LAUSD is one of seven districts in CORE (California Office to Reform Education) that plan to seek $50 million or more from Race to the Top to create, among other projects, new teacher evaluations and performance tools.
- San Juan Unified is among districts that remain committed to Peer Assistance and Review, a promising program the state no longer funds, in which teachers and administrators serve on a review panel that evaluates and prescribes assistance for underperforming teachers. San Juan teachers hope to expand PAR into a broader evaluation system, based on multiple measures.
- Five charter organizations in Los Angeles (Aspire, ICEF, Alliance for College-Ready Schools, Green Dot, and Partnership to Uplift Communities, or PUC) covering 90 schools are in year two of a seven-year Gates Foundation grant to evaluate and reward high-performing teachers (higher pay, new pay levels). Writing the rubric on which to base observations is among the first steps of the $80 million grant, called The College Ready Promise.
- Change, although glacial, may even be coming to the California Teachers Assn. Dean Vogel, the incoming president of the CTA, cochaired the National Education Association committee whose policy paper recommends a comprehensive evaluation method. It would measure teacher practices and teacher collaboration and involvement in the betterment of the school, along with student academic growth, including “reliable, high quality standardized tests.” This position, which still must be ratified by NEA delegates this summer, marks a shift in NEA’s position. It’s a sign that national leaders recognize that they’d better engage in change or have state legislatures thrust on them evaluations based predominantly on standardized test results, as has happened this year in Florida, Indiana, Wisconsin, and elsewhere.
(Note: Speakers from TAP, LAUSD’s Teacher Effectiveness Task Force, and The College Ready Promise gave presentations at a conference last month sponsored by the Education Trust-West. You can download all of the presentations here. And tomorrow, EdSource will release a study on teacher evaluations that looks at California laws governing them and districts that are trying alternatives.)
No Dispute: It’s not working now
Effective and trustworthy teacher evaluation systems will undermine the rationale for a tenure system that protects jobs based on seniority and a step-and-column pay system that ties raises to longevity. There is an emerging generational divide on these issues.
But all teachers and administrators should agree on the overriding issue: The current system, based largely on cursory classroom observations, is neither weeding out the minority of bad teachers nor helping other teachers improve. Under the 40-year-old Stull Act, some veteran teachers may be reviewed once every five years; only teachers with unsatisfactory reviews are reviewed annually – between 1 and 3 percent in Los Angeles Unified, for example.
How much to weigh test scores?
The extent, if any, to which student test scores are used will be one source of contention. TAP, The College Ready Promise, and Los Angeles Unified are using or intend to use value-added models or similar methods designed to predict student outcomes while controlling for students’ demographic characteristics. But a number of recent studies have questioned the reliability of value-added models, whose results can be skewed by factors outside a teacher’s control, and have cautioned against heavily weighting standardized tests, which are a narrow indication of a teacher’s effectiveness. At the same time, test results are an objective measure and can be a good diagnostic tool, telling a teacher which types of students are learning the fastest, and which teachers in a school are having the most success.
Under TAP, schoolwide value-added student test scores count for 20 percent of a teacher’s evaluation; the teacher’s individual students’ scores count for 30 percent and observations of classroom management, organization, and preparation, for the other half. Under The College Ready Promise, observations by the principal, peer critiques, and parent and student feedback count for 60 percent of an evaluation, with schoolwide, team, and student growth measures counting for 40 percent. Under the Teacher Effectiveness Task Force’s recommendations, 70 percent of evaluations in Los Angeles would be based on classroom observations, teachers’ contributions to the school community, and student and parent surveys, with 30 percent based on standardized test scores and other assessments.
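To make the differences concrete, here is a minimal Python sketch of how the three weightings combine component scores. It is a hypothetical illustration, not how any of these systems actually calculates ratings: the 0-100 scale, the grouping of each component as either test/growth-based or practice-based, and the sample teacher's scores are all assumptions.

```python
"""Hypothetical sketch of how the reported weightings combine component scores.

The 0-100 scale, the "growth"/"practice" grouping, and the sample scores are
assumptions for illustration only.
"""

# Weights as reported above; each scheme's weights sum to 1.0.
# "growth" = test-based or growth measures; "practice" = observations,
# feedback, surveys, contributions (a simplification of each system).
SCHEMES = {
    "TAP": [
        ("schoolwide value-added scores", 0.20, "growth"),
        ("individual students' value-added scores", 0.30, "growth"),
        ("classroom observations", 0.50, "practice"),
    ],
    "College Ready Promise": [
        ("observations, peer critiques, parent/student feedback", 0.60, "practice"),
        ("schoolwide, team, and student growth measures", 0.40, "growth"),
    ],
    "LAUSD task force": [
        ("observations, contributions, surveys", 0.70, "practice"),
        ("standardized tests and other assessments", 0.30, "growth"),
    ],
}


def composite_score(components, scores_by_kind):
    """Return the weighted average of component scores.

    components: list of (label, weight, kind) tuples whose weights sum to 1.
    scores_by_kind: hypothetical 0-100 score for each kind of component.
    """
    total_weight = sum(weight for _, weight, _ in components)
    if abs(total_weight - 1.0) > 1e-9:
        raise ValueError(f"weights sum to {total_weight}, not 1.0")
    return sum(weight * scores_by_kind[kind] for _, weight, kind in components)


if __name__ == "__main__":
    # A hypothetical teacher rated highly by observers but with middling growth data.
    teacher = {"practice": 90, "growth": 60}
    for name, components in SCHEMES.items():
        print(f"{name}: {composite_score(components, teacher):.1f}")
```

With these assumed numbers, the same teacher scores 75.0 under TAP, 78.0 under The College Ready Promise, and 81.0 under the task force's recommendations, because the schemes put 50, 40, and 30 percent of the weight on test or growth measures, respectively.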
Different ratios aside, those involved with the systems emphasize that the goal is not punishment but self-reflection and candid examination of classroom practices and techniques – integral to being treated as professionals.
Foundations and the federal government, through three-year Teacher Incentive Fund grants – a program begun by President Bush and expanded by President Obama – are funding most of the experiments in California, including bonuses and new pay scales for master and mentor teachers under the TAP model. So long-term sustainability will be an issue.
Effective evaluations will require additional training and professional development and demand additional time from administrators and mentor teachers – this in a state that already had one of the lowest administrator/student ratios in the nation before the recession. It’s easy to criticize the current system of lax, undemanding evaluations protected by due process rights. If the public and legislators want a comprehensive system to replace it, they should stand ready to pay for it.
* TAP is now operated by the National Institute for Excellence in Teaching.






As long as student test scores lack potent, meaningful consequences for students’ grades and academic promotion, it will remain unfair to assess teacher performance using that test data.
Every year I go through the song and dance of trying to get my students to care about CST scores when my students all know that their test performance bears no consequences for them at all. I’ve even had students make flower patterns on their bubble sheets because they thought the tests were meaningless.
The single most powerful thing we can do to raise student test scores statewide would be to tie those scores to academic promotion. Make course grades and advancement to the next course contingent upon satisfactory performance on end-of-course tests (i.e., CST scores) and we will see a miraculous increase in student test scores statewide.
Only after students have such powerful incentives to do their best on state tests will it be ethical to evaluate teacher performance using such test scores.
It may be possible to design statistical models that correct for socio-economic factors that influence student academic performance, but there will never be a statistical model that corrects for lack of student motivation to perform well on state tests.
What’s curious about these proposed evaluation systems is that they rarely, if ever, solicit feedback from the people who are most impacted by a teacher’s skills and abilities — the students and parents. How sensible is it to evaluate public-facing team members based solely on internal factors like feedback from managers and colleagues? The reliance (over-reliance in many cases) on test scores in teacher evaluations seems to be a primitive attempt to get some kind of “objective” third-party feedback into the system; however, it can easily be misguided. I know there are “test-haters” and “test-lovers” on these blogs, but I suspect that many people familiar with education have a more moderate view: test scores can reveal gross differences among students, schools, and perhaps even teachers, but they are less helpful in revealing more granular differences. Test scores can be a factor in evaluating teachers, but I don’t see how they can reveal the more subtle distinctions that can be so helpful in performance evaluations.
Why not solicit feedback from the parents and students who have the most daily interactions with the teacher when he/she is working — and who have the most at stake in having effective teachers in the classroom? Before people start screaming “OMG! You want the inmates to start running the asylum?!?” (an actual comment I got while serving on the board of a private school), please understand that parent/student input need only be one component in a teacher’s evaluation, perhaps even reducing the weighting of test scores. We would not want the situation that has emerged in some higher ed institutions, where professor evaluations and even course availability are being over-influenced by rankings from students. Teaching should not be a popularity contest. And for those who fear that parent/student feedback will become a “whine-fest”, I can confirm that, yes, you will get whining and unreasonable complaints. That goes with the territory in any user/customer feedback exercise. However, it is usually not all that hard to separate the personal, unfounded, or unreasonable complaints from the useful insights.
Most of us recognize that the teaching profession presents unique evaluation challenges: the teacher’s work is done mostly behind closed doors, with no colleagues or supervisor present; the “supervisor” (the principal) often has far more professional people to supervise directly than is common in other organizations; and finally, of course, teaching is a complex, difficult, and demanding profession. Given the wide variation in student age groups, subject matter, and individual teaching styles, it is difficult to come up with consistent, usable performance metrics. Fortunately, parents and students (beyond a certain age) can offer useful feedback on day-to-day factors that no one else is in a position to provide. I have seen many cases where this feedback helped a teacher become more effective in the classroom, and helped administrators discover gaps in training and support that were relatively simple to fix. And in those few cases where a teacher had serious but less evident performance problems, it helped everyone (colleagues, administrators, and parents) see that “staying under the radar” was no longer an option.
Of course students’ and parents’ input on teachers should be heard, but since the tendency in the current education-reform climate is to attach high stakes to everything — especially if it can harm teachers as badly as possible — wouldn’t placing more emphasis on (aka attaching high stakes to) parents’ and students’ feedback be guaranteed to eliminate honest grading entirely? I see magical thinking in the comment that “it is usually not all that hard to separate the personal, unfounded, or unreasonable complaints from the useful insights.”
I think it’s quite understandable why the suggestion prompted such a reaction in a private school — and I’m speaking as a parent who definitely DOES want my views about teachers heard.
We’d certainly see yet another “it’s a miracle!” as all students’ grade point averages rose to 3.0 or higher.
John – can you back this up?
“At the same time, test results are an objective measure and can be a good diagnostic tool, telling a teacher which types of students are learning the fastest, and which teachers in a school are having the most success.”
I don’t think we all agree the tests are objective – their biases simply originate further away from the classroom. Can you back up the claim that the tests have diagnostic value? Can you specify what you mean by “success”? Because I think what you mean is that the tests show which teachers’ students do best on the tests, and I do not accept test results as a term equivalent to “success”.
Furthermore, I have mentioned several times to you offline, and on this blog, and on my blog, that the position of APA, AERA, and NCME is that tests validated for one purpose should not be held valid for other purposes, especially where high-stakes decisions come into play. No one ever seems to have an answer for that – they just go ahead and do it. When will you, or any reporter, ask TAP, or LAUSD, or even NEA how they get around these words of caution from the leading professional bodies in educational research and measurement?
(Just to clarify, NEA as a whole hasn’t adopted that recommendation, and the language of the recommendation seems to describe tests that haven’t been developed yet – perhaps looking ahead to the supposedly improved instruments coming with the Common Core adoption).
I’m all for more substantive, ongoing, rigorous teacher evaluations, and would recommend that anyone interested in teacher perspectives on improved evaluations also look at a report I helped produce:
http://accomplishedcaliforniateachers.wordpress.com/act-publications/
David: Not all recent evaluations of value-added metrics pan them. Consider this passage from an April report of the Brown Center on Education at the Brookings Institution; Susanna Loeb of Stanford was a contributor. “We have previously issued a report that describes some of the imperfections in value-added measures while documenting that: a) they provide one of the best presently available signals of the future ability of teachers to raise student test scores; b) the technical issues surrounding the use of value-added measures arise in one form or another with respect to any evaluation of complex human behavior; and c) value-added measures are on par with performance measures in other fields in terms of their predictive validity.” In other words, there’s subjectivity in every form of evaluation, including classroom and peer evaluations. So test scores, whether CSTs or locally developed assessments, should be one of several measures. My gut says counting test results as 30 percent of an evaluation, which LA is proposing and the Obama administration has pushed through Race to the Top, is too high. I look forward to the new assessments under Common Core, probing deeper levels of learning, as an improvement to the CSTs. I sense you want to do away with the CSTs altogether. I strongly disagree.
As for using CSTs as a diagnostic tool, I refer to a presentation by LAUSD at the Ed Trust-West conference. The district hopes to use value-added results, now known as Academic Growth over Time, to answer these questions:
*Are students in a particular region, school, or grade level growing faster than students just like them throughout the District?
*Are specific groups of students in particular schools or classrooms growing faster or slower than the district average?
*And with further observation, what instructional methods, programs and interventions are working to improve student outcomes?
*What is the distribution of effective educators? Do we have the most effective educators working in the right places to achieve our goals?
*What can we learn from places where we are achieving remarkable results?
This is not a punitive approach. The results can inform teachers and the district.
John, you have created a brilliant summary of a challenging and fast-changing topic. I have added a link to this post to the relevant section of Ed100.org. More thoughts about the role of students in the evaluation of teachers can be found at http://ed100.org/evaluation. A summary of options for teacher pay including (but not limited to) performance-related pay options can be found at http://ed100.org/teacherpay.
This is a sensible suggestion, and such surveys are common among the chartered schools in Alberta, Canada, whose system was unfortunately overlooked in last week’s report “Standing on the Shoulders of Giants” by Marc Tucker and the National Center on Education and the Economy, which summarized best practices among the world’s top-performing educational jurisdictions. Surveys of students, other teachers, and parents should be among the evidence that appraisers refer to in assessing the professional performances of teachers; trained appraisers should be able to easily distinguish sincere, thoughtful comments from gratuitous slander or flattery, and without professionally trained appraisers, any teacher evaluation system is likely to go awry. And Jim is also right in his wariness about possible over-reliance on scores on substandard tests, which all three personnel systems described in this article appear to be guilty of, since they want to base 30-50% of the appraisal upon those scores.
My comment was written as a reply to Jim Mills, so the “this” refers to the feedback system he is advocating; I did not realize that Caroline’s comment would remain interposed.
The first “evaluation” that any teacher undergoes is in the initial hiring process. My impression, gained from the schools I have been involved with as well as from candidates I have known who have gone through the process, is that hiring in education is too often subjective, non-structured, not that rigorous, and lacking focus on the key attributes that typically characterize successful teaching. As many hiring managers have learned, if you can improve the quality of your initial hires, subsequent evaluations will be less likely to dwell on disruptive “termination-type” issues and more likely to focus on development factors that can help people improve their performance. In education, better hiring should result in a better “fit” between candidates and the schools where they work, improving teacher retention and learning outcomes for students. The current issue of the Harvard Education Letter has a very interesting article on how some charter schools are using new predictive tools to improve their initial hiring decisions. It would be good if the “first evaluation” processes that schools employ on initial hiring were a part of the overall discussion of teacher evaluations.
Very interesting, Jim. I believe Green Dot Public Schools, for whom I used to work, used similar technology when reconstituting Locke High School, where I used to teach. The results may be best investigated in Alexander Russo’s Stray Dogs, Saints, and Saviors: the students regularly opined that the new faculty cared about them more and were better teachers, but Green Dot failed to recruit a balanced faculty mixing veterans with youth, and this ultimately made the lives of the new teachers going through the experience very hard. (I have an op-ed on this topic that I am currently circulating privately, in hopes of getting it published in a major media outlet.)
If this genuinely does lead to better hiring than traditional interviews, then district HR departments may actually acquire a reason for their existence. In traditional districts, they are basically paper-pushers.
Educators 4 Excellence, a group of newer teachers in NYC schools, has just outlined the elements of an evaluation system that they would like to be used to replace seniority-based layoffs in that district. This article in GothamSchools.org has some details. Interestingly, their system suggests that a modest proportion of the evaluation consist of student surveys (and — gasp! – even test results). Their proposal simply echoes what my son’s terrific high school principal has told me: “The students already know who the good teachers are, and they can tell you a lot about what the bad teachers are doing wrong.” He’s not the only principal who has told me that. Those perspectives shouldn’t drive the evaluation process (which currently scarcely exists in most schools), but why should we completely ignore such valuable information from the people who are right there with the teacher every day?
Also, I have a higher opinion of teacher ethics than some commenters on this blog. I do not think that the introduction of a modest amount of feedback from students or parents would suddenly turn most teachers into “everybody-gets-an-A” popularity contestants. The fact is that any teacher at almost any school who is committed to realistic grading already has his/her share of headaches from students, parents, and even administrators who object to specific grades. It’s never easy having a backbone: the ones who have one now will keep it under any reasonable evaluation system; the ones who don’t have one are probably already riding the “no-hassle A train” to Nowhere.
Jim, I think the Educators 4 Excellence proposal is clearly a step in the right direction; nonetheless three comments come to mind: (1) I don’t much like having a one-size-fits-all appraisal system imposed on schools by a state legislature, and would opt out through a charter whenever possible; (2) I think these kinds of formulaic rubrics, with so many points allotted to each category, fall afoul of the fallacy of composition, while by contrast I advocate holistic appraisal, with the appraiser’s score being based on the totality of the evidence available (John Wilkes Booth might have gotten a fine score for his manners at the theatre that night if it hadn’t been for that one thing . . . ); (3) I think the “master outside observer” proposal is a step in the right direction towards the establishment of school inspectors, a tradition long established in certain European countries though unknown here. France has perhaps the most formally established of such systems, which I would like to adapt and will budget for in our school’s charter.
Bruce, I completely agree that a one-size-fits-all approach would not be good. What I liked about the E4E proposal was that it seemed to be a bottom-up, constructive attempt by teachers to provide input into the system that NYC chooses. Whatever system that district decides to use may not be right for another district (and it may not even be right for all of the schools in that enormous district, which is one reason I don’t like the idea of “school districts” generally). What seems to be happening, with proposals like this and the recent encouraging statement from the NEA, is that leaders in the teaching profession are starting to take a more proactive role in establishing quality benchmarks that could actually be implemented in real schools. I’m not too keen on your idea of master teacher inspectors, though. The French system has been quite successful in establishing remarkable consistency across the country (as the French have done in so many areas), but it would be surprising to see a large cross-section of American educators embracing that kind of uniformity. I’d be delighted to see it tried in some kind of Choice environment, however, as you are suggesting for your charter. I just shudder to think how it might be implemented in my own district, where the teachers and students would be trapped with it, regardless of how the central office chose to (mis)apply it.
John – good seeing you this morning, and thanks for taking the time to compose a detailed and thoughtful reply. I have to say, I was not swayed – at all. My response runs a bit long, so I hope you, and anyone else interested, will swing by InterACT to see what I said – or at least skim. In short, I think you cited evidence that scores are related to scores, and suggested that the use of tests is an argument in favor of using them.
http://wp.me/pPltP-lk
David: I appreciate your charitable comments on our continuing discussion and encourage readers to read the response – http://bit.ly/iRnBIY – on your blog (and to bookmark the site as well). Though we disagree on the use of student test scores as one factor in evaluations – let’s broaden that to say measures of student performance – you have been in the forefront of this issue and have contributed important work through Accomplished California Teachers.
I know it’s the end of the year and exhausted teachers have other things on their minds than infighting in Sacramento, but I wish more of them would attend events like Sen. Joe Simitian’s legislative update, as you did. It’s useful for teachers not only to hear the perspective of one of the most thoughtful and smart lawmakers in the state but also to hear the questions and comments from school board members and parents on issues critical to teachers.
Yes, we can agree that student performance should be part of teacher evaluation. As a parent, when I see my sons’ CST practice tests, I can immediately recognize how much of the content they have reinforced at home every day, and yet how narrow a slice of “achievement” those tests are. I would fight relentlessly against efforts to evaluate my teaching based on test scores, and you know the community in which I teach. Most of my students excel on these tests. I have nothing to fear – except for a horrendously counterproductive step backwards. On the other hand, I would love to have much more frequent and substantive evaluation of my teaching, and my students’ work, to help me improve in every facet of my teaching, not just the tiny bits that are (supposedly) tested.
Jim, your comments are remarkably consistent with the choice philosophy that you espouse and I support. I am glad that you are suspicious, as am I, of school districts in general. While I suppose they are inevitable and appropriate in small rural counties and reserve judgement on their utility for the primary school years, I think it’s clear that by the upper secondary years, families should have a broad choice of good educational options available to them, and I think that a better role for school districts, if they are to exist at all in those years, would be to act as school authorizers rather than school operators. In keeping with this, the ideal I imagine would be to gain a state charter and then buy back from local districts, or other agencies, such of their services as make sense and are competitively priced for our schools. I also second your comments supporting teachers taking their destiny into their own hands by proposing measures to improve the quality of the workforce and the educations we provide, rather than being compulsively fixated on salaries and benefits to the exclusion of everything else.