The whale in the room (second in a series of articles on problems in education)

Throughout various phases of history, there have been progressive movements that have tried to uplift society in various ways. In the United States (and I know we are hardly unique), one of those long-ago progressive movements added a new component to education … something that had not been there in force before, and something that irrevocably changed education.

For the first time, education as a whole was subject to measurement.

Certainly assessment (that is the new eduspeak preferred term for “testing”) has been with us for centuries. That in and of itself was not new. However, the effort to measure educational success and progress grew as never before. Today it has evolved into entirely new forms that have great effects on how teachers, students (and parents) operate. Here are a couple of examples:

*Rubrics. For decades, when teachers assessed students subjectively, they graded papers more or less any way they wanted. At best, that led to a degree of inconsistency; at worst, an unscrupulous teacher could grade certain students harder than others. Rubrics were the solution: a tool meant to force objective grading of subjective assignments.

*Objective testing (multiple choice testing). Multiple choice testing is a relatively recent idea in assessment. In the distant past, all testing was oral; later, testing involved writing. As schooling became more and more mandatory and teachers had more and more students to deal with, there needed to be a way to assess so many students in a relatively short time. The nice part about multiple choice testing is that it removed subjective decision making on the teacher’s part. The bad news is that the quality of the assessment then depended heavily on how skilled the teacher was at crafting the questions. Oh, and it is generally far more difficult to write multiple choice questions that test higher-level thinking (which prior to the 1980s/90s wasn’t such a big deal, since very few high schools were doing anything that actually required higher-level thinking).

*Norm-based assessments: These are the standardized tests you have read about in the newspapers. The idea is that a single test is administered to a norming group to determine where students of various abilities should score; the test can then be released to other students, who can be measured against the benchmarks established by the creator of the test. Along with its strengths come various drawbacks. For one, it is very expensive. For another, the testing companies release very little information about who is used to set the standards. In all my years, I have never heard of any students anywhere being used as test takers to establish the standards for these tests. Do they use college students? Do they use only kids in Iowa (keep in mind that a non-zero number of kids in Iowa grew up and thought Rick Santorum was going to save the country … so I’m not sure they are the best choice for establishing an intellectual baseline for the entire country)?

*International testing: administered because researchers want to know how different nations compare to one another. This is only a few decades old, but it has become the virtual cornerstone of the “American education system is bad” mantra that so many are screaming all the time. How are these tests given, and who is actually tested?
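The norm-referencing mechanics described in the norm-based assessment item above can be sketched in a few lines. Everything here is hypothetical: a made-up norming sample stands in for whatever group the test company actually recruited, and a later student’s raw score is converted into a percentile rank against it.

```python
# Minimal sketch of norm-referenced scoring (all numbers invented).
# A "norming sample" sets the benchmarks; every later test-taker is
# reported as a percentile rank against that sample.

def percentile_rank(norming_sample, score):
    """Percent of the norming sample scoring at or below `score`."""
    at_or_below = sum(1 for s in norming_sample if s <= score)
    return 100.0 * at_or_below / len(norming_sample)

# Hypothetical norming sample -- whoever the test maker happened to test.
norms = [42, 55, 61, 63, 67, 70, 72, 75, 78, 81, 84, 88, 90, 93, 97]

# A later student's raw score of 72 lands at roughly the 47th percentile.
print(percentile_rank(norms, 72))
```

Notice that the percentile means nothing on its own; it is exactly as meaningful as the norming sample is representative, which is the very question the testing companies leave unanswered.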

But I would like to put forward a conjecture that doesn’t come up much in education classes.

How do we know that education can be measured?  How do we know that all of this measurement actually does what it purports to do?  If it can be measured, how sure are we that it can be measured to the extent that we claim?  How sure are we that our measuring instruments (“tests”) are measuring what we purport them to measure?

These are critical questions, given that so much of the current reform and trend in modern education rests on the ability to gather data and use that data to inform instructors on how to proceed with individual students, and to let administrators decide which teachers are kept and which ones are not.

My background is primarily in science, so I have a decent understanding of the concept of taking measurements and then analyzing those measurements to look for patterns. I also know, especially in an age where science is constantly under attack, that there is a compelling reason why we do science the way we do. To put it simply, science works. Science uses data and analysis to essentially predict the future, but unlike mystics, who happen to be right about 50% of the time, science needs to be nearly 100% predictive; otherwise the model used to predict the future is thrown out and a new one is sought. Imagine if science weren’t that close to perfection … that is, if the data were not good at predicting future events. People would need to seriously look at how science is conducted, and they would be right to question it.

Which takes us to education.

I think education has not been particularly good when it comes to predicting the future. I would argue that very few human endeavors which claim to be data-based are so abysmally poor at helping determine a future course of action. If that is the case, then educational research must be called on the carpet to explain itself. Unfortunately, very few people are doing this, and the researchers vehemently defend their practices, despite a complete inability to explain why their plans don’t translate well into practice.

I’ll take it a step further: in science, we can develop new consumer items (pharmaceuticals, food additives, etc.), but these things are not released for actual public use until a battery of tests is performed, particularly in the area of “side effects”. Not only does this not happen in education, but when quite a few people worry about side effects, those questions are filed under “resistance”. We have instituted a system in our schools where students must be given the option to make up tests when they are unsatisfied with their results. A number of teachers predicted what would start to happen: students would focus on one or two tests and tank the others, knowing they could make them up next week when their schedule is more relaxed. We were told “nonsense”, yet our students have already started to tell us that this is exactly what they are doing (they have no reason to hide it; they are simply taking advantage of a perfectly legal loophole in policy). This may very well make them better at learning the knowledge and skills being taught to them (I would argue it simply makes their grades better). But what is the side effect? Are our kids now so conditioned to this environment that being able to deal with academic stress is becoming a thing of the past? What happens in a college setting? A professional setting? Or are we basically programming a lot of our kids to handle only low-stress environments (read: colleges that aren’t too selective, and jobs that aren’t either)? These are important questions that we teachers know the researchers have not contemplated. Yet this is being turned loose on kids with no real concern for what could happen. These new ideas have not been subject to significant testing for “side effects”.

Last autumn, I took my first course in educational research. I’ve certainly read enough of it in my life, but this was my first peek behind the curtain at what these people at the colleges of education actually do. It was the single most frustrating class I have taken in my post-undergrad career. At one point I was sitting across from the professor, and she noted that I was a science teacher and said she didn’t understand why I was having so much trouble, given that data collection and analysis should be so much a part of what I do. She was right, but what was not a part of my experience was the (and I will clearly express an opinion here) unscientific manner in which educational data is collected and analyzed. More than once I looked at that third-story window and thought that it was appetizing. For example, one study was attempting to validate the results of a test, and to do that, student grades were used as the validation: if the test scores lined up with the grades, the test was valid. I hope it doesn’t take too much intuition to see the enormous assumption being made here: that the grades themselves are valid (given the school where the testing took place, a low-performing inner-city school, I could not see that the grades were valid milestones to measure the test against). Real science does in fact rest on a certain degree of assumption, but only after those assumptions are shown to agree with reality are they usable; in education, the assumptions are made without any such check against reality.
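To see why validating a test against grades is circular, consider a toy simulation (every number in it is invented). If grades and the test both mostly reward some third factor — say, compliance — rather than learning, the two will correlate beautifully, and the test will be declared “valid” while measuring almost nothing about actual knowledge.

```python
# Toy illustration (all numbers invented): a test can correlate strongly
# with grades while both mostly track a third factor ("compliance"),
# not actual learning.
import random

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length samples."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

random.seed(1)
knowledge  = [random.gauss(0, 1) for _ in range(500)]
compliance = [random.gauss(0, 1) for _ in range(500)]

# Suppose grades and the test both mostly reward compliance, not knowledge.
grades = [0.9 * c + 0.1 * k for c, k in zip(compliance, knowledge)]
test   = [0.9 * c + 0.1 * k + random.gauss(0, 0.2)
          for c, k in zip(compliance, knowledge)]

print(round(pearson(test, grades), 2))     # high -- so the test is "validated"
print(round(pearson(test, knowledge), 2))  # low -- it barely measures learning
```

The high test–grade correlation is real; it just validates the wrong thing. That is the assumption-without-a-reality-check problem in miniature.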

It became clear in taking the course that data collection in education rests on a lot of assumptions that are not well founded. For example, data in education is often collected only once. In science, the idea of data collected without repetition is problematic. In education, the concept of repetition is “you look at many students”, but given the numerous uncontrollable factors that can go into a student’s performance on a test, the students are not tested multiple times to establish a baseline. It would be like testing an antibiotic on different strains of bacteria, seeing that it worked on two species, and declaring that the antibiotic worked without ever repeating the experiment … then claiming “I tried it once on 2 different species … that is enough”.
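The repetition problem can be illustrated with a toy model (all numbers invented): treat a student’s observed score as true ability plus day-to-day noise, and compare one sitting against the average of ten.

```python
# Toy model of test-retest noise (all numbers invented): a single score
# vs. the mean of repeated sittings, as estimates of "true" ability.
import random

random.seed(7)
TRUE_ABILITY = 75.0

def one_sitting():
    # Day-to-day factors (sleep, mood, distractions) modeled as noise.
    return TRUE_ABILITY + random.gauss(0, 8)

single   = one_sitting()
averaged = sum(one_sitting() for _ in range(10)) / 10

print(round(single, 1), round(averaged, 1))
```

With a noise standard deviation of 8 points, a single sitting can easily land 10 or more points from the truth, while averaging ten sittings cuts the expected error by a factor of √10 — and that repeated baseline is precisely what a one-shot assessment never establishes.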

The concept of control is also lost in educational data collection.

The most blatant aspect of educational research is how uncontrolled it is. In many scientific experiments, especially those looking at cause-and-effect relationships, it is critical that you are looking at only two variables, and that anything else which might affect the outcome of the experiment is held constant (controlled). Educational research discusses control, but in fact often doesn’t come close to accounting for the myriad factors that can affect the outcomes of its tests. For example, I read a study where a particular teaching technique was tried and compared to classrooms where teachers were strictly lecturing. The study controlled for student population in terms of gender and age. Completely ignored: the educational level of the students (students in a particular class are often non-randomly grouped because of another class they are taking … such as when you have a lot of band kids in a particular math class), the years of experience of the teachers, their experience with each technique (was the lecturer simply a poor lecturer?), and the time of day (was one class right after lunch, compared to 7:30 am or 2:30 pm?). These are just a few factors that can seriously affect outcomes, and they are generally ignored in educational research.
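The time-of-day problem above is easy to simulate (every number is invented). In this model the two classes get identical pedagogy — scheduling alone drives alertness — yet a naive comparison of class averages still hands the “new technique” a solid win.

```python
# Toy confound (all numbers invented): the "new technique" class meets
# mid-morning while the lecture class meets right after lunch. Time of
# day drives scores, but a naive comparison credits the technique.
import random

random.seed(3)

def class_scores(alertness_bonus, n=30):
    # Same base ability and same teaching in both classes; only the
    # time-of-day effect and per-student noise differ.
    return [70 + alertness_bonus + random.gauss(0, 5) for _ in range(n)]

def mean(xs):
    return sum(xs) / len(xs)

new_technique = class_scores(alertness_bonus=5)   # 10:00 am class
lecture       = class_scores(alertness_bonus=-5)  # 12:30 pm class

# The gap is caused entirely by scheduling, not pedagogy.
print(round(mean(new_technique) - mean(lecture), 1))
```

A study that controls only for gender and age would report this gap as an effect of the technique, which is exactly the failure mode described above.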

When you read a news article making claims about just about anything in science, you are generally reading about ongoing research that has not yet gone through peer review, and in that sense you should harbor a bit of doubt about certain claims. When you read about educational research, which is generally not subject to peer review, repetition, or even the basic standards of scientific research, you should know to be very skeptical about what you are reading. It may be true … but it might not. Flipping a coin is about as good a way as any to find out which.

One specific example of the educational research two-step: a few years ago, a piece of research came out showing that students who took Algebra II were much, much more likely to pass standardized tests than students who didn’t. The grade didn’t matter: even failing Algebra II meant higher test scores. The result: more than 15 states passed laws requiring students to take Algebra II as a graduation requirement. Many states require three years of math, but requiring a specific level of math was new: even if you were severely mentally handicapped and reading at a second-grade level, you were taking Algebra II. In 2013, as part of the same piece of legislation that lowered Texas’ mandatory standardized tests from 15 per year to 5, Algebra II was finally removed as a graduation requirement. I can’t believe I am saying this, but Texas actually got the message: there is no reason to make all kids take this course. Yet a single study was enough to send legislators scrambling to force kids to do it.

As a nation, we need to be very cautious about accepting the results of educational research, and even more cautious about allowing that research to guide public policy.  It is not science, and it does not have the track record of success that the physical sciences have assembled.

