There are very few students, even at a brick-and-mortar university like Cal, who haven’t taken advantage of online education in one way or another. We use Khan Academy, access free lectures, and complete coursework on the Internet, even as we pay for the privilege of being on campus. Usually the usefulness of online education is limited to the computer’s typical strengths – mass distribution of information, convenience, automated grading of multiple-choice exams – but this may not be the case for very much longer.
EdX, an online education enterprise founded by Harvard and MIT (of which Cal is also a member), has introduced a free program that is intended to grade students’ essays and provide immediate feedback. As this New York Times article reports, the program uses artificial intelligence in order to ‘learn’ how to grade based on how a human grader scores 100 essays.
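To make the “learning” step concrete, here is a toy sketch of the general idea – my own illustration, not EdX’s actual method, and real systems use far richer features: fit a simple model mapping a surface feature of each essay (here, just word count) to the scores a human grader assigned, then apply that model to new essays.

```python
# Toy sketch of supervised essay scoring (hypothetical, not EdX's system):
# learn a linear mapping from word count to the human-assigned score.

def word_count(essay):
    return len(essay.split())

def train_grader(essays, human_scores):
    """Least-squares fit of score ~ word_count over the training essays."""
    xs = [word_count(e) for e in essays]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(human_scores) / n
    var_x = sum((x - mean_x) ** 2 for x in xs)
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, human_scores))
    slope = cov_xy / var_x
    intercept = mean_y - slope * mean_x
    # The "trained grader" is just the fitted line applied to a new essay.
    return lambda essay: slope * word_count(essay) + intercept

# Tiny "training set": these human graders happened to reward longer essays.
training = [
    "short answer",
    "a somewhat longer answer with more words in it",
    "an even longer answer " * 5,
]
scores = [2.0, 3.5, 5.0]
grade = train_grader(training, scores)
```

The point of the miniature version is that the machine never reads the essay; it only learns which measurable surface features correlated with the human scores it was shown.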
There are several obvious reasons for this technology’s appeal. For one thing, it reduces the grading load on professors and GSIs, which has ballooned as class sizes increase. Human graders can be biased and inconsistent, and with the pressure to grade so many papers, the amount of feedback and care put into each individual essay inevitably decreases. Despite graders’ best efforts, it takes time to return the essays, sometimes up to a month. By this point, another paper might be assigned, and the student doesn’t have the feedback to improve their work. In contrast, EdX proposes a system in which the student enters an essay, receives immediate comments, and then revises and tries again.
Sounds great, right? Professors and GSIs reduce their workload, and students get the feedback they need before it becomes irrelevant. But as a person who writes a lot, academically and otherwise, I have to admit that I’m skeptical. Reading the comments on the Times article, it looks like I’m not the only one.
It’s true that A.I. has advanced considerably, but we aren’t talking about multiple-choice or true/false questions here – an essay is a completely different beast. An algorithm may be a good way to weed out obvious grammar errors, but anyone who uses Microsoft Word or Google Translate knows that language is far more complicated than any current A.I. system can handle. For that matter, there’s more to good writing than just grammar.
Maybe grading something like an AP Biology essay question, in which the mention of specific vocabulary and concepts is sufficient to earn points, would be a helpful use of A.I. But what about an eight-page paper? Can a computer understand the progression of your argument, the coherence and soundness of your statements, the economy and clarity of your style, your treatment and interpretation of ambiguous information, and the many other factors that contribute to a really solid essay?
I think not. However, I should probably qualify this statement – perhaps not yet. Writing and understanding writing are fundamentally human skills, and using language in an informative, effective, and pleasing way is hard, even for actual people. If a software program could understand an essay well enough to properly grade it, odds are it could write the paper itself. (And it would probably do a better job than a good chunk of the current human population; see below.) Assuming A.I. ever progresses to this level, we will be confronted with many ethical questions more pressing than whether or not a machine can grade an essay.
While these questions are very important, it isn’t my intent to debate them here. The situation before us now is this: given the current emphasis on standardization and efficiency in education, it’s likely that the EdX technology will be utilized to a certain extent in the near future. The existential weirdness of the essay never seen by human eyes, but wholly evaluated by a machine, may not come to pass. However, with the current state of our overcrowded, underfunded education system, it’s inevitable that many will turn to technology as the great solution.
Proponents of software grading have good intentions in attempting to fix a real problem. According to a government study, only 27 percent of students in both 8th and 12th grades are deemed proficient in writing. This is a dismal statistic on its own, but does a grade of ‘proficient’ on a standardized exam even indicate the ability to communicate in practical situations? Who knows?
But what we always seem to forget is that the effect of technology does not depend only on its sophistication, but also on how we choose to apply it. Technology is not a panacea; it’s a tool – a powerful one, undoubtedly – but nothing more. If the situation gets desperate enough, I suspect this software will end up as a cost-cutting measure to churn out writers who can be considered ‘good enough’ according to a standardized benchmark, freeing professors for more ‘productive’ endeavors. Such shortcuts never fix the core problems meaningfully, but simply perpetuate them, at best.
Like any other skill, writing requires practice and feedback. However, turning to mechanized, immediate grading won’t automatically create good writers. For one thing, students are adept at gaming the system, if nothing else. Many modify their writing and ideas depending on the biases of their human grader, but figuring out what the algorithm rewards (use of elaborate style, long sentences, buzzwords, etc.) is on another level entirely. When your grader doesn’t actually understand what you’re writing, it’s much easier to fool it. (For a hilariously cynical attempt to bamboozle a computer grader, check out Dr. Les Perelman’s SAT essays and their feedback here).
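The gaming problem is easy to demonstrate at toy scale. Below is a hypothetical sketch of my own (not any real product’s scoring formula): a grader that rewards length and buzzwords can be beaten by padding an essay with exactly those things, without improving its substance at all.

```python
# Hypothetical feature-based grader (an illustration, not a real system):
# it rewards sheer word count plus a list of impressive-sounding buzzwords.

BUZZWORDS = {"plethora", "myriad", "paradigm", "multifaceted"}

def toy_grade(essay):
    words = essay.lower().split()
    length_points = min(len(words) / 50, 5)              # rewards length, capped
    buzz_points = sum(1 for w in words if w in BUZZWORDS)  # rewards buzzwords
    return length_points + buzz_points

# An honest sentence versus the same sentence padded to game the grader.
honest = "The essay grader cannot judge whether an argument is sound."
gamed = honest + " Each multifaceted paradigm reveals a plethora of myriad insights." * 10
```

Here the padded essay scores dramatically higher despite saying nothing new – precisely the behavior Dr. Perelman exploited.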
As Dr. Perelman shows, the computer’s suggestions, which are supposed to be so helpful for students, are worse than meaningless – they are often patently wrong. Maybe this is only because the technology isn’t advanced enough yet, and maybe, as some proponents argue, mechanized grading is better than the rote, inefficient human grading that isn’t much improving the situation either. Maybe this is the course of history, and we are dinosaurs to stand in its way.
But until history gets here, a renewed emphasis on the dinosaurian way to learn writing might not be so bad after all. Here’s what you do: read a lot, write a lot, think critically about what you’re reading and writing, get feedback from people who know what they’re talking about, edit other people’s work mercilessly, edit your own work even more mercilessly, rinse and repeat, rinse and repeat. It’s not a quick fix and it’s not easy, but it works – and it’s worthwhile in the end.
[Image: “Creation of Artificial Intelligence” – Calvin and Hobbes cartoon by Bill Watterson]