Coherent Extrapolated Volition
Today we finally get around to discussing the infamous ‘Coherent Extrapolated Volition’ idea, the one that Eliezer insisted I was ignoring.
As of May 2004, my take on Friendliness is that the initial dynamic should implement the coherent extrapolated volition of humankind.
In poetic terms, our coherent extrapolated volition is our wish if we knew more, thought faster, were more the people we wished we were, had grown up farther together; where the extrapolation converges rather than diverges, where our wishes cohere rather than interfere; extrapolated as we wish that extrapolated, interpreted as we wish that interpreted.
Fascinating stuff. Where is this concept discussed in non-poetic terms? Where is it operationalized? There’s a whole long section in the original paper discussing the idea. Surely there must be a rigorous definition in there somewhere.
Locating it is left as an exercise for the reader, if you enjoy wild goose chases.
Let’s take a closer look at that name. ‘Coherent’ seems straightforward enough – in the context of abstract ideas, it means something along the lines of ‘logically connected; consistent’ or ‘having a natural or due arrangement of parts; harmonious’. Do humans, either individually or collectively, possess a coherent set of preferences? I have no idea how we could determine that. Judging from his writings, neither does Eliezer – but that’s not an obstacle. Eliezer doesn’t even bother to handwave the problem away – he simply ignores it.
‘Extrapolated’ is easy to grasp, as well. It has a clear definition: “to infer an unknown from something that is known”; in mathematics, it’s even clearer: “to estimate a function that is known over a range of values of its independent variable to values outside the known range”. Obviously human morality varies in a predictable way with the independent variable of time. Once, the concept of slavery was widely accepted, then the idea that it was intolerable spread, and now it’s abolished and we consider slavery repugnant. Clearly that’s progress, because the unacceptable moralities of the past slowly developed into a form compatible with the moralities of today. That form: the moralities of today.
Humanity’s moral memes have improved greatly over the last few thousand years. How much farther do we have left to go?
They have? I wasn’t aware that moral codes have improved greatly. How, precisely, do we evaluate a moral code other than by appealing to the one we’ve already accepted and internalized? Certainly there are standards of logic and self-consistency that don’t require us to make moral judgments, but my intuition tells me that’s not what Yudkowsky is talking about.
As for “how far we have left to go”, I’m not sure what to make of it. It’s as though he believes there’s some ultimate goal or teleological standard towards which humanity is collectively approaching. Peculiar, since human societies show every sign of adapting themselves according to the conditions in which they exist, and evolutionary principles are notoriously not goal-driven.
By the standards of today, the standards of today are an improvement over the past. They’d have to be, because moral standards are necessarily applicable to standards in the abstract or general sense, and every system of standards that is self-consistent will view itself as a perfect match to its criteria for the ideal system of standards.
It is undeniable that moral conventions have changed, ‘progressed’ in the weak sense, to become what they are today. It is not clear that this change is orderly or predictable. Nor is it clear that today’s conventions are fundamentally superior to those of the past in any non-trivial sense, much less that they represent objective progress. There is a deplorable tendency of human beings to regard their present beliefs and positions, no matter how arbitrary, as being the end stage of a tremendous process of improvement. Man measures all things, and he tends to hold himself up as the standard of correctness. Unfortunately, even a little thought is sufficient to convince us that we are in no sense an end stage, and that considering our prejudices and tendencies to be ultimate and defining truths is nothing more than pernicious narcissism.
You cannot extrapolate a random walk. Can we extrapolate moral change? I know of no proof that such a thing is possible, much less attainable for us. Does Eliezer offer that proof, or even a credible argument in favor of that possibility? Or does he merely take it for granted as an unquestioned assumption?
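The random-walk point can be made concrete with a small simulation. This is a minimal sketch (all names are mine, and it assumes the simplest case: a driftless walk of ±1 steps): it fits a least-squares trend line to the observed portion of many simulated walks and extends it forward, then compares that against the naive guess that the future value equals the last observed value. Because any apparent trend in a driftless walk is noise, extrapolating that trend tends to do worse on average than simply refusing to extrapolate.

```python
import random

random.seed(0)

def random_walk(n):
    """A driftless walk: n steps of +1 or -1 from zero."""
    x, path = 0.0, [0.0]
    for _ in range(n):
        x += random.choice([-1.0, 1.0])
        path.append(x)
    return path

def linear_extrapolate(path, ahead):
    """Fit a least-squares line to the observed path, extend it `ahead` steps."""
    n = len(path)
    mean_x = sum(range(n)) / n
    mean_y = sum(path) / n
    cov = sum((i - mean_x) * (y - mean_y) for i, y in enumerate(path))
    var = sum((i - mean_x) ** 2 for i in range(n))
    slope = cov / var
    intercept = mean_y - slope * mean_x
    return intercept + slope * (n - 1 + ahead)

trials, past, ahead = 2000, 50, 50
err_trend = err_last = 0.0
for _ in range(trials):
    walk = random_walk(past + ahead)
    observed, future = walk[:past + 1], walk[-1]
    err_trend += (linear_extrapolate(observed, ahead) - future) ** 2
    err_last += (observed[-1] - future) ** 2

# Mean squared error of the naive "no change" guess vs. the fitted trend:
# the trend-fitter pays for mistaking noise for direction.
print(err_last / trials < err_trend / trials)
```

The naive predictor’s mean squared error settles near the step variance times the horizon, while the trend-fitter adds the amplified error of a spurious slope on top of that: exactly the sense in which a random walk cannot be extrapolated.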
Where are the questions he claims I am bringing up as my own? Does he ask them? Or does he simply assert the answers he wishes to find as self-evident truths? To ask a question, you must not believe you already know the answer; if you have a belief, it must be set aside for the asking to occur. Does Eliezer set aside his beliefs so that he can ask honestly?
Read Yudkowsky’s writings. Take the time to look up his later writings if you wish, although I am not aware of any fundamental change in their content. You will have to judge for yourself, but I think you can guess the answer I found when I looked for one.