Monday, October 30, 2006

Searle vs The Mind's I

The Myth of the Computer

The Myth of the Computer: An Exchange

The exchange between Searle and Dennett and Hofstadter in the New York Review of Books gets personal! I believe I found the quote about Searle that Professor Chopra was searching for, though it was Dennett, not Hofstadter, who wrote it.

The quote is a response to Searle saying, in regard to the Chinese Room argument:
The mental gymnastics that partisans of strong AI have performed in their attempts to refute this rather simple argument are truly extraordinary.

Dennett responds:
Here we have the spectacle of an eminent philosopher going around the country trotting out a "rather simple argument" and then marveling at the obtuseness of his audiences, who keep trying to show him what's wrong with it. He apparently cannot bring himself to contemplate the possibility that he might be missing a point or two, or underestimating the opposition. As he notes in his review, no less than twenty-seven rather eminent people responded to his article when it first appeared in Behavioral and Brain Sciences, but since he repeats its claims almost verbatim in the review, it seems that the only lesson he has learned from the response was that there are several dozen fools in the world.
Damn!

In Searle's review of The Mind's I, he claims that Hofstadter and Dennett fabricate a quote which "runs dead opposite" to what he was trying to convey in his paper on the Chinese Room argument.

The quote? Instead of quoting Searle as having said "bits of paper," they misquote him as saying "a few slips of paper." I have to agree with Dennett that the misquote hardly constitutes a fabrication conveying the opposite of the opinion Searle holds. In his response, Dennett apologizes for the misquote, and promises that the mistake will be corrected in future editions of the book. I can verify this: my copy, which dates from November 1982, correctly quotes Searle.

The Systems Reply

Take a Turing Machine that takes as input a story in Chinese, followed by a set of questions in Chinese. It is constructed to produce, as output, answers to the questions, in Chinese. The answers are meant to be coherent responses to the questions. The Turing Machine is so well constructed that a native Chinese speaker will believe the responses come from a Chinese speaker.

We might think the Turing Machine understands Chinese.

Take a man, who doesn't speak any Chinese, sitting in a room. He's given a story in Chinese as input, a set of questions in Chinese, and a set of instructions on how to produce, as output, answers in Chinese. The answers are meant to be coherent responses to the questions. The man in the room performs the instructions so well that a native Chinese speaker will believe the responses come from a Chinese speaker.

The man sitting in the room does not understand Chinese.

The Turing Machine is a symbol manipulator. So is the man in the Chinese Room. So, the argument goes: if you accept that the man doesn't understand Chinese, then you must accept that the Turing Machine also does not understand Chinese.

The Systems Reply: while the man in the room does not understand any Chinese, the system taken as a whole does. The system consists of the man and his instructions, presumably in the form of "bits of paper," and perhaps a pen or pencil.

Searle says that arguing that the "conjunction" of the man with the "bits of paper" is capable of understanding, where the man alone is not, is absurd; he feels embarrassed to formulate a response to it at all.

The Turing Machine described above had its instructions - the rules by which it changes from one state to the next - built into its hardware. In this case, the Turing Machine itself was the system, and the system consisted of the tape head and the strip of paper on which it marks symbols. We wouldn't say that the tape head or the strip of paper understands Chinese, but rather that the Turing Machine, taken as a whole system, does.
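To make the contrast concrete, here is a minimal sketch in Python of a machine whose rules are "built into its hardware": the transition table lives in the program itself, and the states and symbols are made-up placeholders, not anything that actually answers questions in Chinese.

```python
# A toy Turing machine whose transition rules are fixed in the program itself.
RULES = {
    # (state, symbol read) -> (symbol to write, head move, next state)
    ("start", "0"): ("1", +1, "start"),
    ("start", "1"): ("0", +1, "start"),
    ("start", "_"): ("_", 0, "halt"),     # blank cell: stop
}

def run(tape):
    """Run the hard-wired machine over a string of tape symbols."""
    tape, head, state = list(tape), 0, "start"
    while state != "halt":
        if head == len(tape):
            tape.append("_")               # the tape extends with blanks as needed
        write, move, state = RULES[(state, tape[head])]
        tape[head] = write
        head += move
    return "".join(tape)

print(run("0110_"))   # -> "1001_"; the whole device, rules included, is the system
```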

The system in the Systems Reply, on the other hand, consisted of the man (and all of his constituent parts) and some "bits of paper." So it does seem different from the Turing Machine: it's easier to think of the TM as a single system that can be said to understand than it is to think of the man, some "bits of paper," and a pencil as one. The TM, after all, is a single machine.

Let's tighten the analogy, then. Take a Universal Turing Machine. As input it can take a description of any other Turing Machine, and run it. [Dennett and Hofstadter call this emulation to distinguish it from simulation. Emulation is exact, since we keep a description of the emulated machine down to its lowest-level states, unlike simulation, which is an approximation that attempts to mimic the behavior of whatever is simulated.] We feed as input to the Universal Turing Machine an encoding of the Turing Machine described above - the one that can produce output in Chinese so well that Chinese speakers will believe the output was written by a Chinese speaker. The rest of the story is the same: we then feed the UTM some text in Chinese, followed by some questions, and it produces Chinese text as output.
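Here is the same sort of sketch for the universal version, again with a made-up stand-in machine rather than a real Chinese-answerer: the interpreter below never changes, and the transition table arrives as input data, which is exactly the role the "bits of paper" play in the analogy.

```python
# A sketch of a Universal Turing Machine as an interpreter: the rules of the
# emulated machine are supplied as data, not built into the interpreter.

def universal_run(encoded_machine, tape, blank="_"):
    """Emulate whatever machine is described by `encoded_machine` on `tape`."""
    rules = encoded_machine["rules"]          # (state, symbol) -> (write, move, next)
    state = encoded_machine["start"]
    tape, head = list(tape), 0
    while state != encoded_machine["halt"]:
        if head == len(tape):
            tape.append(blank)
        write, move, state = rules[(state, tape[head])]
        tape[head] = write
        head += move
    return "".join(tape)

# The "bits of paper": a description of a bit-flipping machine, fed in as data.
flipper = {
    "start": "s", "halt": "h",
    "rules": {("s", "0"): ("1", 1, "s"),
              ("s", "1"): ("0", 1, "s"),
              ("s", "_"): ("_", 0, "h")},
}

print(universal_run(flipper, "0110_"))   # -> "1001_"
```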

We might think that the Universal Turing Machine, taken together with the encoding of the Turing Machine that can produce Chinese output, understands Chinese.

The idea that the physical machine, the Universal Turing Machine, taken with some "bits of paper" as input (the encoding of the TM that produces Chinese output), understands Chinese does not seem absurd.

And the analogy is tighter: the UTM is analogous to the man sitting in the room; the "bits of paper" are analogous to the encoded Turing Machine.

Dennett says that Searle's argument is a mere "intuition pump" - there is no argument, only an attempt to play on our intuitions. Most people's first intuition is that taking the man with the paper as a system that is capable of understanding is absurd.

I claim that those same people would not find it absurd to take the Universal Turing Machine, along with the encoded Turing Machine that produces Chinese output, as a system that can understand Chinese. It's perfectly intuitive.

If you accept that the UTM taken together with the encoded TM can understand Chinese, you have to accept that the man taken with the bits of paper does too.

Wednesday, October 25, 2006

Micro-Intelligence

I think a popular bit of self-delusion that infects those interested in building a robot with a capacity for human intelligence lies in their conception of what they envision the robot doing. Perhaps this is mostly a complaint against what Haugeland termed GOFAI, but it seems to me that, following Turing’s example of giving the machine a specific goal to accomplish, many experimenters ask themselves, “what sort of thing do humans do that we characterize as an expression of intelligence?” and then set out to build a machine that does just that. For example: humans play chess and checkers, and that is a sign of intelligence; humans recognize objects and can form opinions about the properties of those objects, so that is a sign of intelligence; and even a mundane task, such as navigating the hallways of a building to deliver a piping hot cup of coffee, is deemed an activity for the intelligent, so a machine that can do that must in some way be intelligent.

The problem with building a machine that can “recognize” objects, have “conversations”, or play chess or checkers is that that is the totality of its behavioral range: its whole environment, meaning the objects it can “sense” and the dimensions across which it can exert any influence, is contained wholly within a small scene of blocks, a small vocabulary (with little to no true understanding, to boot), or an eight-by-eight board with at most six different types of pieces. There is some sense that once one builds enough micro-worlds in which machines are able to operate—namely, one micro-world for visual recognition of objects (SHRDLU), one for language processing (ELIZA), and others for manipulation of pieces (Samuel’s program)—one can plug them all into each other and have a machine that displays a wider range of behaviors. If the complaint is that its set of primitives is too small, making it unable to assimilate information about complex shapes, complex sentences, or playing pieces never before encountered, then the solution is simply to program in more primitives—curved lines, templates for complex sentence structures, or a battery of more pieces.

This is the point on which I agree with Hubert Dreyfus most heartily, but also the point after which we part and go our separate ways. In regard to those types of projects, he argues that once one has accomplished the construction of a machine that can carry out a set task, it is not remotely close to attaining any sort of human intelligence, for it is restricted to that one task alone and no other. Furthermore, the micro-worlds are simplified versions of environments and activities that we humans encounter and execute, but they are so stripped down as to be meaningless in the context of the real world. Dreyfus notes, “The nongeneralizable character of the programs so far discussed makes them engineering feats, not steps toward generally intelligent systems, and they are, therefore, not at all promising as contributions to psychology.”

Furthermore, plugging micro-worlds into each other, and then manually adding in more primitives, would be equally pointless. The machine in question would still have capabilities bounded strictly by its programming, meaning it does not grow or learn in the way a human does. In order to have the type of intelligence that we do, it must be able to form new primitives for itself, with which to compose a world growing ever more complex.

At this point, Dreyfus begins his critique of Minsky’s frames. Although I agreed with him regarding the micro-worlds, I strongly disagree with his general assessment of Minsky’s theory. It seems quite apparent to me how frames can lead to the formation of new primitives: our varying types of sensory inputs are stored in collections of nodes, which over time come to correlate with each other, whereupon they form frames consisting of bundles of different types of information. For example, a baby encountering an apple many times over will come to correlate the redness, the smoothness, the hardness, and all the other properties that can be sensed. With that bundle of information, the baby will then come to recognize other apples by virtue of the similarities of their properties. Further properties of apples—such as their behavior when falling, or rolling, or colliding with other objects—are assimilated with experience.
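Here is a toy sketch of what I have in mind; the class and method names are my own, not Minsky's notation. Repeated encounters accumulate correlated properties into slots, and the accumulated defaults are what let the frame recognize the next apple.

```python
# A toy frame: slots whose default values are the correlations a learner
# has accumulated over many encounters. Names here are my own invention.
from collections import defaultdict

class Frame:
    def __init__(self, name):
        self.name = name
        self.counts = defaultdict(lambda: defaultdict(int))  # slot -> value -> count

    def observe(self, **properties):
        """Record one encounter, e.g. observe(color='red', texture='smooth')."""
        for slot, value in properties.items():
            self.counts[slot][value] += 1

    def defaults(self):
        """The frame's terminals: the most frequently observed value per slot."""
        return {slot: max(values, key=values.get)
                for slot, values in self.counts.items()}

    def matches(self, **properties):
        """Crude recognition: how many of a candidate's properties fit the defaults."""
        d = self.defaults()
        return sum(d.get(slot) == value for slot, value in properties.items())

apple = Frame("apple")
for _ in range(20):
    apple.observe(color="red", texture="smooth", hardness="firm")
apple.observe(color="green", texture="smooth", hardness="firm")   # one odd encounter

print(apple.defaults())                                # red, smooth, firm
print(apple.matches(color="red", texture="smooth"))    # 2: looks like an apple
```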

In his concluding main thesis, Dreyfus claims that the idea of knowledge representation is not only unnecessary but also impossible to realize in any artificial system. Because an explanation of how we do something always traces back to what we are—which Dreyfus believes is something we can never know—we will never be fully equipped with the conditional rules for founding an intelligent system. “In explaining our actions we must always sooner or later fall back on our everyday practices and simply say ‘this is what we do’ or ‘that’s what it’s like to be a human being.’”

Furthermore, rather than representations, Dreyfus believes that intelligent behavior can be explained under alternative accounts: one, “developing patterns of responses,” with recognition being gradually acquired through training; and two, allowing for “nonformal (concrete) representations” (e.g.: images) that are used in exploration of “what I am, not what I know.”

Frankly, I couldn’t disagree with Dreyfus more. Given that we collect information with our sensory apparatus and then store it for later use, that information must necessarily be represented in some manner—at all times. How else could we manipulate it? Dreyfus might say that we should appeal to those concrete representations that don’t require any explanation of the rules for symbol manipulation, since there are no symbols in concrete representations. But how does this explain anything? What have we learned about ourselves from this? How does it allow us to explain our behavior and help predict future behavior?

Dreyfus’ “patterns of responses” seems to me to be a behaviorist denial of any internal life. For him, there seems to be no concept of swimming that we hold, only our acquired responses to being in the water. All we do in life is respond to stimuli.

Dreyfus seems wrapped up in the idea that to have intelligence is to have only human intelligence and therefore since no computer can be human, it cannot have intelligence. I would argue that there must be an entire range of ways to be intelligent, perhaps even some that don't use representation, as our intelligence as a species did not arise in a day or ten-thousand years, but evolved over millions of years (and how many hundreds of millions of years ago was it when the most primitive nervous systems emerged?). Clearly, then, there are some structures that evolved first and others that rely on those original formations; I take this as evidence that (a) cognitive science should begin to concern itself more with the evolution and development of intelligence and that (b) more experimental research should be done that does not try to go gung-ho and recreate a feature of human intelligence, but rather should attempt to recreate the intelligence of lesser species.

Tuesday, October 17, 2006

Avatars of the Tortoise

In "Avatars of the Tortoise," Borges traces Zeno's paradox throughout the history of philosophy. Zeno argued that movement was impossible. Aristotle explains the paradox as follows: "In a race, the quickest runner can never overtake the slowest, since the pursuer must first reach the point whence the pursued started, so that the slower must always hold a lead."

Borges reflects that the applications of the paradox are inexhaustible. He says, "the vertiginous regressus in infinitum is perhaps applicable to all subjects." One such application is "the problem of knowledge: cognition is recognition, but it is necessary to have known in order to recognize, but cognition is recognition..."

He finds in Sextus Empiricus an argument for the uselessness of definitions, "since one will have to define each of the words used and then define the definition."

[[ I'll expand on the following at a later time. Right now I'm just sketching out an idea ]]

I'm reminded of Dreyfus and wonder if we find a little of Zeno in his argument. A key objection to Minsky's frame-based account of knowledge is its inability to give a way of translating perception (what Borges called 'recognition') into formal representations (i.e., terminals in a frame). And the objection to using formal representations, in principle, to create intelligence is reminiscent of Sextus Empiricus's reasoning about the futility of definitions.

Sunday, October 15, 2006

Before posting some much belated responses to Dennett's paper on "True Believers" I want to address some issues about Turing's paper.

One of the objections Turing anticipates is what he calls the argument from "continuity in the nervous system." The idea is that the human nervous system is not a discrete state machine, and therefore it is not possible to mimic its behavior with a discrete state machine.

Turing's response to this surprised me. He says that "...if we adhere to the conditions of the imitation game, the interrogator will not be able to take advantage of this difference."

Now, keep in mind that the purpose of Turing's paper is to support the claim that the imitation game is a valid substitute for the question "can machines think?" So his response that the imitation game won't detect an important difference between humans and discrete state machines suggests a shortcoming in the imitation game. The question is, which of the following are true:

a) the imitation game cannot make an important distinction and therefore is not a sufficient test, or
b) the differences between a continuous state machine and a discrete state machine are irrelevant when trying to determine if a machine can pass the imitation game, or
c) the differences between a continuous state machine and a discrete state machine are irrelevant when trying to determine if a machine can think
We can take for granted that (b) is true for the moment - after all, Turing admits this - but I think what Turing was trying to refute is (a), not (b). If not, it's what he should be attempting to refute. See, the objection about the human nervous system was made by people who didn't even consider the imitation game - so the fact that the differences between discrete and continuous machines don't affect the outcome of the imitation game does nothing to refute the original argument. It's up to Turing to convince us that even though the imitation game is not affected by the differences between a continuous state machine and a discrete state machine, the imitation game is still a good substitute for the question about intelligent machines.

All of this about discrete state machines vs continuous state systems led me to consider some questions Turing did not address.

There is evidence from quantum mechanics that the universe is discrete, so that, at some level, the nervous system is itself discrete. Assuming this is true, the discreteness of the universe would only appear "at the atomic or subatomic" level, but the fact remains that we could give a description of the universe in terms of a discrete state machine. Which, as we know, means that any other discrete state machine is able to simulate it (run it) - given enough time and memory. So, if it were proven that the universe is discrete, we would know, with certainty, that a computer could simulate intelligent systems, because intelligent systems exist in the universe.

Of course this would only be interesting on a theoretical level, since the amount of memory and speed needed would make the problem of actually simulating the universe (or any physical description at such an atomic level) intractable. Also, we would have a situation in which the intelligent systems exist on the machine - but the machine itself (taken as a whole) is not an intelligent system any more than the universe itself is. The machine would be simulating the entire universe, in which intelligent systems are only a fraction of what it's simulating; the intelligent systems might be thought of as virtual machines that our real machine is simulating.

Once we knew that it is theoretically possible for intelligent systems to exist on discrete state machines in this way (which is up to quantum mechanics to establish), we'd know that it *can* be done, perhaps without the need to simulate everything in the universe.

My other question is related: can a discrete state machine simulate any continuous system? I'm not sure if it is theoretically true that one can, but it seems that, given enough precision, a discrete state machine can become accurate enough for the fractional differences between it and the continuous system it is simulating to be irrelevant.

In fact, the extent to which the discrete state machine is successful in its simulation is probably determined by the degree of accuracy being demanded. It can always be better, since it can always be more precise - infinitely more precise, since we're attempting to simulate a continuous system. (Are discrete state machines by definition finite? I don't think so, since the theoretical Universal Turing Machine has an infinitely long tape, and is certainly discrete. But, on the other hand, we can never build an actual Universal Turing Machine.)
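As a small illustration of that claim (using Euler's method as a stand-in for a discrete state machine grinding through a continuous law), the error against the exact solution of dx/dt = -x shrinks as the step size shrinks:

```python
# Discrete steps approximating a continuous system: dx/dt = -x, x(0) = 1.
# The exact solution at t = 1 is e^-1; smaller steps give smaller error.
import math

def euler_decay(t_end, step):
    """Advance the continuous rule dx/dt = -x in fixed discrete steps."""
    x = 1.0
    for _ in range(round(t_end / step)):
        x += step * (-x)
    return x

exact = math.exp(-1.0)
for step in (0.1, 0.01, 0.001):
    print(step, abs(euler_decay(1.0, step) - exact))   # error shrinks with the step
```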

If it is true that the universe is actually discrete, not continuous, but that we think of the human nervous system as a continuous system, then it follows that a discrete system (the quantum universe) is simulating a continuous system (the human nervous system). Simulation probably isn't the right word here...but it's meant to suggest that if the universe can go from discrete to seemingly continuous, then we can do it too.

Friday, October 13, 2006

[Note: I'm going to clean this up, because I've decided it's not very readable.]

My roommate has iTunes.

I can get into his iTunes from my computer, but only if we’re on the same network. If I’m outside of the network—say, at work in the library, trying to connect to his computer—it will not allow me to access his songs. This way, Apple doesn’t have to worry about crossing any copyright boundaries. However, if I SSH into my home computer, I can then control it and have it connect to my roommate’s iTunes.

It’s like being at the door of a club, where on the outside it’s dark and frigid, on the inside it’s hot and noisy, and the only access you have to it is mediated by the bouncer who opens a slit in the door.

Returning my thoughts to mind design…

I can think of a node in a network being activated and, as part of its activation, being given a signal to activate all connected nodes to a depth of n; it then activates all of its adjacent nodes, sending with each activation a signal of n - 1. Each node in this cluster of the network behaves the same way, so the signal travels down to a depth of n from the origin.

Furthermore, each node is given a request sent to it from the “higher” node—with information that acts as a certain criterion. If the information contained in the node meets the criterion, then it returns a tidy package of information, appended to a list of information received from all the connected nodes down to a depth of n.
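A rough sketch of that scheme, with names of my own choosing; it also ignores cycles, which a real network would have to handle:

```python
# A node queried with a criterion and a depth n asks its neighbours with n - 1,
# and returns its own information plus whatever the sub-nodes return.
# (For simplicity this sketch assumes the links form a tree, i.e. no cycles.)

class Node:
    def __init__(self, info):
        self.info = info
        self.neighbours = []

    def activate(self, criterion, depth):
        """Collect matching info from this node and its neighbours down to `depth`."""
        if depth < 0:
            return []
        collected = [self.info] if criterion(self.info) else []
        for node in self.neighbours:
            collected += node.activate(criterion, depth - 1)
        return collected

# A tiny chain: a -> b -> c
a, b, c = Node("red"), Node("red"), Node("blue")
a.neighbours.append(b)
b.neighbours.append(c)

print(a.activate(lambda info: info == "red", depth=1))   # ['red', 'red']; c is too deep
```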

The higher nodes, thus, have no direct access to nodes lower than one level down; the only access they do have is to one node down, one up, and other nodes at the same level. A node moves up the hierarchy by gaining more sub-nodes.

When I connect to my computer at home from work, I’m interfacing directly with a node below me, but the only way to retrieve information from the computer’s sub-nodes is to have it mediate that information. I am also a node: information is requested from me, with a certain criterion and a signal telling me how far I should look. It is as though the computer performs a certain function for me, which is to respond with certain information; being myself a node, I don’t concern myself with how the computer gives me the information that it does, only that it does.

I can imagine a network of nodes based on an image. Nodes are generated serially by streaming through the pixels one at a time, saving the values of the pixels as information stored in the nodes. Nodes are clustered together first by their proximity to each other as seen in the image; they then form a hierarchy based on how many nodes share the same color, with the nodes in the largest clusters of same-colored nodes rising to the top. Nodes that are isolated, meaning no nearby nodes share similar colors, become relatively low level.

This can be accomplished by having the nodes, once they establish links to their neighbors, send out a signal to infinite depth, with the criterion that the receiving node be the same color. Thus a node will assign itself a ranking based on how many nodes respond to it. Clusters of nodes that are the same color and contiguously connected to each other are then assigned higher levels by being unique among other clusters found in the given image, either by having a certain color found rarely elsewhere in the image, or by being the most luminous, etc.
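Here is a rough sketch of that ranking idea, using a tiny made-up grid in place of a real image and a flood fill in place of the signal sent out to infinite depth:

```python
# Turn a tiny grid of "pixels" into nodes and rank each one by the size of the
# contiguous same-colour cluster it belongs to (flood fill over 4-connected
# neighbours stands in for the signal propagated to infinite depth).

image = [
    ["red",   "red",  "blue"],
    ["red",   "blue", "blue"],
    ["green", "blue", "blue"],
]

def cluster_size(image, start_row, start_col):
    """Count contiguous same-colour pixels reachable from the starting pixel."""
    colour = image[start_row][start_col]
    seen, stack = set(), [(start_row, start_col)]
    while stack:
        r, c = stack.pop()
        if (r, c) in seen:
            continue
        seen.add((r, c))
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if (0 <= nr < len(image) and 0 <= nc < len(image[0])
                    and image[nr][nc] == colour and (nr, nc) not in seen):
                stack.append((nr, nc))
    return len(seen)

# Each pixel-node's "rank": how many same-coloured nodes answered its signal.
ranks = {(r, c): cluster_size(image, r, c)
         for r in range(len(image)) for c in range(len(image[0]))}
print(ranks[(0, 0)])   # 3 -- the red cluster
print(ranks[(2, 0)])   # 1 -- the isolated green pixel stays low level
```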

Therefore you can have a rough representation of an image using nodes in a network.

The whole network that corresponds to an image can then be thought of as one of Minsky's frames, with objects found in the image as terminals, represented as high level clusters of nodes (“capitals”) that share common features (color/luminosity). Of course these terminals will connect to others outside of the image (perhaps another image, or a linguistic frame), as this is a highly interconnected society of mind.

Tuesday, October 10, 2006

Newell and Simon restate an old definition in a different light—intelligence is finding a solution to a problem, but specifically by traversing a search space. This seems sensible; any behavior we exhibit can be encoded as a set of actions. Learning might be described as acquiring new solution sets to problems. Theorizing might be described as producing partial solution sets. Producing a search space involves establishing what in the environment is mutable and what is immutable, and how the mutable things can change.

In the example of finding a solution to ax + b = cx + d, a (good) solution generator would first need to identify what in the environment (this being a brow-raisingly small environment) is mutable and how. But before that it must identify the rules of the environment, namely that anything done to one side must be done to the other; then the specific changes can be identified—ax, b, cx, and d can all be added, subtracted, multiplied, etc. From there, a generator can produce solutions.
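Here is a small generate-and-test sketch of what that might look like; the move set is my own simplification, not Newell and Simon's program:

```python
# The "environment" is ax + b = cx + d, represented by its four coefficients.
# The generator searches over legal transformations (do the same thing to both
# sides) until the equation has the solved form x = value.
from collections import deque

def solve(a, b, c, d):
    """Search for a sequence of legal moves that isolates x in ax + b = cx + d."""
    def moves(state):
        a, b, c, d = state
        if c != 0:
            yield "subtract cx from both sides", (a - c, b, 0, d)
        if b != 0:
            yield "subtract b from both sides", (a, 0, c, d - b)
        if a not in (0, 1) and c == 0:
            yield "divide both sides by a", (1, b / a, 0, d / a)

    start = (a, b, c, d)
    queue, seen = deque([(start, [])]), {start}
    while queue:
        state, path = queue.popleft()
        if state[:3] == (1, 0, 0):            # the equation now reads x = d
            return path, state[3]
        for description, nxt in moves(state):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, path + [description]))
    return None

path, x = solve(3, 5, 1, 9)    # 3x + 5 = x + 9
print(x)                       # 2.0
print(path)                    # the transformations the search found
```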

So what would happen if we considered a more robust problem? Let’s consider the problem faced by a chimpanzee trapped in a room with a hole in the ceiling and some sturdy boxes in the corner. The obvious (to us and perhaps the chimp) problem is how to get out of the room. First the chimp would have to identify what in its environment is mutable—namely, the boxes and itself. Of course, it might also have to actually go over to the boxes and determine physically whether they are light enough to move and strong enough to bear its weight. But anyway, once the variables are all determined, the generation of solutions may commence. The difficult question, then, is why it is so painstakingly obvious to us that the clear solution is to place the box beneath the hole.

The answer to that, according to Newell and Simon, is a heuristic. And according to their main thesis, intelligent behavior is to follow some heuristic that produces a solution with the least searching. But what heuristic is it that we follow when figuring out how to get out of the room? More importantly, how does a machine develop such a heuristic on its own?
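To make the question concrete, here is a toy version of the chimp's search space with a heuristic of my own invention. Guided search should reach the goal while expanding fewer states than blind search, which is the sense in which Newell and Simon tie intelligence to finding a solution with the least searching:

```python
# A toy chimp-and-box search space: positions 0..4 along a line, a hole above
# position 2, and a heuristic (my own invention) that rewards getting the box
# under the hole and the chimp onto the box.
import heapq

HOLE = 2

def goal(state):
    chimp, box, on_box = state
    return on_box and box == HOLE

def successors(state):
    """The mutable things: the chimp's position, the box's position, and
    whether the chimp is standing on the box."""
    chimp, box, on_box = state
    if on_box:
        return [(chimp, box, False)]                              # climb back down
    moves = []
    for step in (-1, 1):
        if 0 <= chimp + step <= 4:
            if chimp == box:
                moves.append((chimp + step, box + step, False))   # push the box
            moves.append((chimp + step, box, False))              # just walk
    if chimp == box:
        moves.append((chimp, box, True))                          # climb onto the box
    return moves

def heuristic(state):
    chimp, box, on_box = state
    return abs(box - HOLE) + abs(chimp - box) + (0 if on_box else 1)

def search(start, h):
    """Best-first search; with h always 0 this degenerates to blind breadth-first
    search. Returns how many states were expanded before reaching the goal."""
    counter, frontier = 0, [(h(start), 0, start)]
    seen, expanded = {start}, 0
    while frontier:
        _, _, state = heapq.heappop(frontier)
        expanded += 1
        if goal(state):
            return expanded
        for nxt in successors(state):
            if nxt not in seen:
                seen.add(nxt)
                counter += 1
                heapq.heappush(frontier, (h(nxt), counter, nxt))
    return None

start = (0, 4, False)    # chimp at one end of the room, box at the other
print("blind search expanded:", search(start, lambda s: 0))
print("guided search expanded:", search(start, heuristic))
```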