When questioned by a student during a lecture on cognitive systems he was teaching, David E. Rumelhart replied rather eloquently that
“our understanding of abstract phenomenon always is based on our experience with the technology of the time.”
He alluded to Aristotle and his wax-tablet theory of memory, and to Leibniz likening his vision of the universe to clockwork, as examples. Rumelhart concluded that we must draw our analogies from the most advanced technological development of our time: the computer. He further explained that computers are far superior to previous technologies because of their capacity to simulate systems and run the operations we wish to explore.
Rumelhart was a proponent of the connectionist approach, an approach that seeks to decipher mental operations using the computer. But the architecture that connectionists employ is considerably different from the classic von Neumann architecture found in the personal computers we commonly use. The von Neumann architecture applies a rule-based approach with defined procedures, or algorithms, to compute a task such as solving a mathematical equation, which can be broken down into instructions that the CPU executes. In short, a system built on this architecture processes symbols that obey a set of formal rules. The connectionist architecture, on the other hand, is based on “neurally inspired” processes that result in a system performing “brain-style computation.” It is, in essence, the architecture of artificial neural networks.
A crucial aspect of this approach is its reliance on parallel processing. Our brains operate on a time scale of milliseconds per elementary operation, whereas a computer works much faster, on a scale of nanoseconds. Since many mental phenomena are complete within about a second, this leads to what Feldman (1985) called the “100-step program” constraint: a mental process cannot involve more than roughly 100 elementary sequential operations. Hence, to truly mimic the brain, we must rely on its real strength, the sheer number of units, rather than its speed: units connected to one another and processing in parallel.
The main features of this model, which in Rumelhart’s words represent “A brain metaphor instead of a computer metaphor,” are as follows (a rough sketch follows the list):
- Simple processing units, a concept similar to an “abstract neuron.”
- Arranged in layers with arbitrary patterns of interconnectedness.
- Recursive rules for updating the weights or strength of connections between the units.
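As an illustration of these features (not drawn from Rumelhart’s own formalism; the sigmoid activation, the error-driven update rule, and the toy OR task below are assumptions made for the sketch), a single processing unit can be written as a weighted sum of its inputs passed through a squashing function, with learning amounting to small adjustments of the connection weights:

```python
import math

# A minimal "abstract neuron": a weighted sum of inputs passed through a
# squashing (sigmoid) nonlinearity. All specifics here are illustrative.
def unit_output(inputs, weights, bias):
    net = sum(i * w for i, w in zip(inputs, weights)) + bias
    return 1.0 / (1.0 + math.exp(-net))

# Error-driven update: nudge each connection strength in proportion to how far
# the unit's output is from the target on this presentation.
def update(inputs, weights, bias, target, lr=0.5):
    error = target - unit_output(inputs, weights, bias)
    weights = [w + lr * error * i for w, i in zip(weights, inputs)]
    return weights, bias + lr * error

# Train the single unit on a toy task (logical OR) and watch the weights settle.
weights, bias = [0.0, 0.0], 0.0
patterns = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]
for _ in range(200):
    for x, t in patterns:
        weights, bias = update(x, weights, bias, t)
print([round(unit_output(x, weights, bias)) for x, _ in patterns])  # -> [0, 1, 1, 1]
```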
To understand connectionism’s relevance to learning, one must look at the history of cognitive science. The study of representation in psychology and artificial intelligence (AI) and the study of learning have long been in tension. During the reign of the behaviorists, psychologists studied knowledge acquisition through experience while largely ignoring how that knowledge or those experiences were represented internally. AI, by contrast, began by focusing on learning but soon shifted to examining representation primarily. Cognitive psychology was likewise influenced by the “computer metaphor,” mirroring AI’s shift. Through all of this, learning receded into the background, except in the studies of behavioral psychologists. When the connectionist approach was introduced, it offered the “neural network metaphor” as an alternative to the “computer metaphor,” and it brought with it a style of computation that provided the ground for studying the relationship between learning and representation.
The study of learning can be broken down into stimuli (S) and responses (R), and into whether these are unconditioned (UCS/UCR) or conditioned (CS/CR) through training. Various models try to explain these relationships, the most popular of the lot being the Rescorla–Wagner (RW) model, developed by Rescorla and Wagner and motivated in part by Kamin’s work on blocking. The RW model is a simple one-layer network linking various cues to possible outcomes. Learning occurs as the associative weights of these links are modified after each trial, reducing the error between the network’s predicted outcome and the actual outcome given as feedback on that trial.
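In code, the RW update amounts to a few lines (a minimal sketch; the cue names, learning rate, and trial count are illustrative assumptions rather than part of the model): on each trial the prediction is the summed weight of the cues present, and every present cue’s weight is adjusted in proportion to the prediction error.

```python
# Rescorla-Wagner learning: a one-layer network of cue -> outcome weights.
def rw_trial(weights, cues_present, outcome, lr=0.2):
    """One trial: outcome is 1.0 (reward delivered) or 0.0 (no reward)."""
    prediction = sum(weights[c] for c in cues_present)  # the network's predicted outcome
    error = outcome - prediction                        # actual minus predicted
    for c in cues_present:                              # only cues present on the trial learn
        weights[c] += lr * error

weights = {"yellow": 0.0, "yellow_orange": 0.0}
for _ in range(50):                     # repeated yellow -> reward pairings
    rw_trial(weights, ["yellow"], 1.0)
print(round(weights["yellow"], 2))      # -> 1.0: yellow comes to fully predict the reward
```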
For example, in Figure 1, once an association between the color yellow and a reward has been formed and yellow completely and accurately predicts the reward, the model assigns a weight of 1.0 to yellow. In such a scenario, the color yellow-orange will not elicit the response upon testing, because yellow already predicts the reward accurately. Hence, stimulus generalization cannot be explained by this model.
Generalization refers to expecting similar outcomes for stimuli that are similar in physical attributes, that co-occur (sensory preconditioning), or that lead to similar outcomes (acquired equivalence). The key word here is ‘similarity’. A model like the RW model uses discrete-component representations of the stimuli: each individual stimulus is represented by a single node in the model [in our example, one node per shade of yellow]. This type of representation is apt for situations where the similarity between cues is too small for any transfer of response [as it would be if the nodes stood for distinct colors like yellow, blue, and so on]. But in a case of stimulus generalization, the stimuli share some property of similarity, such as physical similarity [as the shades of yellow do in our example], making discrete-component representation a poor model.
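Continuing the same hypothetical sketch, testing the untrained but physically similar cue makes the limitation concrete: under a discrete-component representation there is simply no link through which the learning can transfer.

```python
def respond(weights, cues_present):
    """Response strength at test: the summed weight of whichever cues are present."""
    return sum(weights[c] for c in cues_present)

print(round(respond(weights, ["yellow"]), 2))         # -> 1.0: the trained cue responds
print(round(respond(weights, ["yellow_orange"]), 2))  # -> 0.0: similar cue, zero generalization
```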
At this juncture, what could be employed instead are distributed representations, in which stimuli are represented by overlapping sets of nodes, each stimulus corresponding to a set of nodes rather than a single one. Does this type of representation ring any bells? Ask a connectionist.
In Figure 2, we can see such a distributed representational network with two layers. As in the previous example, training an association between the color yellow and a reward will bring about a response in this network upon testing. But so will yellow-orange, or even yellow-green, although not to the same degree. This stimulus generalization is possible because of the network’s additional hidden layer of internal representation. Yellow connects to, and activates, three different hidden nodes, and those same hidden nodes are also connected to the non-trained neighboring colors, so some of what is learned transfers to them. Learning consists of changes in the strengths of the connections involving the hidden units and the output units.
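A self-contained sketch along the same lines shows how shared hidden nodes yield graded generalization (the particular feature overlaps, learning rate, and trial count here are invented for illustration, not taken from the figure):

```python
# Distributed representation: each color activates an overlapping set of hidden
# feature nodes, and the association with reward is learned on those features.
color_features = {
    "yellow":        {"f2", "f3", "f4"},   # three hidden nodes per color;
    "yellow_orange": {"f3", "f4", "f5"},   # shares two of them with yellow
    "yellow_green":  {"f0", "f1", "f2"},   # shares only one with yellow
    "blue":          {"f7", "f8", "f9"},   # shares none
}
feature_weights = {f: 0.0 for feats in color_features.values() for f in feats}

def respond(color):
    """Response strength: summed weight of the hidden features the color activates."""
    return sum(feature_weights[f] for f in color_features[color])

def train(color, outcome, lr=0.2, trials=50):
    for _ in range(trials):
        error = outcome - respond(color)           # prediction error on this trial
        for f in color_features[color]:            # credit is spread over shared features
            feature_weights[f] += lr * error

train("yellow", 1.0)
print(round(respond("yellow"), 2))         # -> 1.0: the trained color
print(round(respond("yellow_orange"), 2))  # -> 0.67: strong generalization via f3, f4
print(round(respond("yellow_green"), 2))   # -> 0.33: weaker generalization via f2 alone
print(round(respond("blue"), 2))           # -> 0.0: no shared features, no generalization
```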
The connectionist approach to modeling learning is unique not only for its varied representational capabilities, such as distributed representation, or for its departure from rule-based models. While its preoccupation with the “brain metaphor” is eccentric, what makes this approach one of the most popular trends in current science is its capacity for exploring the complex interactions between representation and learning.
“what is ‘seen’ is [a] presentation, not a representation” (Skinner, 1985, p. 292)
References:
Rumelhart, D. E., & Todd, P. M. (1993). Learning and connectionist representations. Attention and performance XIV: Synergies in experimental psychology, artificial intelligence, and cognitive neuroscience, 3–30.
Rumelhart, D. E. (1998). The architecture of mind: A connectionist approach. Mind readings, 207–238.
Hanson, S. J., & Burr, D. J. (1990). What connectionist models learn: Learning and representation in connectionist networks. Behavioral and Brain Sciences, 13(3), 471–489.
Connectionism. (n.d.). Retrieved from https://www.massey.ac.nz/~wwpapajl/evolution/lect16/lect1600.htm
Gluck, M. A., Mercado, E., & Myers, C. E. (2007). Learning and memory: From brain to behavior. Macmillan Higher Education.