Connectionism and Second Language Vocabulary.
ROBERT WARING
Abstract.
In Part 1 I will explore the nature of connectionism and point out some
of the ways it seems to account for aspects of second language vocabulary
knowledge at a micro-cognitive level which is not subject to introspection.
From there, I will look at some of the limitations of this view and will
show the connection with higher cognitive functions. In Part 2 a model of
lexical storage will be presented to show this relationship. There will
be some discussion of the limitations of the model and some suggestions
for directions for future research.
Part 1: What is connectionism?
Connectionism as a term was first mentioned in Thorndike's study (1898)
of the way cats learn in incremental stages. Connectionism as a paradigm
of learning has its roots in associationism. Associationism dates from classical
times but was substantially refined by the seventeenth century philosophers
Hobbes and Locke. The fundamental belief of associationism is that learning
could be regarded as the formation of associations between previously unrelated
information based on their contiguity. Connectionism is also based on this
principle but is somewhat different in that it encompasses much more as
outlined below. Connectionism borrows heavily from associationism and is
a term that covers neural networks and Parallel Distributed Processing (PDP).
Neural networks seek to explain cognition in biological or neurological
terms and PDP tries to show that the information is not stored in the brain
in one place but is distributed throughout the various parts of the brain
which serve certain linguistic and non-linguistic functions. Generally PDP
and connectionism are seen as being synonymous. Associationism, by contrast,
does not contain many of the more advanced and sophisticated notions of
connectionism (see Bechtel and Abrahamsen (1991) or Cohen et al. (1993)
for reviews in this area).
There is no unified agreement on what exactly connectionism is; however,
most connectionist models seem to share several properties. Connectionist
architectures of cognition are loosely based on the architecture of the
brain. Connectionists do not use neurological terms such as synapses and
neurons directly, but instead use the terms nodes and networks which are
said to represent a crude but effective approximation of the neural state
of the brain at a superficial level. These nodes are massively interconnected
with other nodes to form a network of interconnections, hence the term connectionism.
Each of these nodes can be connected to many different networks. The knowledge
is stored in these interconnections and is associated with other kinds of
knowledge contained in the network and to other networks, hence the relationship
to associationism. Connectionists believe that these interconnections store
the lexical information. This does not mean, however, that the information
is stored in one place (one cannot look inside the brain and find a particular
word, for example); rather, it is held in the interconnections between the nodes
in the form of a network. One could visualize that the representation of a word
might involve interconnections between various parts of the network, for
example to the phonological, semantic or orthographic parts of the network.
From this we can see that the knowledge is distributed among many interconnections.
This distribution of information provides us with several advantages which
will be discussed later.
Some connectionists believe that information in the brain is organized in
the form of massively interconnected sub-networks rather than as a simple
unified system. These sub-networks store information that can
be accessed by other sub-networks. For example, a sub-network of morphological
knowledge can connect with a sub-network of word roots, which in turn can
connect to a semantic sub-network which stores meanings of words. While
the exact make-up of these interconnections is not known, our knowledge of
the mental lexicon gives us some insight into what it might look like.
Future research may be able to clarify this knowledge.
From the interaction of these inter-related networks we can form the meaning
of a word and find the correct word to choose. If we have to find a past
tense form for example, the morphological network can be tapped to retrieve
it. Each sub-network making up a 'word', as such, would be connected to hundreds
or thousands of other nodes making up a mini-network for that word. These
sub-networks will be connected to areas of the brain that control phonological,
speech and auditory functions as well as the storing of lexical-specific information.
The sum of all these interconnections for that word makes up the knowledge
about that word which the learner has. Therefore, a well known word will
have a very intricate network of interconnections and less well known words
will have fewer interconnections. A different word would have a different
set of nodes connected to hold that information - another mini-network.
It may, of course, share many of the same nodes as other words, or may not
depending on the make up of that word. In essence then our lexicon (or lexicons)
is made up of hundreds or thousands of these sub-networks all massively
interconnected to form the lexicon.
Within a network the nodes are organized into 'levels' such that any one
node excites or inhibits other nodes at its own or different levels. Patterns,
habits and rules are not stored in these interconnections, but what is stored
are the interconnection strengths that allow these patterns and rules to
be recreated. Knowledge is seen at the micro structure level rather than
macro-structure level of cognition. Therefore, the strength of the interconnections
reflects the relative knowledge one has about an item of vocabulary. Prototypical
representations of the lexical environment emerge as a natural outcome of
the learning process. See Bechtel and Abrahamsen (1991), Broeder and Plunkett
(1994) and Ney and Pearson (1990) for more detail in these areas. Learning,
therefore, is a by-product of processing.
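To make the idea of weighted interconnections concrete, the following is a
minimal sketch in Python rather than a description of any published model.
The node labels, the weights and the function name spread are all assumptions
invented for illustration. It shows one step in which an active node excites
(positive weight) or inhibits (negative weight) the nodes it is connected to.

```python
# Toy network: weights[(source, target)] is an interconnection strength.
# Positive weights excite the target node, negative weights inhibit it.
# All node names and numbers are invented for illustration.
weights = {
    ("see", "meaning: perceive with the eyes"): 0.9,  # strong link
    ("see", "past tense: saw"): 0.4,                  # weaker link
    ("see", "past tense: seed"): -0.6,                # inhibitory link
}


def spread(activation, weights):
    """Propagate activation one step along the weighted interconnections."""
    result = {}
    for (source, target), weight in weights.items():
        incoming = activation.get(source, 0.0) * weight
        result[target] = result.get(target, 0.0) + incoming
    return result


after_one_step = spread({"see": 1.0}, weights)
# Order the target nodes from most to least activated.
ranked = sorted(after_one_step, key=after_one_step.get, reverse=True)
print(ranked)
```

The ranking depends only on the stored strengths; no rule about see is
stored anywhere in the sketch.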
Developing the picture.
In order to bring the abstract to the concrete, the following diagram seeks
to illustrate a learner's knowledge of the verb to see.

Diagram 1: A hypothetical partial representation of a learner's concept
of 'see'.
The reader should immediately notice several things about this network and
the limitations of representing the network in diagrammatic form. The first
and most obvious, is that the representation is incomplete and is only a
partial representation of a learner's knowledge of see. That said, the concept
of see in diagram 1 is distributed among many interconnections, some of
which are thin and some of which are thick. The stronger the interconnection
(the thicker the line), the more 'well-known' the information is; a thinner line
represents less 'well-known' information. The learner is relatively sure that see means
something like 'an image comes to my eyes' and that it collocates with some
objects. She is less sure about her knowledge (partial knowledge) that the
pronunciation of the past tense form saw is /sɔː/, which is represented by the
thinner line.
Secondly, diagram 1 shows, for diagrammatic purposes only, nodes that have
been labeled 'meaning' 'past tense ending', 'preposition use' and 'objects
that collocate' with their sub-categories. The learner could assign labels
to these quite differently and in fact not even have them categorized as
shown, but in some completely different way - reflecting her own view of
the word see. Alternatively, these nodes might not exist at all, or there
could be no interconnections between them, reflecting no knowledge linking
these nodes and thus no knowledge of see. The diagram does not show all
the other possible nodes about see - for example there are no nodes for
its 'idiomatic use', the knowledge that see is pronounced the same as sea
and so on.
Thirdly, each of the nodes and sub-categories, such as 'meaning' are shown
as being connected to other parts of the network by the lines leaving the
diagram. Therefore, the network is immensely complex in structure. It would
take only a little imagination to conceive of a diagram which could represent
knowledge of 'affix knowledge', 'words I have problems with', 'semantic
networks', 'words to use when apologizing in German' and indeed many facets
of vocabulary acquisition all linked together. Such a highly interconnected
network would, of course, be beyond diagrammatic representation.
What can the model demonstrate?
Associative learning.
Clearly, the associative nature of vocabulary is shown here. Each network
of knowledge is connected to many other networks. This model can also demonstrate
how we could instantiate knowledge from the network in a schematic way.
One piece of lexical information connects to another and can instantiate
a related idea or word (see Rumelhart and Ortony, 1977 for a discussion
of schema). Schema theory has shown us the importance of background knowledge
and the relationship it has to comprehension (see Brewer and Treyens, 1981
for an example). Sometimes learners cannot comprehend a lexical item due
to insufficient conceptual development or lack of background knowledge.
This model can show the interconnections (or lack thereof) to non-linguistic
knowledge that can hamper comprehension. By the same token, if a learner
comes across a new word he may be able to guess its meaning from context and
prior lexical knowledge. Clearly the richer the network of associations, the more chance
there will be of comprehension. The learning of an L2 lexicon would involve
deepening and enriching these networks and their interconnections.
Partial knowledge
The model can account for full, partial and incorrect storage of lexical
knowledge. Knowledge that things are not something can also be accounted
for in this model. For example, this learner may explicitly know that the
past tense ending of see is not seed (is not /I:d/). This would be represented
by drawing a line to that part of the diagram - either thickly or thinly
depending on the strength of that knowledge.
Incremental learning
Interlanguage phenomena point to a learner system whereby learning is incremental,
and done in successive and / or recursive steps. This model reflects this
well as it can account for information that is not part of the L1 nor the
L2, but nevertheless is systematic and which the learner is constantly updating
(or has fossilized) (see Klein, 1986).
Content addressability.
Word knowledge in this network is content addressable. This means that
if a learner is asked for a word that means 'a round, hollow leather thing
that you can play soccer with', or asked for the meaning of 'soccer ball'
he can answer from both directions. Therefore, the information stored
in a connectionist architecture can be accessed in many ways.
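Access from both directions can be sketched as follows. This is a toy under
stated assumptions: the feature sets and the function names word_from_features
and features_from_word are hypothetical, and a Python dictionary is only a
symbolic stand-in for what would be a distributed pattern of interconnections
in a real network.

```python
# Hypothetical feature sets for two words; the features are invented.
lexicon = {
    "soccer ball": {"round", "hollow", "leather", "used in soccer"},
    "racket": {"strung", "handle", "used in tennis"},
}


def word_from_features(cues):
    """Return the word whose stored features overlap most with the cues."""
    return max(lexicon, key=lambda word: len(lexicon[word] & cues))


def features_from_word(word):
    """Return the features stored for a word (the opposite direction)."""
    return lexicon[word]


# Direction 1: from a partial description to the word.
found = word_from_features({"round", "leather", "used in soccer"})
print(found)

# Direction 2: from the word to what is known about it.
print(sorted(features_from_word("soccer ball")))
```

Note that the description given as a cue is incomplete ('hollow' is missing),
yet the best-matching word is still retrieved.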
Individual variation.
Each learner will have a different network of associations and interconnections.
Clearly the L1 can intrude through the transfer of L1 lexical knowledge. This
model can account for learner variation even with learners from the same
L1 and with the same input having differing lexicons. Some SLVA researchers
have proposed different lexicons serving different purposes such as productive
or receptive L2 lexicons. In this model there is no reason to assume that
sub-networks for separate lexicons could not exist side by side or be interconnected.
At the other extreme, those who say there is only one lexicon for all lexical
knowledge, both from the L1 and the L2, could also be accommodated here.
Advantages of a distributed network.
The knowledge is stored in the interconnections, therefore each node can
connect to many networks. A major advantage of this is economy of the network
in the sense that a single node could be connected to many others thus allowing
one node to form many representations. This in turn allows the parallel processing
of information where the brain can process many things at once. Clearly
we receive many different forms of input at any given time, all of which
must be processed simultaneously.
Another advantage of a distributed system is that if one part of the system
deteriorates (for example a given word is known but temporarily cannot be
recalled) the whole system does not break down as the forgotten word will
be connected to other words which could replace it. This is often called
graceful degradation. For example, if a learner had learned collapse but
when called upon to produce it cannot access it, a substitute could be found
from within the network such as fall down. The network thus has built in
redundancy in that the capacity to continue correct operation despite the
loss of part of the information comes from the fact that the original network
had encoded more information than was necessary to maintain the network.
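The collapse / fall down example can be sketched in a few lines. The sketch
assumes, for illustration only, that a concept is linked to several word forms
with different strengths; the concept label, the strengths and the function
name retrieve are hypothetical.

```python
# Hypothetical association strengths between one concept and word forms.
associations = {
    "to fall suddenly": {"collapse": 0.8, "fall down": 0.6, "tumble": 0.3},
}


def retrieve(concept, unavailable=()):
    """Return the most strongly associated word that is currently accessible."""
    candidates = {
        word: strength
        for word, strength in associations[concept].items()
        if word not in unavailable
    }
    # If nothing is accessible the learner is simply at a loss for words.
    return max(candidates, key=candidates.get) if candidates else None


normal = retrieve("to fall suddenly")
# Simulate a temporary failure to recall 'collapse'.
degraded = retrieve("to fall suddenly", unavailable={"collapse"})
print(normal, "->", degraded)
```

Losing access to one word lowers the quality of the answer but does not stop
the system from producing one, which is the sense of graceful degradation
used here.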
Human-like behaviour.
One of the main achievements of a connectionist system is that it can process
information and learn in ways which mirror some aspects of human learning
and information processing such as pattern matching, spontaneous generalization,
stimulus categorization and concept learning. This makes these models very
attractive to psychologists in particular. In addition, some of the models
developed have been able to model at least some specific aspects of human
performance (see Cohen et al, 1993, and Haberlandt, 1994 for numerous examples).
Generation from experience.
Connectionist systems have the ability to automatically or spontaneously
generalize from experience. This could be done both productively and receptively.
For example, if a learner knows the affix '-ist' can refer to a person doing
a particular kind of job or work, then when the learner meets an unknown
word ending in '-ist' he can guess that it would be a person doing a certain
type of work. Similarly, if a learner wanted to generate a word, '-ist' could
be added to a person's job to create that word. For example, he might generate
'pianist' (if it was not known) from knowledge about piano, work, and -ist.
Alternatively, he could create a novel word such as 'computerist'. Furthermore,
overgeneralization of lexical applications can be explained in these terms.
Lack of lexical knowledge can be represented.
Beginning vocabulary learners often will not have an L2 network set up for
some words/concepts let alone one that can find substitutes when needed.
This lack of a developed network could help to account for why it is that
learners are at a loss for words at times - simply the network has not been
set up or it contains insufficient knowledge. It would therefore follow
that lexical items which are not repeated or met frequently could have tenuous
interconnections. Therefore there needs to be constant practice and reinforcement.
It should be noted that this practice is not behaviourist in the sense that
each item will need repeating many times and that this is the only way to learn.
What it does mean is that the interconnections may need reinforcing to become
stronger.
Learning under this model.
As we learn, we constantly match new input to old information and adjust
our knowledge store network according to the new information. Our processing
of the input affects our future potential output in that the present knowledge
store has been altered by new input and a new status quo is made until new
input comes along to confirm the present state or lead us to review it again.
Connectionism rests on the assumption that we learn by trial and error in
successive steps, incrementally and through exposure to input. Successive
steps in the learning process alter the associative interconnections by
the strengthening or weakening of the interconnections. The more well known
a piece of word knowledge is, the stronger the interconnection that makes
up that part of the word's knowledge. This matches the view that a new word
will be not learned completely on first meeting, but the knowledge of that
word (such as the pronunciation, spelling, 'grammatical' features of the
word, its collocates, register and so on) will incrementally grow with the
number of times the word is met in various contexts. It will be a rare occasion
that a new word is learned at one trial with all its features readily available
for use. Connectionist networks do not in principle prevent this from happening,
however, even though present connectionist simulations cannot do so.
The network or sub-networks making up the lexicon is ever changing and one
could view it as never resting. Imagine for a moment we could take a snapshot
of the network at rest. If a network representing say the word/concept
'do' were caught at rest, we could see that some interconnections were strong
reflecting perceived well-known information (even if it is wrong) and others
were weak reflecting less well known information. As new information is
added, new interconnections are made to different nodes to account for this.
The strength of these interconnections is altered by the input strengthening
some interconnections, for example confirming that we in fact say 'do the
washing' and weakening interconnections of other parts of the mini-network
making up 'do'. For example, if a learner said 'do a crime' and was corrected,
the learner could then connect 'crime' with 'commit' rather than with
'do' making a new interconnection. This would not mean that the collocation
would be 'learned' but that a link had been made and probably the learner
will continue to use 'do' in preference to 'commit' until the network has
been so altered through repeated exposure, practice and use to reflect the
preference for 'commit' over 'do'.
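The gradual shift from 'do' to 'commit' can be sketched as a simple
strengthening and weakening of two links. This is an illustrative toy, not
a connectionist simulation from the literature: the starting strengths, the
learning rate RATE and the function names expose and preferred are all
assumptions.

```python
# Hypothetical strengths of the links from 'crime' to two candidate verbs.
strengths = {"do": 0.8, "commit": 0.0}
RATE = 0.2  # assumed learning rate; not a value taken from any study


def expose(strengths, heard_verb, rate=RATE):
    """Move the heard verb's link towards 1 and every other link towards 0."""
    for verb in strengths:
        target = 1.0 if verb == heard_verb else 0.0
        strengths[verb] += rate * (target - strengths[verb])


def preferred(strengths):
    """Return the verb with the currently strongest link."""
    return max(strengths, key=strengths.get)


history = []
for _ in range(5):
    expose(strengths, "commit")  # one correction or exposure
    history.append(preferred(strengths))
print(history)
```

After the first correction a link to 'commit' exists, but 'do' is still the
preferred verb; the preference only changes after repeated exposure.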
Evidence from second language data.
Very little work on connectionism has been done in second languages. Notable
exceptions are Schmidt's review (1990); Broeder and Plunkett's study of
developmental order for pronouns in L2s (1994); Blackwell and Broeder's
work on frequency (1992); Gasser's work on word order (1988, 1990); Shirai's
work on L1 transfer (1992); and Marchman (1992) and Sokolik and Smith's work
on critical periods (1992). This lack of work does not mean a lack of interest,
however, and is understandable in that the field is only 10 years old. Most
of the research has been done in the first language and it has only been
very recently that work has started on a second language. An extensive search
found no specific studies of second language vocabulary acquisition from
a connectionist perspective. This may be due to the very complex and multi-faceted
nature of SLVA and the fact that researchers may be more interested in the
bigger picture of SLA rather than SLVA in particular.
What can the model not demonstrate?
The working mind, intention and higher cognitive functions.
One thing missing in many discussions of connectionism is the conscious
working mind - the things that we call consciousness, memory, intention
and so on. These are often referred to as higher cognitive functions. These
quite obviously exist in some form or another as we can all say we have
them. Purist connectionism views these as the by-products of the processing
of information, whereas more traditional views of cognition (the current
dominant experimental paradigm) see the mind as being somehow broken into
parts. This modular view says we have different forms of memory and storage
and that these can be tested in certain ways to find out how our lexicons
work. It is clear that there are levels of human processing for which PDP
models may not be an appropriate level of analysis, at least given the current
generation of PDP models. If the higher cognitive functions exist in PDP
terms, we would need to be able to explain why there are parts of a PDP
system which are transparent and why other parts are not.
Universal application.
It is not generally accepted even by PDP researchers that a connectionist
(=PDP) model can account for all areas of human cognition, although many
try to resist external explanations. The challenge for these researchers
then is to develop a system to 'account for the phenomena which are handled
rather well by rules but also, without additional mechanisms, give an elegant
account of other phenomena as well' (Bechtel and Abrahamsen, 1991 p. 217).
Connectionist models are good at the lower levels of cognition such as content
addressability, low level perception and spontaneous generalization. However,
there has been little success in discovering such examples at higher levels
of cognition. It may be that we should not be trying to explain all things
at all levels, but we could fall back on the idea of levels (to be outlined
below).
Capturing syntactic structure.
Fodor and Pylyshyn (1988) argue that a connectionist system cannot capture
the representation of syntax well. Their example says that a PDP system
can connect Joan, loves and florist in 'Joan loves the florist' to give
it meaning, but it cannot discriminate it from the relationship in semantic
terms with 'The florist loves Joan'. A network could add the representation
but it could not disambiguate the two sentences. Therefore, they say that
a PDP is inadequate to the task of representing syntactic knowledge. This
is despite Chomsky (1986) stating recently that the generatability of syntax
is no longer the goal of generative linguistics.
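The objection can be illustrated with a deliberately naive sketch. It assumes,
purely for illustration, that a sentence is represented only by the set of
word nodes that are active, with no encoding of word order or of who does what
to whom; the function name active_nodes is hypothetical. It is a sketch of
the objection, not a claim about what every connectionist model does.

```python
def active_nodes(sentence):
    """Represent a sentence only by which word nodes are active (no order)."""
    return frozenset(sentence.lower().split())


first = active_nodes("Joan loves the florist")
second = active_nodes("The florist loves Joan")

# The same nodes are active for both sentences, so this representation
# cannot tell them apart.
same = first == second
print(same)
```

Under this assumption the two sentences activate exactly the same nodes, so
any difference in meaning would have to be carried by something the
representation does not contain.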
Developmental sequences
Fodor and Pylyshyn state that the model is not good at learning in developmental
stages which a rule-based approach can capture. This ignores the fact that
some aspects of language tend to be rule-governed and some aspects do not.
All languages have exceptions such as go / went and suru (do) and kuru (come)
in Japanese. PDP systems do in fact go through the same developmental stages
as do first language learners. This is not clear for second language learners,
however. In the first phase the systems tend to catch the irregularities by
rote, and in the second phase they concentrate on rule-governed regularities.
In the final stage the model strikes a balance between the two poles of
regularity and irregularity and even overgeneralizes at times as children
would do (e.g. feets instead of feet).
Non-human behaviour
Due to the very nature of these systems not being transparent, they cannot
be tested empirically at the micro level of cognition and we are left with
computer simulations of learning. These computer models cannot
model human behaviour exactly and indeed sometimes generate very non-human
responses. In addition, the computer simulations take a long time to learn
whereas humans can learn at one trial. New simulations need to be developed
to account for these inadequacies.
Differences from symbolic processing.
It is important to distinguish connectionism from a symbolic account of
learning and knowledge storage. In symbolic systems word knowledge is couched
in terms of parts of speech such as nouns, verbs, or semantic groups such
as 'words for travel' and so on each having a label for the kind of knowledge
stored - a symbol for that knowledge. Typically, symbolic systems have rules
by which this information can be processed and rules which state what is
impossible in a language.
Symbolic systems are context insensitive in that they are distinct from
their environment. Elman (1991, p. 221) says 'this insensitivity allows
for the expression of generalizations which are fully regular at the highest
level of representation (e.g. purely syntactic), but they require additional
apparatus to account for regularities which reflect the interaction of meaning
with form and which are more contextually defined. Connectionist models
on the other hand begin the task at the other end of the continuum. They
emphasize the importance of context in the interaction of form with meaning'.
Symbolic systems, therefore, are subject to the fallacy that things can only
be referred to in symbolic terms and therefore do not connect themselves
to the real world. That is, an alien listening to us via radio signal might
learn the sounds of the language but not the semantics unless they could
observe a word's relationship with objects and the events to which it refers.
A network system, by comparison, can deal with anomalies by adding further
assumptions. See Johnson-Laird et al. (1984) for a review in this area.
Symbolic systems such as the generative linguistic paradigm would account
for linguistic knowledge in terms of nouns, subjects, objects and so on.
These terms do not exist in purist (PDP) systems. PDP systems will accept
that rules can be stored in a connectionist network, but they are not the
foundation stone on which the network is made. This means that under a PDP
paradigm, the symbolic system loses its causal role in cognition. This is
an unacceptable outcome to many linguists, as a typical UG proponent
would see these rules as essential to human linguistic processing. However,
it may be that aspects of human performance which appear so regular as to
be conveniently summarized by rules (like the rules of grammar in a language),
may arise out of the general properties of parallel distribution which operate
without any reference to such rules. Recent debates by Fodor and Pylyshyn
(1988), the Jacobs and Schumann (1992) versus Eubank and Gregg (1995) debate,
and reviews by Bechtel and Abrahamsen (1991) and Morris (1989) underscore
these differences. These debates take place on the basis of accepting one
view means the other is unacceptable. Both sides tend to see things in extreme
terms - a universal take-it-all-or-leave-it-all view (Pinker and Prince
1988). Neither side has produced evidence for this universality and clearly
both have their limitations (see Cohen et al. (1993) for a review).
However, if one views the connectionist / symbolic argument in terms of
a non-universal answer then the situation changes somewhat and one can
see things in terms of complementary rather than confrontational stances.
Clearly much has been learned about the workings of memory in relation to
vocabulary learning in a second language in cognitive terms (see Nation,
1990 for a review) but they offer little in the way of insights into the
micro-view of cognition which connectionism seems to explain quite well.
It seems therefore that the issue of whether the current symbolic paradigm
or connectionism is the one and only explanation misses the point.
Summary.
Connectionist systems of vocabulary acquisition have many characteristics
that are desirable in simulations of human cognition, for example graceful
degradation, automatic generalization and so on. Many of these are found
in other models of cognition, but it is unusual to find so many in one model.
These models show the learning process over time. This is important as most
studies in SLVA have been cross-sectional in nature.
There are parts of our cognitive apparatus which are open to inspection
and are transparent in nature and empirically testable, such as memory span,
lexical competence, attention and so on. There are other parts of our cognitive
system which are not open to inspection, such as how we retrieve lexical
information from our brain or how we process the auditory information and
add it to our store. A connectionist account of lexical knowledge is good
at describing the storehouse of vocabulary. That is, how the words are connected
through their associations; how we may store and retrieve lexical knowledge;
how lexical knowledge is schematic or associative, and how it can substitute
for lack of knowledge, how we can guess the meaning of words and so on.
It seems that the connectionist architecture could operate at a lower 'impenetrable'
level of cognitive activity whereby we are not able to access it by introspection;
in a sense it is unavailable to us and the interconnections are made automatically
without our intervention. The transparent part of our cognitive system may
operate at a higher level and would include what we know about memory and
so on. This would lead to a two level interdependent model of vocabulary
acquisition. It would make sense to have a two level hybrid system because
the symbolic machine operates according to its own autonomous set of principles.
This view is the one currently coming into fashion (see Kempen, 1992; Marcus
et al. 1992, 1993; and Pinker, 1991).
Part 2: An account of lexical storage.
In the model of lexical storage to be outlined below, we can see a micro
structure of vocabulary knowledge stored in the connectionist style network
but linked to and controlled in a sense by the working memory and transparent
cognitive systems. This working memory retrieves its information from the
network storage area and uses that as a basis for making linguistic decisions.
There is therefore a two way highway going between these two.
The principal components of the model.
The principal components of the model outlined below are a sensory register,
working memory (after Baddeley, 1990) and a storage area called the network
/ store. One is always faced with problems when attempting to represent
non-two-dimensional ideas diagrammatically. However, for the convenience of
the reader I have provided a very rough outline in diagram 2 below.

Diagram 2: A representation of the interface between input
data, working memory and memory networks.
In this model, input is received at the sensory level (the level at which
information is registered on the retina or ear drums). The central executive
will attend to some of this input and put it into working memory. The information
considered most salient and unexpected is usually that which is attended
to. This input then becomes conscious in that you are consciously (at varying
levels) aware of it - the rest of the input is not attended to and is effectively
discarded. This input is then compared with pre-existing lexical information
in the network which, in keeping with schema theory, is in a constant state
of expectation that the incoming data will match pre-existing vocabulary
knowledge and thus confirm that knowledge and lead to comprehension.
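The selection step described above can be sketched very roughly as a filter
on salience. Everything in the sketch is an assumption made for illustration:
the input items, their salience scores, the capacity limit CAPACITY and the
function name attend are hypothetical and are not part of Baddeley's model.

```python
# Hypothetical input items with invented salience scores between 0 and 1.
sensory_input = [
    ("the", 0.1),
    ("cat", 0.3),
    ("defenestrated", 0.9),  # unexpected word, so highly salient
    ("sat", 0.2),
]
CAPACITY = 2  # assumed limit on what working memory can hold in this sketch


def attend(items, capacity=CAPACITY):
    """Keep the most salient items; the rest are effectively discarded."""
    ranked = sorted(items, key=lambda item: item[1], reverse=True)
    return [word for word, _ in ranked[:capacity]]


working_memory = attend(sensory_input)
print(working_memory)
```

Only the attended items would then be compared with the pre-existing lexical
information in the network; the unattended items play no further part.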
As the reader can imagine, this is a simplistic account of the process.
It will be expanded below. One can see from diagram 2 that the central executive
functions as a mediator between the network and the input. It is primarily
concerned with (among other things) what to attend to, what to ignore, how
to put this into the network and so on. There is freedom of movement between
the network and working memory and the higher cognitive functions allowing
the flow of information back from the network into working memory for re-evaluation
and reflection.
Features of Working Memory.
Working memory has three major components and is modeled on Baddeley (1990).
The central executive regulates, monitors and coordinates the operation
of the other components. The phonological loop is divided into the
articulatory control system, which can work at the sub-vocal level, and
a phonological store, which holds speech-based information. The final part
is the visuo-spatial sketch pad which receives inputs either directly from
visual perception or by retrieving information from long-term memory in
the form of images. It is at this level that we are said to have intention
and a 'consciousness'.
The central executive regulates what will be attended to (or not). Baddeley
in recent work has said that the central executive closely resembles attentional
control and thus possesses limited capacity and is of little use in the
active processing of lexical information. It seems that attention demanding
tasks such as lexical problem solving, reading, word learning, writing all
utilize this central executive. In addition the central executive monitors
the performance of sequences of actions to be performed in the right order.
Baddeley sees the central executive as essentially similar to the Norman
and Shallice (1980) model called the Supervisory Attentional System (SAS)
which controls ongoing behaviour, maintains goals and resists distractions.
The advantage of this system is that a working memory model treats the short-term
storage of memory and general processing in a single framework.
Adjusting the Network (Learning).
When new lexical input is received, the input data are compared with pre-existing
lexical knowledge to see if they match. The input can be processed at the
level of content (facts about the story being read - the characters and
the plot for example) or at the linguistic level (the finding of new words,
expressions, the making of interconnections between previously unconnected
words etc.). The central executive could also find that the network had
previously tagged that item for further investigation leading to the potential
for a change in, say, reading behaviour (for example stopping and re-reading
in order to find out more about the tagged word).
The default setting at the time of input is that the understanding of this
input will be easily predictable by the pre-existing lexical network and
thus would require a minimal level of focal attentive processing. An example
would be that if a learner was studying superlatives he would instantiate
that network and expect to read (probably unconsciously) such words as the
biggest, most, and so on. If this is indeed the input he receives, then
the learner matches this input (the words he's reading) with the current
network related to that word / phrase or type of text and finds that it
fits the network as he had expected. If it does confirm the pre-existing
knowledge by fitting in the network smoothly, then the meaning of the message
may be retained (the content of the message may later be accessible for
report, review, reflection and so on). The lexical network is thus being
adjusted by the strengthening of present interconnections confirming already
known information and the addition of other networks.
In some circumstances the input is perceived or noticed to be different
from the previously understood or learnt information. In the above example,
this might be noticing that he had believed the superlative of good is bestest,
when in fact it is best. A gap between pre-existing knowledge and new
input is noticed. At this point the learner can readjust the network to
accommodate this new information. Sometimes this will be easy if the network
is highly developed, but considerable adjustment (along with possible confusion
and 'thinking') may be necessary if the information is vastly different
from that stored. This adjustment to the network is called 'learning'.
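The matching process described above can be sketched in a few lines of Python. This is only an illustrative toy and not part of the model itself: the dictionary `network`, the function `process_input`, the numeric strengths and the step size of 0.1 are all assumptions made for the example. Input that matches stored knowledge strengthens the existing entry, while input that conflicts with a stored form is flagged as a noticed gap and the entry is revised.

```python
# Toy sketch: stored beliefs about superlative forms, each with a strength.
# All names and numbers here are illustrative assumptions.
network = {
    "big": {"form": "biggest", "strength": 0.6},
    "good": {"form": "bestest", "strength": 0.3},  # the learner's mistaken belief
}

STEP = 0.1  # assumed size of each strengthening step


def process_input(network, base, observed_form):
    """Compare observed input with stored knowledge and adjust the network.

    Returns "confirmed" when the input matches, "gap noticed" when it
    conflicts with a stored form, and "new" when nothing was stored.
    """
    entry = network.get(base)
    if entry is None:
        network[base] = {"form": observed_form, "strength": STEP}
        return "new"
    if entry["form"] == observed_form:
        # Input fits the network: strengthen the existing connection.
        entry["strength"] = min(1.0, entry["strength"] + STEP)
        return "confirmed"
    # Input differs from what was stored: readjust the network.
    entry["form"] = observed_form
    entry["strength"] = STEP  # the revised entry starts out weak
    return "gap noticed"


print(process_input(network, "big", "biggest"))  # confirmed
print(process_input(network, "good", "best"))    # gap noticed
print(network["good"]["form"])                   # best
```

In this sketch the revised entry starts out weak, on the assumption that newly corrected knowledge is less secure than knowledge that has been confirmed many times.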
It should be clear that learning takes many forms. Any adjustment to the
network is a form of learning: for example, learning a new collocation, the
spelling of a word or a different use, or learning that the learner has incomplete
information. Examples of some adjustments to the network could
include:
The activation of a new interconnection between existing nodes would
reflect the linking of previously unconnected information. Initially this
new link would probably be weak - showing unsure information, but could
be a strong link depending on the interconnection relationships and the
depth of processing.
The activation of a completely new node for completely new information
being added to the network.
Forming, checking, rejecting and reformulating lexical hypotheses.
Accounting for greater control, depth of analysis.
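The first two adjustments listed above can be pictured as operations on a weighted graph. The following Python sketch is a simplification offered under stated assumptions: the class `LexicalNetwork`, its method names, the initial weight of 0.1 for a new link and the example words are hypothetical and do not come from the model.

```python
class LexicalNetwork:
    """Toy weighted graph: nodes are lexical items, links carry a strength."""

    def __init__(self):
        self.nodes = set()
        self.links = {}  # maps frozenset({a, b}) -> strength between 0 and 1

    def add_node(self, item):
        """Add a completely new node for completely new information."""
        self.nodes.add(item)

    def link(self, a, b, initial=0.1):
        """Connect two existing nodes; a new link starts weak (assumed 0.1)."""
        for item in (a, b):
            if item not in self.nodes:
                raise KeyError(f"unknown node: {item}")
        key = frozenset((a, b))
        self.links.setdefault(key, initial)
        return self.links[key]

    def strengthen(self, a, b, amount=0.2):
        """Strengthen an existing link, capped at 1.0."""
        key = frozenset((a, b))
        self.links[key] = min(1.0, self.links[key] + amount)
        return self.links[key]

    def strength(self, a, b):
        """Return the link strength, or 0.0 if the nodes are unconnected."""
        return self.links.get(frozenset((a, b)), 0.0)


net = LexicalNetwork()
for word in ("heavy", "rain"):
    net.add_node(word)

net.link("heavy", "rain")        # a new collocation: weak at first
net.strengthen("heavy", "rain")  # later exposure strengthens the link
net.add_node("drizzle")          # a completely new item, not yet linked
```

Hypothesis formation and depth of analysis are not represented here; they would presumably involve the central executive rather than the network alone.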
One of the functions of the central executive is to reconcile information
between the lexical network and the input. The network contains many kinds
of linguistic knowledge - from the L1, and from the L2 in terms of grammar,
word associations, chunks, phonological and orthographic knowledge and so
on. If new input does not match the present network, then some adjustment
has to be made to account for the information. The central executive has
an ability to look at the network and infer from linguistic or lexical patterns
(guessing from context is an example) and to generalize (and thus overgeneralize).
To do this there has to be communication between the network and working
memory.
Limitations of the model.
The central executive.
We can find out a lot about the phonological and visuo-spatial components
but only very little about the executive itself as it is not transparent.
That is, we know a lot about the multitude of tasks it deals with such as
demands of lexical tasks, allocation of attention and so on, but very little
about how this is achieved. Moreover, this variety of functions poses problems
in terms of describing the precise function of the executive. It may be
best to compartmentalize the central executive into several specific processing
systems. We cannot of course do away with it as there needs to be some way
for us to control the chaos that would result from several operating mechanisms
all working independently.
In addition the central executive cannot easily distinguish between attentional
processing, which demands attentional control, and automatic processing
which does not. Furthermore, it is not clear how some tasks can be done
without requiring any working memory capacity (such as breathing when speaking).
It may be, therefore, that the working memory system is more flexible than
was first imagined.
The borderline between what is and what is not governed by working memory
is not clear, nor indeed is the way that the two levels work together. It
may be that there is a part of the central executive which acts as a huge
switchboard regulating what happens where. At the moment this is supposition.
This may not be a problem, as there will come a point in our experiments
where the workings of the mind become impenetrable and we are left
looking at phenomena without being able to look deeper.
Predictability.
One problem is the inability of current versions to predict what structures
or aspects of word knowledge will be learnt next. Clearly this is important
for SLVA and SLA in general. Again it might not be a theoretical problem
not to be able to predict as we are not able to see the network. We may
only need to look at the result of the network and how it acts to determine
its function.
Problems with computer simulations.
Connectionist models are limited under present technology to investigation
by computer simulation. Limits to what can be investigated are put on the program
itself. It is assumed that as these computer simulations get more sophisticated,
so will the results of the PDP studies. There is a need to accommodate some
of the criticisms of connectionism, such as the present inability to learn
in one trial. This does not appear to be a problem with the PDP theory,
but more with the implementation in software.
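To see why one-trial learning is awkward for such simulations, consider a single connection trained with an error-correcting (delta) rule. This is a common choice in connectionist simulations, though not one that the model here commits to. In this sketch the learning rate of 0.1 and the function name `train` are assumptions chosen for illustration. With a small learning rate, each exposure moves the weight only part of the way towards the target, so a single trial leaves the association weak.

```python
def train(weight, target, trials, rate=0.1):
    """Apply the delta rule to one connection with a constant input of 1.0.

    Each trial moves the weight a fraction `rate` of the remaining error.
    """
    for _ in range(trials):
        output = weight * 1.0  # activation passed along the link
        weight += rate * (target - output)
    return weight


after_one = train(0.0, target=1.0, trials=1)
after_fifty = train(0.0, target=1.0, trials=50)
print(round(after_one, 3))    # 0.1
print(round(after_fifty, 3))  # 0.995
```

Raising `rate` to 1.0 makes this single link learn in one trial, which suggests that the limitation lies in how a simulation is set up rather than in the principle of weighted connections.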
Connectionist simulations have problems in representing time relationships,
which are critical in the language domain, although see Elman (1990) and
Jordan (1986). Sampson (1987) suggests that one reason for this is that
it is the connectionist model itself that extends through time, via the
gradual setting of the network, whereas in a model based on rules one can
think of the rules as applying instantaneously. Thus it is difficult to
treat time as an input-output feature and to input data sequentially would
cut across the very parallel processing nature of the models. Alternatively
it may also be that as, by definition, connectionist computer simulations
are devoid of a 'here-and-now' component and the context for it, then the
concept of time is external to the micro-level of cognition, and can be
left to the higher levels. Again this seems to be a problem with software
rather than with theory.
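One way of handling sequence in the work cited above is the simple recurrent network of Elman (1990), in which a copy of the previous hidden state is fed back as context alongside each new input. The Python sketch below illustrates only that idea with a single hidden unit. The weights, the one-number encoding of each input and the function name `run_sequence` are arbitrary assumptions, and nothing is learned here.

```python
import math

W_IN = 0.8       # assumed weight from the current input to the hidden unit
W_CONTEXT = 0.5  # assumed weight from the previous hidden state (context)


def run_sequence(inputs):
    """Feed inputs one at a time, carrying the hidden state forward as context."""
    context = 0.0
    states = []
    for x in inputs:
        hidden = math.tanh(W_IN * x + W_CONTEXT * context)
        states.append(hidden)
        context = hidden  # copy-back step: hidden state becomes the next context
    return states


# The same final input (1.0) yields a different state depending on what preceded it.
after_a = run_sequence([1.0, 1.0])[-1]
after_b = run_sequence([-1.0, 1.0])[-1]
print(after_a != after_b)  # True
```

Because the response to an item depends on what came before it, order is represented inside the network's state rather than as a separate input-output feature.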
Fluency and accuracy.
There has been little discussion about fluent or skilled vocabulary use.
The learning of skills is an important area in SLVA. It has been noted that
connectionist systems excel at pattern recognition. It could be that skills
are highly organized patterns of behaviour and thus can fit under this paradigm.
The problem is how to represent fluency in the model. Is it just dependent
on the strength of the interconnection or is it something more complex?
There is some light, however, as mentioned in the section above about the
developmental stages of learning in a connectionist network.
Forms of lexical knowledge.
The lexicon of a second language learner is very complex. There is great
debate as to whether a learner has a single multi-store lexicon or indeed
several lexicons for different languages. This system needs to explain why
that might be and how the information is stored. Is it just simply a matter
of adding a new network or again is it something more complex?
Newness.
Connectionism is a radically different view from the more traditional paradigms
and thus as it is the 'new boy on the block' it is subject to all the stages
of finding its way in a new world. It is also highly mathematical and can
only be tested by computer simulation and thus bears little resemblance
to what actually happens in a class. Finally, some argue that it is a return
to behaviourism in that stimuli and responses affect the nature of the networks,
but as it offers a much more comprehensive explanation than a purely behaviourist
view, these arguments falter somewhat.
Innateness of the language faculty.
Some UG proponents would argue that a connectionist system needs to explain
the logical problem of language acquisition and that there must be some
innate quality to language acquisition. It would seem at first glance that
the two positions are opposed, with the symbolic camp needing innate rules and
the connectionist camp not needing them. There is no reason to assume that
we could not be born with a prewired lexical / syntactic and general cognitive
network all set up for language acquisition with all the 'parameters' being
set at an early age. If this is so it would strengthen both sides' arguments
rather than weaken one against the other.
Summary.
The model presented here can account for numerous aspects of SLVA. However,
there are still many unexplained areas, specifically concerning the explicitness of
the network and its relationship with the higher level cognitive functions.
It does hold some promise, although it will need considerable revision, preferably
as a result of extensive research into the connectionist nature of second
language vocabulary acquisition.
References.
Baddeley, A. 1990. Human Memory: Theory and Practice. Boston: Allyn & Bacon.
Bechtel, W. and A. Abrahamsen. 1991. Connectionism and the mind: An introduction
to parallel processing in networks. Blackwell, Oxford.
Blackwell, A. and P. Broeder. 1992. Interference and facilitation SLA: A
connectionist perspective. Paper presented to Seminar on Parallel Distributed
Processing and Natural Language Processing, San Diego. UCSD.
Brewer, W. F. and J. C. Treyens. 1981. Role of schemata in memory for places.
Cognitive Psychology. 13: 207-30.
Broeder, P. and K. Plunkett. 1994. Connectionism and Second Language Acquisition.
In: N. Ellis (Ed.) Implicit and explicit Learning of Languages. Academic
Press, London.
Chomsky, N. 1986. Reflections on Language. New York, Pantheon.
Cohen, G., G. Kiss and M. LeVoi. 1993. Memory. Current issues. London. The
Open University.
Crick, F. and C. Asanuma. 1986. Certain aspects of the anatomy and physiology
of the cerebral cortex. In J. L. McClelland et al.
Elman, J. L. 1991. Incremental learning or the importance of starting small. La
Jolla: University of California, San Diego, Centre for Research in Language
(Technical Report 9101).
Eubank, L. and K. Gregg. 1995. Et in Amygdala ego. UG, (S)LA and Neurobiology.
Studies in Second Language Acquisition. 17: 35-57.
Fodor, J. A. and Z. W. Pylyshyn. 1988. Connectionism and cognitive architecture:
a critical analysis. Cognition. 28: 401-12.
Gasser, M. 1988. A connectionist model of sentence generation in a first
and second language. Unpublished doctoral dissertation, University of California,
Los Angeles.
Gasser, M. 1990. Connectionist models. Studies In Second Language Acquisition.
12: 179 - 99.
Haberlandt, K. 1994. Cognitive Psychology. Needham Heights, Mass.: Allyn
& Bacon.
Jacobs, B. and J. H. Schumann. 1992. Language acquisition and the neurosciences:
Towards a more integrative perspective. Applied Linguistics. 13: 282-301.
James, W. 1890. The Principles of Psychology. New York: Holt.
Jordan, M. 1986. Serial order: A Parallel Distributed Processing Approach.
La Jolla: University of California, San Diego, Institute for Cognitive Science.
(Report 8604).
Johnson-Laird, P. N., D. J. Herrmann, and R. Chaffin. 1984. Only connections:
A critique of semantic networks. Psychological Bulletin. 96: 292-315.
Kempen, G. 1992. Second language acquisition as a hybrid learning process.
In F. Engel, Bouwhuis, D. Bösser, T. and d'Ydewalle, S. (Eds.) Cognitive
modelling and Interactive environments in Language learning. (pp. 139-44).
Berlin: Springer.
Klein, W. 1986. Second Language Acquisition. Cambridge: Cambridge University
Press.
Marchman, V. 1992. Language learning in children and neural networks: Plasticity,
capacity and the critical period. San Diego, CA: Center for Research in
Language (UCSD) (Technical report 9201).
Marcus, G. Brinkmann, U., Clahsen, H. Wiese, R., Woest, R. and Pinker, S.
1993. German Inflection: The exception that proves the rule. Cambridge,
Mass.: MIT occasional paper 47.
Marcus, G., Pinker, S. Ullman, M., Hollander, M., Rosen, T and Xu, F. 1992.
Overregularization in language acquisition. Monographs of the Society for
Research in Child Development. 57 : 4 serial 228.
McClelland, J. L. 1981. Retrieving general and specific knowledge of specifics.
Proceedings of the Third Annual Conference of the Cognitive Science society.
170-2.
McClelland, J. L. and D. Rumelhart and the PDP research group. 1986. Parallel
Distributed Processing: Explorations in the micro - structure of cognition.
Vols. I and II. Cambridge, MA: MIT Press.
McClelland, J., D. Rumelhart & G. Hinton. 1986. The appeal of parallel distributed
processing. In McClelland, J. L. et al. (Eds.)
Morris, R. (Ed.) 1989. Parallel distributed processing. Oxford: Oxford University
Press.
Nation, I. S. P. 1990. Teaching and Learning Vocabulary. New York: Heinle
& Heinle.
Ney, J. and B. Pearson. 1990. Connectionism as a model of language learning.
The Modern Language Journal. 74: 474-82.
Norman, D. A. and T. Shallice. 1980. Attention to Action: Willed and automatic
control of behaviour. University of California, San Diego, CHIP Report 99.
Pinker, S. 1991. Rules of Language. Science 253: 530-35.
Pinker, S. and A. Prince. 1988. On language and connectionism: analysis
of a parallel distributed processing model of language acquisition. Cognition.
28: 73-193.
Rumelhart, D. E and A. Ortony. 1977. The representation of knowledge in
memory. In R. C. Atkinson, R. J. Herrnstein, G. Lindzey and R. D. Luce (Eds.).
Stevens' handbook of experimental psychology: Learning and cognition. Pp.
99-135. Hillsdale, NJ: Erlbaum.
Sampson, G. 1987. Review of Rumelhart, D., McClelland, J. L. and the PDP
research group, Vols. I and II. Parallel Distributed Processing: Explorations
in the micro - structure of cognition. Cambridge, MA: MIT Press. 1986. Language
63: 871 - 86.
Schmidt, R. W. 1988. The potential of parallel distributed processing for
second language acquisition theory and research. University of Hawaii working
papers in ESL, 7 (1); 55 - 66.
Shirai, Y. 1992. Conditions on transfer: a connectionist approach. Issues
in Applied Linguistics. 3: 91-120.
Smolensky, P. 1986. Neural and conceptual interpretation of PDP models.
In J. L. McClelland et al.
Sokolik, M. and M. Smith. 1992. Assignment of gender to French nouns in
primary and secondary language: A connectionist model. Second Language Research.
8: 39-58.
Thorndike, E. 1898. Animal intelligence. New York, Macmillan.
Contact Info:
Rob Waring
Notre Dame Seishin University, 2-16-9 Ifuku-cho, Okayama, Japan 700
Tel 086 252 1155 Fax 255 7663 Home 086 223 0341