The development process for a standards-based
language assessment instrument, such as a language test, begins with a conceptual
basis. There is much to consider even before reaching the design stage of
developing an assessment. There are also common misconceptions about the
creation and use of language tests, as well as unrealistic expectations, that
hinder people who need to create and use language tests in their professional
work. One common misconception is the mystique surrounding language testing:
the belief that language testers have some “almost magical procedures and
formulae” for creating “the best” test (Bachman 3). Many people think that a
perfect language test exists and wish to know how to develop one for their own
testing needs, but there is no such thing as the one best test, even for a
specific situation.
Bachman and Palmer believe that there is a “need
for a correspondence between language test performance and language use: in
order for a particular language test to be useful for its intended purpose,
test performances must correspond in demonstrable ways to language use in
non-test situations.” This essay aims to report what I found in the
text Language Testing in Practice and, in doing so, to remove any mystique
associated with the development process for language assessment and to describe
Bachman and Palmer’s model of test usefulness, which includes six qualities:
reliability, construct validity, authenticity, interactiveness, impact, and
practicality.
Reliability
Reliability is frequently defined as
“consistency of measurement.” A reliable test score should stay the same
across different testing conditions. For example, if the same test were given
to the same group of people on two different days, in two different
environments, it should not matter to any individual on which day or in which
environment he/she takes the test. It should also be of no consequence to the
individual which of two equivalent (interchangeable) forms of a test he/she
takes; he/she should receive the same score on either form.
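One common way to put a number on this kind of consistency is to correlate the
scores that the same test takers earn on two occasions or on two equivalent
forms. The sketch below is only an illustration and is not taken from Bachman
and Palmer: the function name `pearson_r` and the score lists are hypothetical,
and a Pearson correlation is just one of several possible reliability
estimates.

```python
from math import sqrt


def pearson_r(first_scores, second_scores):
    """Correlate two sets of scores from the same test takers.

    Hypothetical helper for illustration: values near 1.0 suggest that
    test takers are ranked consistently across the two administrations.
    """
    if len(first_scores) != len(second_scores) or len(first_scores) < 2:
        raise ValueError("need two equally long lists with at least 2 scores")

    n = len(first_scores)
    mean_first = sum(first_scores) / n
    mean_second = sum(second_scores) / n

    # Sum of cross-products and sums of squared deviations from the means.
    cross = sum((a - mean_first) * (b - mean_second)
                for a, b in zip(first_scores, second_scores))
    ss_first = sum((a - mean_first) ** 2 for a in first_scores)
    ss_second = sum((b - mean_second) ** 2 for b in second_scores)

    if ss_first == 0 or ss_second == 0:
        raise ValueError("scores must vary for a correlation to be defined")

    return cross / sqrt(ss_first * ss_second)


# Made-up scores for five test takers on two days (not real data).
day_one = [72, 85, 90, 64, 78]
day_two = [70, 88, 91, 66, 75]
print(round(pearson_r(day_one, day_two), 3))
```

A high correlation only shows that the two sets of scores agree with each
other; it says nothing about whether the test measures the intended ability,
which is the concern of construct validity.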
Construct Validity
Bachman and Palmer define a construct as “the
specific definition of an ability that provides the basis for a given test or
test task.” Construct validity, then, refers to the degree to which a
given test score can be interpreted as a measure of the ability or abilities of
a test taker. For example, if there were a need for a placement test in an
academic writing course, a multiple-choice test of grammar might
give reliable scores. Yet it may not be good enough to use as a
placement test for a writing course, because grammar is only one part of
academic writing ability. Defining the “construct” here as grammatical
knowledge alone is very narrow, considering that the intended language use
or “domain” involves “metacognitive strategies, topical knowledge and affective
responses as well” (Bachman 23).
Authenticity
Essentially, the language assessment tasks
should correspond with the “target language use.” For instance, suppose you
need to know the vocabulary for the items you would be selling in a job, and
you are given a test containing a written passage that describes the
merchandise you would sell. In this passage, key vocabulary terms have been
deleted and you must fill in the blanks. The topical content of the
test is relevant, but the task of filling in missing words might be irrelevant
or inauthentic, because it is not something you would do when actually selling.
Interactiveness
Interactiveness is defined as the use of the
“test taker’s individual characteristics in accomplishing a test task. The
individual characteristics that are most relevant for language testing are the
test taker’s language ability, topical knowledge, and affective schemata”
(Bachman 25). Topical knowledge can also be referred to as “real-world
knowledge” and affective schemata “provide the basis on which language users
assess the characteristics of the language use task and its setting in terms of
past emotional experiences in similar contexts” (Bachman 65). The
interactiveness of a specific test task can be explained by the ways in which
the test taker’s language knowledge, “topical knowledge” and “affective
schemata” are engaged by the test task.
Impact
The impact of assessment operates at two levels:
“a micro level, in terms of the individuals, and a macro level, in terms of the
educational system or society.” Bachman (1990) also points out, “tests are not
developed and used in a value-free psychometric test-tube; they are virtually
always intended to serve the needs of an educational system or of society at
large.” Washback, the effect of testing on teaching and learning, is one aspect
of impact; an assessment can have a positive or negative effect on students and
teachers.
Practicality
To judge whether a test is practical, one must
consider the resources needed to develop a useful assessment and whether these
fit within the resources available. Practicality
means meeting the demands of a particular test within the limits of existing
resources, such as human resources, material resources, and time.
Bachman, L., & Palmer, A. S. (1996). Language Testing in Practice. Hong Kong:
Oxford University Press.