The Impact of TEM-8 (Test for English Majors Band 8) on English Majors in China Cai Wen Kristianstad University School of Teacher Education English, Spring 2010 Level IV English Tutor: Lena Ahlin Table of Contents 1. Introduction _____________________________________________________________ 1 1.1 Aim _______________________________________________________________________ 2 1.2 Material ___________________________________________________________________ 2 1.3 Method ____________________________________________________________________ 3 1.4 A Brief Introduction of TEM-8 _________________________________________________ 4 2. Theoretical Background ____________________________________________________ 5 2.1 The Purpose of Testing _______________________________________________________ 5 2.2 Test Usefulness ______________________________________________________________ 6 2.2.1 Reliability ______________________________________________________________________ 6 2.2.2 Validity ________________________________________________________________________ 8 2.2.3 Authenticity ____________________________________________________________________ 12 2.2.4 Interactiveness __________________________________________________________________ 14 2.2.5 Impact ________________________________________________________________________ 16 2.2.6 Practicality _____________________________________________________________________ 19 3. Analysis and Discussion ___________________________________________________ 20 3.1 The System of the Test and Its Impact __________________________________________ 20 3.1.1 The Organization of the Test _______________________________________________________ 21 3.1.2 The Implementation of the Test _____________________________________________________ 21 3.1.3 The Reform of the Test____________________________________________________________ 30 3.2 The Test Usefulness _________________________________________________________ 32 3.2.1 Reliability _____________________________________________________________________ 33 3.2.2 Validity _______________________________________________________________________ 36 3.2.3 Authenticity ____________________________________________________________________ 38 3.2.4 Interactiveness __________________________________________________________________ 40 3.2.5 Practicality _____________________________________________________________________ 41 3.3 Impact ___________________________________________________________________ 42 3.3.1 Impact on Test Takers ____________________________________________________________ 42 3.3.2 Impact on Teachers ______________________________________________________________ 45 3.3.3 Impact on Society and Education Systems _____________________________________________ 47 4. Conclusion _____________________________________________________________ 48 References _________________________________________________________________ i Appendices ________________________________________________________________iii Appendix 1: Questionnaire ____________________________________________________________ iii Appendix 2: Interview _______________________________________________________________ vi Appendix 3: Specifications for the TEM-8 (Excerpts) _______________________________________ vii 1. Introduction A test is a number of questions or exercises to find out how good someone is at something or how much they know. According to different education aims, test types, test standards and test scorings, testing can be divided into different kinds. In the education area, testing can be used to evaluate education, diagnose learning, and help learning (Chang et al. 2006: 18). It is very important in the process of education. TEM, Test for English Majors, is a particular EFL (English as a foreign language) test in China. It was set up by the State Education Commission in 1991, and has been organized by the Higher Education Institution Foreign Language Major Teaching Supervisory Committee since then. The test has been running for about 20 years. It was set to test the actual performance of Higher Education Institution English Major English Teaching Syllabus (Higher Education Institution Foreign Language Teaching Supervisory Committee English Group 2000). There are two levels in TEM, TEM 4 and TEM-8. TEM-8, Test for English Majors Band 8, is based on a higher level of standard. The object of this test is all English majors when they are in their fourth year as well as their last year in college, or more specifically, in their eighth term, which is why the test is called TEM-8. It mainly tests students’ ability to use English as a foreign language in addition to testing students’ knowledge of words and grammar. In terms of measuring students’ integrative English language ability, TEM-8 is the hardest test for English majors in China (Li et al. 2007: 78). Every year, hundreds of thousands of English majors all over the country attend the test. It is an event that each English major student experience before graduation. However, testing must have some kind of effect upon the process of education, especially TEM-8, which is a very important test for English majors. Here comes the term of “impact”. Wall (1997: 291) defines impact as “any of the effects that tests may have on individuals, policies or practices, within the classroom, the school, the educational system, or society as a whole”. From individuals to the society, the impact has a wide educational context. Previous researches have worked on the validity of TEM-8, the authenticity of TEM-8, but seldom the impact of TEM-8. 1 This study is to investigate the overall impact of TEM-8 on students. This is what the education department and the English majors themselves want to know more about. 1.1 Aim This study aims to find out TEM-8’s impact on English major students, in terms of their daily life, learning process, and future life. Before that, this study first analyzes the system of TEM-8 and its impact on students, including the organization of the test, the implementation of the test, and the reform of the test. In addition, the test itself is also analyzed to measure whether the test is suitable for test takers, mainly focusing on the usefulness of the test. There are six qualities within the usefulness, namely reliability, validity, authenticity, interactiveness, impact, and practicality. However, as the main point of this essay, the impact is especially emphasized. In addition to the students, the impact on teachers, education system and the society is mentioned as well. 1.2 Material The test material used in this study consists of two parts, the test syllabus as well as a sample test paper. The Syllabus of Test for English Majors Band 8 (Higher Education Institution Foreign Language Teaching Supervisory Committee English Group 2005) helps to analyze the system of the test as well as its impact on students. In addition, the sample test paper of 2009 is used in this research as well. The TEM-8 test is extremely well protected by the committee, so it is impossible to know the content of the test of the year before or even shortly after the test. After finishing the scoring of the test papers, scorers put the pictures of the test on the internet. This year’s test was taken less than two months before this study takes place, so the test has not been put on the internet yet. However, since the tests are similar from year to year, thus the test of 2009 is used in this research to help analyze the usefulness of the test. Two groups of people are involved as participants in this study. The first group is 50 English major senior students of a university in China. They took the questionnaire about what they think about the test, how they prepared for the test, and how the test influences their life. Since the test is taken in March, these students have just experienced this year’s test. Therefore, their feelings and opinions are very important for analyzing the impact of the test on students. The other group 2 consists of two English major teachers of the same university in China. The two teachers have been teaching courses directly related to the TEM-8 for years. The interviews of them can directly reflect the impact of the test on the teaching structure, as well as teachers’ attitude towards the courses and the test. 1.3 Method Analyzing test material is the first main research method used in this study. The Syllabus of Test for English Majors Band 8 (Higher Education Institution Foreign Language Teaching Supervisory Committee English Group 2005) includes the aim, the nature, the organizer, the participants, the time, the framework, and the requirement of the test, all of which help to analyze the system of the test as well as its impact on students. In addition, the 2009 TEM-8 test paper is used in this study as well, mainly to analyze the usefulness of the test, especially the reliability of the test. In addition to the test material, questionnaire is used in the research as well (see Appendix 1). The questionnaire is used to find out what students think about the test, how they prepared for the test, and how the test influences their life. Questions 1 to 8 are about students’ opinion about the reliability, validity, authenticity, interactiveness of the test. Questions 9 to 11 investigate how students prepared for the test. The rest questions, namely Question 12 to Question 16 are concerned with how the test influences students’ life. The questionnaire was sent to the participants in China via email. After collecting the feedback from the students, the data and information are used to investigate the impact of the test on students. The third research method used in this study is the interviews with teachers (see Appendix 2). The interviews were also carried out via email. The two teachers who are in charge of the course answered the questions about the settings of the course, including the reason why set up the course, the content of the course, and the timetable of the course. Furthermore, the teachers gave their opinions about the teaching structure and students’ preparation for TEM-8. The information collected from the interviews is used to analyze test’s impact on teachers as well as the educational system. 3 1.4 A Brief Introduction of TEM-8 According to State Education Commission’s Higher Education Institution English Major English Teaching Syllabus (Higher Education Institution Foreign Language Teaching Supervisory Committee English Group 2000: 4-5 my translation), the teaching task and aim of the basic level (Grade One and Grade Two) of English major in Higher Education Institution is to teach English basic knowledge, train students’ basic skills comprehensively and strictly, develop students’ ability of using English in reality, help students form good learning styles and appropriate learning methods, develop students’ abilities of logical thinking and independent work, enrich students’ knowledge of society and culture, enhance students’ sensitivity to the differences among different cultures, and make the students set the stage for senior grades’ study. (Higher Education Institution Foreign Language Teaching Supervisory Committee English Group 2000: 4-5 my translation) In other words, the syllabus means to make the students of basic level to be competent in every respect of English learning. On the other hand, for senior grades students (Grade Three and Grade Four), the syllabus brings forward higher standard. For those students, they should go on learning basic ability of language. Meanwhile, the students should make further efforts on enlarging their scope of knowledge. The emphasis of this stage is on developing students’ integrative competence of English, enriching cultural knowledge, and enhancing the ability of social intercourse. TEM-8 is just the test to assess and evaluate the actual performance of Higher Education Institution English Major English Teaching Syllabus (Higher Education Institution Foreign Language Teaching Supervisory Committee English Group 2000) for senior students. In the meantime, TEM-8 also can assess the teaching quality as well as the students’ language ability, particularly the integrative language ability and communicative ability mentioned in the syllabus that Semester 8 students should have achieved. In this way, the test is able to promote the implementation of the syllabus so as to improve teaching quality. The nature of TEM-8 is a test that assesses test taker’s single and integrative language ability. The language abilities tested in TEM-8 include listening ability, reading ability, writing ability, and translating ability. Since the condition for testing oral ability in large scale is still immature, the supervisory committee has to put off this aspect of testing at this stage. 4 The test is organized in March every year, usually on the Saturday of the first week. For example, this year, 2010’s TEM-8 is carried out on March 7th. During that time, Semester 8 just begins and students return from Spring Festival. The supervisory committee means to assess the English major students’ language ability by the end of their college or university life, therefore, they choose this time to run the examination. The test contains six parts, i.e. Listening Comprehension, Reading Comprehension, General Knowledge, Proofreading and Error Correction, Translation, and Writing. The total time for the test is 195 minutes. There have been two reforms since the test was set up in 1991, respectively in 1997 and 2004. The new tests both came into effect in the following year, which is in 1998 and 2005. In this essay, the reform of 2004 is discussed here since this reform is the latest one and we are all dealing with this version now. In order not to be confused, the tests between 1998 and 2004 are called the old tests, while the tests from 2005 are called the new tests. 2. Theoretical Background When analyzing a test, several aspects should be included, such as the purpose of testing and the usefulness of the test. The test usefulness is composed of six test qualities—reliability, validity, authenticity, Interactiveness, impact, and practicality, all of which are discussed in turn in detail below. 2.1 The Purpose of Testing Testing is divided into different types according to different purpose. There are mainly four types of test -- proficiency tests, achievement tests, diagnostic tests, and placement tests (Hughes, 2003: 11). However, a test can also be a combination of two or more types of test such as the test being analyzed in this essay, which is a combination of proficiency and achievement test. As Hughes (2003: 11) states, proficiency tests are designed to measure people’s ability in a language, regardless of any training they may have had in that language. Since it is to test whether one is proficient or not, the content of the test is not probably based on the content or 5 objectives of language courses that people taking the test would have. Proficiency tests test people on their command of the language for a particular purpose. They may also show whether candidates have reached a certain standard with respect to a set of specified abilities. However, the preparation of proficiency tests is not that easy. Test designers need to consider as objectively and seriously as possible about the instruction, items, structures, and other aspects of the test. Besides, test examiners need to be objective in scoring as well. The examiners are usually independent of teaching institutions, or randomly chosen from all the teaching institutions to make sure they can make fair comparisons between candidates from different institutions. Achievement tests are the ones that teachers are more likely to be involved in. In contrast to proficiency tests, achievement tests are directly related to language courses, their purpose being to establish how successful individual students, groups of students, or the courses themselves have been in achieving objectives (Hughes 2003: 12). 2.2 Test Usefulness The most important quality of a test is its usefulness, since the most important consideration in designing and developing a language test is the use for which the test is intended (Bachman & Palmer 1996: 17). The test usefulness includes six test qualities—reliability, construct validity, authenticity, interactiveness, impact, and practicality. These six test qualities all contribute to test usefulness, so that they should not be evaluated independently of each other. 2.2.1 Reliability Reliability is often defined as consistency of measurement in a test, which means reliability can be considered a function of the consistency of scores from one set of tests and test tasks to another (Bachman & Palmer 1996: 19). This can be presented as in Figure 1 when reliability is considered to be a function of consistencies across different sets of test task characteristics. Scores on test tasks with characteristics A Reliability Scores on test tasks with characteristics A’ Figure 1: Reliability (Bachman & Palmer 1996: 20) 6 In this figure, the double-headed arrow is used to indicate a correspondence between two sets of task characteristics (A and A’) which differ only in incidental ways. Consistency, in educational assessment, appears in three varieties—stability, alternate form, and internal consistency. Stability consistency refers to consistency of results among different testing occasions, in other words, consistency over time; alternate-form consistency is about consistency of results among two or more different forms of a test, which is same as equivalence; internal consistency is related to consistency in the way an assessment instrument’s items function (Popham 2002: 28). However, due to the differences in the exact content being assessed on the alternate forms, environmental variables such as fatigue or lighting, or student error in responding, no two tests will consistently produce identical results (Wells & Wollack 2003: 2). This is true regardless of how similar the two tests are. Even the same test administered to the same groups of students but on different occasions will result in different scores. Nevertheless, the students’ scores are expected to be similar. The more similar the scores are, the more reliable the test is said to be. Score Reliability According to Wells and Wollack (2003: 2), reliability provides a measure of the extent to which an examinee’s score reflects random measurement error. One of three factors causes measurement errors: (a) examinee-specific factors such as motivation, concentration, fatigue, boredom, momentary lapses of memory, carelessness in marking answers, and luck in guessing, (b) test-specific factors such as the specific set of questions selected for a test, ambiguous or tricky items, and poor directions, and (c) scoring-specific factors such as nonuniform scoring guidelines, carelessness, and counting or computational errors. (Wells & Wollack 2003: 2) These errors are random and their effect on a student’s test score is unpredictable. Sometimes they help students to write the right answer while other times they make students answer incorrectly. Therefore, it is desirable to use tests with good measures of reliability. Score reliability means if a particular candidate performs in exactly the same way on the two occasions, he would be given the same score on both occasions. In other words, any one scorer 7 would give the same score on the two occasions, and this would be the same score as would be given by any other scorer on either occasion (Hughes 2003: 43). When scoring requires no judgement, such as the multiple choices test, and could in principle or in practice be carried out by a computer, the test is said to be objective and consistent. Meanwhile, the scorer reliability coefficient is 1, which means the test would be given precisely the same scores for a particular set of candidates regardless by whom or when it happened to be administered. But when a degree of judgement is called for on the part of the scorer, as in the scoring of writing, perfect consistency is not to be expected (Hughes 2003: 43). If so, the scorer reliability coefficient falls below 1. 2.2.2 Validity According to Messick (1993), validity is an integrated evaluative judgment of the degree to which empirical evidence and theoretical rationales support the adequacy and appropriateness of inferences and actions based on test scores or other modes of assessment. In short, validity is the extent to which the instrument measures what it means to measure. Three Types of Validity Evidence There are three types of validity evidence—content related, criterion related, and construct related. Content-related evidence of validity refers to the extent to which an assessment procedure adequately represents the content of the assessment domain being sampled; criterionrelated evidence of validity is about the degree to which performance on an assessment procedure accurately predicts a student’s performance on an external criterion; construct-related evidence of validity refers to the extent to which empirical evidence confirms that an inferred construct exits and that a given assessment procedure is measuring the inferred construct accurately (Popham 2002: 52). As Hughes (2003: 26) argues, a test is said to have content validity if its content constitutes a representative sample of the language skills, structures, etc. with which it is meant to be concerned. A specification of the skills or structures, etc. that it is meant to cover is needed for judging whether a test has content validity or not. The greater a test’s content validity is, the more likely it is to be an accurate measure of what it is supposed to measure. In addition, a test with low content validity can have a maleficial backwash effect since areas that are not tested are likely to become areas ignored in teaching and learning. Face validity is a component of content 8 validity. It is established when an individual reviewing the instrument gives the conclusion that it measures the characteristic or trait of interest (Miller 1985: 3). In other words, it looks as if it indeed measures what it is designed to measure. Criterion-related validity relates to the degree to which results on the test agree with those provided by some independent and highly dependable assessment of the candidate’s ability (Hughes 2003: 27). This kind of evidence helps educators decide how much confidence can be placed in a score-based inference about a student’s status with respect to an assessment domain. Construct validity has been increasingly used to refer to the general, overarching notion of validity in recent years. It pertains to the meaningfulness and appropriateness of the interpretations that we make based on test scores (Bachman & Palmer 1996: 21). Based on the meaning of a construct in an educational assessment, construct validity is used to refer to the extent to which we can interpret a given score as an indicator of the abilities or constructs we want to measure. We can interpret construct validity as in Figure 2. SCORE INTERPRETATION: Inferences about Domain language ability of (construct definition) generalization C o n s t r u c t TEST SCORE V a l i d i t y A u t h e n t i c i t y Interactiveness Language ability Characteristics of the test task Figure 2: Construct validity of score interpretations (Bachman & Palmer 1996: 22) 9 Construct validity also has something to do with the specific domain of generalization, construct definition, characteristics of the test task and test taker’s areas of language ability. If a test is to have validity, not only the items but also the way in which the responses are scored must be valid. As Bachman and Palmer (1996: 33) states, if a test is meant to measure more than one ability, it makes the measurement of the one ability in question less accurate. Evidence like content relevance and coverage, concurrent criterion relatedness, and predictive utility can be provided for a particular score interpretation, as part of the validation process. However, valid is just the degree of measurement, and test validation is an on-going process and the interpretations we make of test scores can never be considered absolutely valid (Bachman & Palmer 1996: 22). How to Make Tests More Valid In the development of a high stakes test, such as University Entrance Examination, which may have significant effect on candidates’ lives, there is an obligation to carry out a valid exercise before the test is taken in operation. Since full validation is unlikely to be possible, as stated above, there are several recommendations from Hughes (2003: 33-34): First, write explicit specifications for the test which take account of all that is known about the constructs that are to be measured. Make sure that you include a representative sample of the content of these in the test. Second, whenever feasible, use direct testing. If for some reason it is decided that indirect testing is necessary, reference should be made to the research literature to confirm that measurement of the relevant underlying constructs has been demonstrated using the testing techniques that are to be employed. Third, make sure that the scoring of response relates directly to what is being tested. Finally, do everything possible to make the test reliable. If a test is not reliable, it cannot be valid. (Hughes 2003: 33-34) With great efforts, the validity of test can be moved on to a higher level of standard. Therefore, the test will be more useful for both candidates and examiners. The Relationship between Reliability and Validity Reliability and validity are critical for tests, and are sometimes referred to as essential measurement qualities since the primary purpose of a language test is to provide a measure that can be interpreted as an indicator of an individual’s language ability. They are two closely related 10 ideas and researchers have many theories about their relationship. Hughes (2003: 50) and Bachman and Palmer (1996: 23) suggest that, in order to be valid, a test must provide consistently accurate measurements, which means reliability is a necessary condition for validity, and hence for usefulness. However, reliability is not a sufficient condition for validity. In other words, a reliable test may not be valid at all. For some other researchers, test validity is requisite to test reliability. If a test is not valid, then reliability is meaningless (OPTISM n.d.). That means if a test is not valid, there is no point in discussing reliability since test validity is required before reliability can be considered in any meaningful way. It is the same the other way round. If a test is not reliable, it is also not valid (OPTISM n.d.). Figure 3 can explain the relationship between reliability and validity clearly: Figure 3: The Relationship between Reliability and Validity (Research Methods Knowledge Base 2006) The center of the target is the concept that examiners are trying to measure. Every shot represent one candidate that is to be measured. If you hit the center of the target, you measure the concept perfectly for a candidate. Otherwise, you do not. The more you are off for that person, the further you are from the centre. In Situation 1, you are consistently hitting the target, but off the center of the target. That means you are consistently measuring the wrong value for all respondents. In this case, the measure is reliable, but it is not valid. In Situation 2, you are randomly hitting the target, so the hits are spread disorderly. You seldom hit the center of the target, but on average, you are getting the right answer for the group. Under this circumstance, the test is valid but not consistent. Situation 3 shows that the hits are not randomly spread. Moreover, you consistently miss the center. The measure in this case is neither reliable nor valid. In the last situation, you consistently hit the center of target. In this case, the measure is both reliable and valid. 11 In order to exert the usefulness of a test, both reliability and validity are essential parts to which test designers should make great efforts. 2.2.3 Authenticity When making inferences about test takers’ language ability, the inferences are supposed to generalize to those specific domains in which the test takers are likely to need to use language, in other words, in a target language use domain. Bachman and Palmer define a target language use (TLU) domain as “a set of specific language use tasks that the test taker is likely to encounter outside of the test itself, and to which we want our inferences about language ability to generalize” (1996: 44). The TLU domain is an essential element of the usefulness of a test. In order to justify the usefulness of language tests, it is important to demonstrate that performance on language tests corresponds to language use in specific domains other than the test itself. Authenticity is just to measure the extent of one aspect of demonstrating, the correspondence between the characteristics of TLU tasks and those of the test task. Authenticity is defined as “the degree of correspondence of the characteristics of a given language test task to the features of a TLU task” (Bachman & Palmer 1996: 23). A TLU task here refers to an activity that individuals are involved in, using the target language for achieving a particular goal or objective in a particular situation. A more vivid explanation of authenticity is shown in Figure 4. Characteristics of the TLU task Authenticity Characteristics of the test task Figure 4: Authenticity (Bachman & Palmer 1996: 23) For example, if a test examines communicative ability, authenticity here refers to the degree of correspondence of the characteristics of the test task to the features of the communication task. If the given test construct closely resembles the situation a test-taker would face in the TLU domain, the test is more authentic. In other words, the test task is likely to be enacted in the real world. In a test, evidence of authenticity may be presented in the following ways: The language in the test is as natural as possible. 12 Items are contextualized rather than isolated. Topics are meaningful (relevant, interesting) for the learner. Some thematic organization to items is provided, such as through a story line or episode. Tasks represent, or closely approximate, real-world tasks. (Brown 2004: 28) (1) The language used in the test should be as natural as possible because test takers read instructions and items through the language, whether it is their first language or target language. Test designers should avoid using language with academic or technical terms. (2) In the real world, we seldom use single items; rather, we use items in phrases or sentences. Thus, items in the test should be contextualized rather than isolated. (3) Authentic means the test task is also used in the daily learning process, which means the topics of the test should be closely related to test takers’ daily life. The topics should be meaningful, relevant, or interesting for the test takers. (4) In test tasks like cloze, there are always paragraphs of a story and test takers are required to fill in blanks. However, if some thematic organization to items is not provided, test takers would have no idea about what the plot of the story is, and they would not be able to finish the task. Therefore, thematic organization of items should be provided in the test. (5) This evidence is the most important for authenticity in a test that the test tasks should be close to real-world tasks. In attempting to design a test task with authenticity, the test designer should first identify the critical features that define tasks in the TLU domain. This recognition serves as a framework for the task characteristics. Test tasks that have these critical features are then designed and selected. A language test is said to be authentic when it mirrors as exactly as possible the real life non-test language tasks. Testing authenticity can be divided into 3 categories, which are input (material) authenticity, task authenticity, and layout authenticity. Input authenticity means that authenticity should be present in the test material, and it further falls into three aspects, situation authenticity, content authenticity, and language authenticity. Task authenticity forms the cornerstone of test authenticity. In authentic tasks, the emphasis should primarily be on the proficiency levels of the population. The layout of the test paper should also be authentic. According to Bo (2007: 5), the 13 most usual way to make authentic layout of test paper is by presenting pictures. Vivid pictures can be used to test those productive skills as speaking and writing. 2.2.4 Interactiveness Interactiveness is another important element in the quality of usefulness. Bachman and Palmer (1996: 25) define interactiveness as “the extent and type of involvement of the test taker’s individual characteristics in accomplishing a test task”. Individual characteristics, such as test taker’s language ability (language knowledge and strategic competence 1 , or metacognitive strategies), topical knowledge, and affective schemata2, are most relevant for language testing. These can be shown as in Figure 5. Topical knowledge LANGUAGE ABILITY (Language knowledge, Metacognitive strategies) Affective schemata Characteristics of language test task Figure 5: Interactiveness (Bachman & Palmer 1996: 26) There are interactions between language ability, topical knowledge and affective schemata, and the characteristics of language test task. Authenticity refers to the characteristics of test tasks and features of TLU tasks, while interactiveness refers to the interaction between the test taker and the test task. Many types of test tasks may involve the test taker in a high level of interaction with 1 Bachman and Palmer (1996: 70) conceive strategic competence as “a set of metacognitive components, or strategies, which can be thought of as higher order executive processes that provide a cognitive management function in language use, as well as in other cognitive activities”. In other words, strategic competence, or metacognitive components provides an essential basis for designing and developing test tasks and for evaluating the interactiveness of the test tasks. 2 Affective schemata can be considered as the affective or emotional correlates of topical knowledge (Bachman & Palmer 1996: 65). Students’ affective schemata can influence their performance on tasks when they deal with emotionally charged topics, such as abortion, gun control, or national sovereignty. 14 the test input, such as responding to visual, non-verbal information. However, test taker’s language ability cannot be defined based on his performance in the test unless this interaction requires the use of language knowledge. Therefore, interactiveness is a critical quality of language test tasks since it is closely related to construct validity. The Common Ground of Authenticity and Interactiveness and Their Relationship with Construct Validity According to Bachman and Palmer(1996: 28-29), there are some points that authenticity and interactiveness share with each other in designing, developing, and using language tests. Firstly, since both authenticity and interactiveness measure extent and degree, we can just say relatively more or relatively less authentic or interactive, rather than authentic and inauthentic, or interactive and non-interactive. Secondly, when we talk about authenticity and interactiveness, we must consider three aspects of characteristics: characteristics of the test takers, characteristics of the TLU task, and characteristics of the test task. Thirdly, certain test tasks are relatively useful for their purpose even with low authenticity or interactiveness. Fourthly, our understanding of a test task’s authenticity and interactiveness is just a guess since different test takers have different characteristics that they perform differently in the same test. Fifthly, the lowest acceptable levels that we specify for authenticity and interactiveness depend on the specific testing situation, and they must be balanced with those for the other test qualities. As Bachman and Palmer (1996: 29) suggest, “[a]uthenticity, interactiveness, and construct validity all depend upon how we define the construct ‘language ability’ for a given test situation”. Authenticity is about the correspondence of test task and TLU task, so it is of course closely related to content validity. Moreover, authenticity provides a means for investigating the extent to which score interpretations generalize beyond performance on the test. Since investigating the generalizability of score interpretations is an important part of construct validity, authenticity and construct validity are linked. According to Figure 2, both interactiveness and construct validity have something to do with language ability, which includes language knowledge, strategic competence, metacognitive strategies, and furthermore, the topical knowledge. The degree of how interactiveness corresponds to construct validity depends on how we define the construct and on the characteristics of the test takers. 15 2.2.5 Impact Another quality of tests is their impact on society and educational systems as well as upon the individuals within those systems. A test is to serve a specific purpose, thus the test scores also imply values and goals, and they have consequences. As Bachman (1990: 279) points out, “tests are not developed and used in a value-free psychometric test-tube; they are virtually always intended to serve the needs of an educational system or of society at large”. Thus, whenever we use tests, our choices have specific impact on both the individuals and the system involved. There are two levels of the impact of test use. At a micro level, it is the individuals that are affected by the particular test use. At a macro level, the educational system and the society are affected by the particular test use. Washback When we deal with the impact of tests, one aspect should be mentioned first. Bachman and Palmer name it as “washback” (1996: 30) while Hughes calls it as “backwash” (2003: 1). This concept pertains to the effect of testing on teaching and learning, and it can be maleficial or beneficial. If in a test, the test designer asks candidates to write a composition to test their oral ability, the test would bring maleficial washback. It is because writing a composition is actually testing writing ability, although oral ability is a comprehensive ability that may also include writing ability, it will give the candidates the impression that the skill of speaking can be ignored in the classroom learning. Therefore, the test itself also cannot reach its purpose. “Cram” courses and “teaching to the test” are examples that maleficial washback bring to the classroom (Brown 2004: 29). On the other hand, beneficial washback or positive washback “depends in part upon factors such as the importance of the test, the status of the language being tested, and the purpose and format of the test” (Weigle 2002: 54). The test itself cannot ensure beneficial washback in consideration of the possibility that many factors outside the test may affect washback, like teacher’s personal beliefs, institutional requirements, and student expectations. Impact on Test Takers Test takers are among those individuals who are most directly affected by test use. According to Bachman and Palmer (1996: 31), mainly three aspects of the testing procedure affect test takers: 16 the experience of taking and, in some cases, of preparing for the test, the feedback they receive about their performance on the test, and the decisions that may be made about them on the basis of their test scores. (Bachman & Palmer 1996: 31) Firstly, the experiences of preparing for and taking the test have the potential possibility of affecting characteristics of test takers including personal characteristics, the topical knowledge, affective schemata, and their language ability. In high-stakes tests such as national examinations or standardized tests, test takers may spend several weeks or even months preparing for the test. Some high-stakes nation-wide public examinations are used as placement tests or proficiency tests, just as the test being analyzed in this essay, which is used for selection and labelling of different test takers within different levels. In these examinations, teaching may be focused on the syllabus of the test for up to several years before the actual test, and the techniques in the test will be practiced in class. Moreover, the experience of taking the test also has impact on test takers. The test taker’s perception of the TLU domain, areas of language knowledge, and use of strategies may be affected by the test. Secondly, the feedback that test takers receive about their performance in the test is likely to affect them directly. Therefore, feedback should be as relevant, complete, and meaningful to the test taker as possible. In most situations, the feedback of test performance is a score. However, in order to make beneficial impact on test takers, rich verbal description of the score, the actual test tasks, and the test taker’s performance are also needed. Finally, the decisions that may be made about the test takers based on their test scores may directly affect them in various ways. In low-stakes tests, the result of test will help students to discover their areas of strength and weakness so they know what more has to be done. On the other hand, in the examinations like University Entrance Examination, the result will directly determine whether a student can be admitted to a university or not. Some proficiency tests related to job hunting will also determine whether one can be employed or not. All these decisions have serious consequences for test takers. Therefore, fair decisions, which are with equally appropriate, regardless of individual test takers’ group membership, should be made. Fair test use also pertains to the relevance and appropriateness of the test score to the decision, as well as whether 17 and by what means test takers are fully informed about how the decision will be made and whether decisions are actually made in the way described to them. Impact on Teachers Test users are the second group of individuals who are directly affected by tests, including test designers, test examiners, and administrators. In an instructional program, the test users that are most directly affected by test use are teachers. Impact on the program of instruction is considered as washback for test users. Most teachers are familiar with testing influence on their instruction. The term ‘teaching to the test’ is unavoidable for most situations for teachers. It implies “doing something in teaching that may not be compatible with teachers’ own values and goals, or with the values and goals of the instructional program” (Bachman & Palmer 1996: 33). If teachers feel that what they teach is not relevant to the test, it must be an instance of low-test authenticity, in which the test has maleficial washback on instruction. Therefore, a useful test should be provided to minimize the potential for negative impact on instruction. Impact on Society and Education Systems In addition to test takers and test users, the society and education systems are also influenced by the impact of tests. In a second or foreign language testing, the consideration of values and goals is especially complex since the values and goals that inform test use may vary from one culture to another. Different cultures have different aspects to be valued. According to Shohamy (1998: 332), some tests reflect the social condition while some other tests reflect the political condition. Values and goals also change over time. Secrecy and access to information, privacy and confidentiality were never considered once, but they are now considered as basic rights of test takers. High-stakes tests, which are used to make major decisions about large numbers of individuals, are particularly likely to have consequences not only for the individual stake holders, but also for the educational system and society. An achievement test may have potential impact on the language teaching practice and language programs. A test with intended purpose may also have impact on the society. As Shohamy (1998: 332) argues, “[…] the act of testing is not neutral. Rather, it is both a product and an agent of cultural, social, political, educational and ideological agendas that shape the lives of individual participants, teachers and learners”. Tests like TOEFL 18 and IELTS are used to screening students applying for studying in English speaking countries as well as individuals applying for immigration. In this way, language tests have wider social and political implications as well. 2.2.6 Practicality Practicality is very different from the qualities mentioned above which pertains to the ways in which the test will be implemented and whether it will be developed and used at all, rather than the uses that are made of test scores. Bachman and Palmer (1996: 36) define practicality as “the relationship between the resources that will be required in the design, development, and use of the test and the resources that will be available for these activities”. This relationship can be represented as in Figure 6. Practicality = Available resources Required resources If practicality ≥1, the test development and use is practical. If practicality ≤1 , the test development and use is not practical. Figure 6: Practicality (Bachman & Palmer, 1996: 36) For any given situation, if the required resources for implementing the test exceed the available resources, the test will be impractical. The test designer should either reduce required resources or increase available resources to make the test more practical. In reverse, the test is practical. There are three general types of resources to assess practicality, human resources, material resources, and time. Human resources include test writers, scorers or raters, test administrators, and clerical support. Material resources refer to space (for example, rooms for test development and test administration), equipment (for example, typewriters, word processors, tape and video recorders, computers), and material (for example, paper, pictures, library resources). Time is composed of development time (time from the beginning of the test development process to the reporting of scores from the first operational administration) and time for specific tasks (such as designing, writing, administering, scoring, analyzing). The specific resources required will vary 19 from one situation to another, thus, practicality can only be determined for a specific testing situation. Brown (2004: 19) defines a practical test as one that “is not excessively expensive, stays within appropriate time constraints, is relatively easy to administer, and has a scoring/evaluation procedure that is specific and time-efficient”. High-stakes tests require a great deal of resources, so they are considered costly and time-consuming. However, it is impossible to spend too much money on a test, or ask test takers to spend hours on finishing the test, or require examiners to take hours to evaluate a test paper, or only computer can score the test while the test is taken far away from the nearest computer. Therefore, it is not surprising that “some test users may search for ways to avoid less practical tests if they believe other tests can serve the same purpose” (Gennaro 2006: 2). In other words, the value and quality of a test sometimes depends on quite detailed, practical considerations. 3. Analysis and Discussion The analysis mainly consists of two parts, the system of the test and its impact, and the test usefulness. There are three aspects to be discussed within the system of the test, which are the organization of the test, the implementation of the test, and the reform of the test. In the reform part, there is a comparison between the old TEM-8 and the new TEM-8. The test usefulness part pertains to reliability, validity, authenticity, interactiveness, impact and practicality. Since this essay emphasizes the impact, the impact is analyzed and discussed as a separate part. 3.1 The System of the Test and Its Impact When referring to the system of a test, we are talking about who makes the test, who organizes the test, how the test is organized, what is the purpose of the test, what is the content of the test, how the test is scored, and some other important issues when we deal with testing. In this essay, the system of the test is divided into three parts, the organization of the test, the implementation of the test, and the reform of the test. The impact of the system on students is discussed as well when analyzing the system. 20 3.1.1 The Organization of the Test As mentioned in the Introduction, TEM-8 was set up by the State Education Commission in 1991, and has been organized by the Higher Education Institution Foreign Language Major Teaching Supervisory Committee since then. Every year, the English group members of the supervisory committee design the test and then send the test to colleges and universities all around the country. However, the Higher Education Institution Foreign Language Major Teaching Supervisory Committee has no official web site. They just publish The Syllabus of Test for English Majors Band 8 (Higher Education Institution Foreign Language Teaching Supervisory Committee English Group 2005), and inform of the time of testing in that year. There is no official way to make the test content public, only after scoring, some examiners will take pictures of the test and put them on internet. There are no official answers to the test as well, and this is one of the aspects that students complain about, since they have no idea where they have made mistakes. One of the students who took questionnaires says, “there are so many questions we are not sure of the answers”. Fortunately, English experts will do the test themselves and publish their answers on internet, and generally speaking, different experts have the same answers to objective parts of the test like multiple choices. For the more subjective parts, there are some slight differences, but as a whole, they are similar to each other, at least with the standard or scoring. Both teachers and students accept the experts’ answers. 3.1.2 The Implementation of the Test Every year, after the English group members of the supervisory committee design the test, the test papers are sent to colleges and universities all around the country. Every higher education institution is in charge of the test process within their own colleges or universities. They arrange the classrooms and supervisors, check up the identifications of the test takers, supervise the using of equipments in the classrooms, distribute as well as collect test papers, and send the test papers back to the supervisory committee. 21 The Participants of the Test The main participants of TEM-8 are Grade Four English major students of higher education institutions that are confirmed by the Ministry of Education. In the meantime, those non-English major students who have passed CET-6 (College English Test Band 6) can also attend the TEM-8. However, this number of students is very small. CET is one of the most pervasive English tests in China, which is set for all college students, no matter they are English major or not. Therefore, CET-4 and CET-6 are relatively much easier for English majors. The difficulty of CET-6 is probably equal to TEM-4, thus TEM-8 is much more difficult than CET-6. This is why there are so few non-English major students taking TEM-8. For all students who have taken TEM-8, they have one chance of taking the re-sit examination. Those who fail in the first year can attend the next year’s TEM-8 testing. The Time of the Test The time of the test brings much inconvenience to the students. 60%3 of the English majors that have answered my questionnaire state that the test taken in Semester 8 is not suitable. Although the test means to measure the implementation of the syllabus after the four years of teaching, most of the 60% English majors indicate that it is better to set the test in Semester 7. There are two reasons for this indication. Firstly, during the fourth year of college or university, students are preparing for the examination for further education, working on their graduation paper, and hunting for jobs, especially in Semester 8 when the Spring Festival is over. They have so many things to do in the last year and semester. With Question 15 in my questionnaire, 14% of the English major students show that the test influences their preparation for the graduation paper, and 30% of them indicate that the test influences their hunting for a job. This actually affects their ordinary learning and life. Secondly, the result of the test comes out in late May or early June, and only by then, students who have passed the test can get their certifications. However, many jobs need this certification when they are applied for. Therefore, students may lose many chances of good jobs. One more thing is that, if a student fails the test, it means that if he really wants to pass the test, he has to take the next year’s test again, but by that time, he may be 3 The data of questionnaire is set in Appendix 1. Numbers in percentage in the analysis usually refer to the percentage of students who have taken my questionnaire, unless I mention other groups specifically. 22 working already, studying abroad or just be far away from the college. It is inconvenient or even unpractical to attend the re-sit examination. The Test Framework The test contains six parts, i.e. Listening Comprehension, Reading Comprehension, General Knowledge, Proofreading and Error Correction, Translation, and Writing. Among these six parts, Listening Comprehension and Translation have subsections. Various testing techniques are adopted, such as multiple choice questions, and gap filling. As far as score is concerned, the Listening Comprehension, Reading Comprehension, Translation, and Writing take up to 20% of the total score respectively, while General Knowledge and Proofreading and Error Correction account for 10% respectively (see Table 1). Table 1: The framework of TEM-8 part I Format Number of items Percentage of scoring Mini-lecture Gap Filling 10 10% Interview Multiple Choice 5 5% News Broadcast Multiple Choice 5 5% Test item Listening Comprehension Time (min.) 35 II Reading Comprehension Multiple Choice 20 20% 30 III General Knowledge Multiple Choice 10 10% 10 IV Proofreading and Error Correction Gap Filling 10 10% 15 Translation 1 10% Chinese to English V Translation 60 English to Chinese VI Total Writing Translation 1 10% Passage Writing 1 20% 45 63 100% 195 23 In terms of scoring, the objective parts, including Interview, News Broadcast, Reading Comprehension, and General Knowledge, take up 40% score of the whole test, while subjective parts, such as Mini-Lecture, Proofreading and Error Correction, Translation, and Writing take up 60% score of the whole test. Therefore, there is a lot of writing the students should do. It is obvious that the students have to do more thinking than in a test with 90% objective items. The test begins at 8:15 a.m. in the early morning, and lasts for more than 3 hours. Although there are so many aspects to be tested, the long time of intensive work makes students exhausted. It is hard to imagine what the scene will be if an oral part is included in the test as well in the future. There is another thing that should be mentioned here. TEM-8 is not like other ordinary tests where the test papers are handed out at the beginning of the test and collected all together at the end of the test. In contrast, the test papers are handed out and collected in many steps: (a) The test papers of section I (without the Mini-Lecture part), II, III, IV and the answer sheets are handed out. (b) After listening to the Mini-Lecture, the test papers of this part are handed out but answer sheet 1 for this part are collected after 10 minutes. (c) After the time for the section III (General Knowledge) has run out, in other words, after 75 minutes, the answer cards for these four parts are collected. (d) After the answer cards are collected, for each section left, the test papers and answer sheets for that part are handed out at the beginning of that part and are collected at the end of that part. In a word, the test papers and answer sheets are handed out and collected five times. Apparently, the supervisory committee sets the test process like this to make sure there is less possibility for students to cheat, and also to control the students’ time for each section. However, this complex process brings much trouble for students. Firstly, they may feel stressed because of these procedures. In an ordinary test, students get all the test papers and answer sheets at the beginning of the test, and hand them in together at the end of the test. Therefore, they can arrange the time for different sections in their own way. For example, 24 in this test, if without these procedures, students who use less time of General Knowledge can spend more time on Reading Comprehension. However, in this test, students have to answer different sections in respectively precise times. Therefore, they feel different towards different sections (see Figure 7). 1 00 % 90 % 80 % 70 % 60 % 50 % 40 % 30 % 20 % 10 % 0 % N oten o ug h J us ti nt im e E no u gh W ha t ev e r L. C . R . C . G. K . P .& E .C . T . W . Figure 7: The Feeling of Time limitation in Different Sections (L.C.—Listening Comprehension; R.C.—Reading Comprehension; G.K.—General Knowledge; P. & E.C.—Proofreading and Error Correction; T.—Translation; W.—Writing) For Listening Comprehension, 54% of the students think they were just in time. For Reading Comprehension, 74% of the students think they did not have enough time for this part. For General Knowledge, 70% of the students think they had enough time for this part. For both Proofreading and Error Correction, 34% of the students think they did not have enough time or were just in time to finish this section. For Translation, 42% of the students feel they were just in time to finish this section while 44% of them feel they had enough time for this section. For the last section, Writing, 48% of the students feel they were just in time for this section. Overall, students feel stressed about the limitation of the time for each section. On the other hand, since the supervisors have to hand out, and collect test papers and answer sheets, they have to walk around the classroom, and this brings stress to students as well. The atmosphere can be very intense, and that is one of the reasons why some students feel extremely stressed in the classrooms when taking the test. 25 Secondly, most students perform better without being disturbed. However, in this test, they are disturbed many times. The fluency of thinking is cut off, and this may have a great influence on the performance of students, thus to affect their scores of the test. The Requirement of the Test In The Syllabus of Test for English Majors Band 8 (Higher Education Institution Foreign Language Teaching Supervisory Committee English Group 2005: 2-4), there is a list of requirements for each section of the test. The requirements are very specific and detailed. For each section, the syllabus brings the level of knowledge students should have acquired, as well as the level of language using ability (see Appendix 3). Listening Comprehension From the requirement of the listening part, we can see that the syllabus has a high standard for English major students with their listening ability. Although it means to test students’ listening ability, it requires knowledge of all aspects as well, such as politics, economy, history, culture and education, and so on. Since the syllabus particularly points out foreign media, such as VOA, BBC, and CNN, material from these media have become an important resource for listening class in general teaching. Listening Comprehension consists of three parts: mini-lecture, interviews, and news broadcast. Among these three tasks, the mini-lecture is considered to be the most difficult task. 12% of the students who took the questionnaire directly point out that this part is very difficult. Students just get a piece of paper with no characters on it before the recorder runs. They have to write down what they have heard as soon as possible and as much as possible. They can just hear the lecture once, and after that, they will get a test paper and do gap filling. However, not the whole lecture is on the paper, just the brief version. 10 blanks are given to be filled. Many students, including me, are mind blank after listening to the lecture since we have no idea about the topic of the lecture before listening to it, and in addition, the speed of the lecture is almost the same as in a lecture in an English-speaking country. The 10 blanks are supposed to be related with the main clue of the lecture. However, the lecture is approximately 900 words long, thus it is difficult for students to write down the main clue when they hear the passages for the first time. 26 However, in spite of its level of difficulty, 80% of the students think that the Listening Comprehension section can really reflect their listening ability. Reading Comprehension As mentioned above, 74% of the students feel that they did not have enough time for the Reading Comprehension. Time limitation is of course a reason. There are about 700 words in each text (four texts), so considering the time limit, students are supposed to read the content with the speed of around 150 words one minute. In addition, they have to think about the answers as well. Sometimes they cannot directly give the answer but hesitate and think longer. Furthermore, without enough time is not only because of the time limitation, but also because of the level of difficulty of this section. According to the requirement, candidates not only need to understand the general idea of the article, but also need to be able to analyze details. Referring to the content of the articles, they are in large scale. Politics, economy, history, culture, education, science are all included. In Question 3, about the difficulty of the whole test, 10% of the students directly indicate that Reading Comprehension is difficult and needs a large scale of background knowledge. In addition, in Question 8, about whether they are familiar with the topics in the test, 48% of the students chose no. Furthermore, since this section is to test students’ reading ability, a great amount of vocabulary is required. Obviously, students with larger vocabulary can read the texts more quickly and understand the content much better. Therefore, in order to perform better in this section, students should do much practice at other times. General Knowledge This section is the only section that candidates can prepare for with particular material rather than just practicing for training the abilities. Since the content of this section is described in the requirement, all what students should do is to look for all the material and remember them in mind. However, referring to the material, since the scale of the content is still very large, students complain that there are too many things to be remembered, and it is very easy to get confused with similar countries, or literary works. Therefore, just 10% of the students think that this section can really reflect their language ability. Nevertheless, the general knowledge is still a very important part of language knowledge that students should acquire. 27 Proofreading and Error Correction The short paragraph consists of about 250 words, and within each line the test paper indicated, there is an error. Students are required to correct the errors by deleting a word, changing a word, or adding a word. Although there are 15 minutes for 10 items, students still think they need more time. 34% of the students feel that there was not enough time to for them to finish this section, and another 34% of the students express that they were just in time to finish this section. This is mainly because of the difficulty of this section rather than just the time limitation. In this section, the students are tested on a great amount of linguistic knowledge, for example, the structure of the sentences and paragraph, the vocabulary, and the lexical chunks. Students are required to be good at linguistics, be able to connect items within the context, and be sensitive to errors. This section is also closely related to students’ reading ability. Translation In this section, 42% of the students indicate that they were just in time to finish this part, and 44% of the students feel that there was enough time for them to finish this section. Although the time limitation is not so strict, the level of difficulty of this section is not lower than that of other sections. In this section, students need a great amount of background knowledge, such as politics, economy, history, culture and science. Furthermore, knowing the background knowledge is not enough, the students should be familiar with the English expression or Chinese expression for the knowledge. This is especially important for some titles of events, departments, and meetings. Without these expressions, the translation process will not be smooth. This section is to test candidates’ reading ability as well as writing ability. Students are supposed to be very familiar with both Chinese language and English language knowledge, know the similarities and differences between the two languages well, and be capable of translating either language to the other. Writing 24% of the students express that they were just in time to finish this part. They have to follow the instruction and take time to think about how to arrange the essay. If they arrange the time properly, they should be able to finish in time. In this section, the purpose is to assess students’ writing ability. However, writing ability is a comprehensive ability, including abilities of reading, 28 understanding, and writing. Students should have certain background knowledge4, imagination, and logic. This section can reflect students’ level of acquired language knowledge, their ability to understand the knowledge, as well as their ability to use the knowledge. Therefore, it is an integrative test. 35% of the students point out that they think this section can reflect their language ability. The Score of the Test The scores of the test are divided into four categories. The total score of the test is 100, and the ones below 60 are failed. The four categories are distinguished as in Table 2. Table 2: The Four Categories of the Score of the Test Scores Labels 80-1005 Distinction 70-79 Merit 60-69 Pass 06-69 Fail Every year, the national pass rate of TEM-8 is between 40% and 50%. Some better colleges and universities can reach above 80%. Therefore, these colleges take the high pass rate as their advantage when they enrol freshmen after the National College Entrance Examination. 4 The ‘background knowledge’ here refers to schematic knowledge, which includes relative knowledge, memory and experience, and provides relative material when we interpret new information. 5 Actually, the score of 100 is an ideal situation, since there are subjective parts in this test, even the students are all correct with the objective parts, there is little possibility that they can be perfect at subjective parts as well. In addition, the marking of the subjective parts stops students getting full score too. In Translation and Writing, the students are required to use appropriate choice of words, and have a variety in sentence patterns. The structure should be logically organized and relatively few significant errors of vocabulary, spelling, and punctuation should be found (Enfamily 2006). Therefore, and in fact, no one ever has the full score. 6 The score of 0 is also an ideal situation as in footnote 3. There is probably only one possibility that the students can get 0 score. That is when the student misses the test. 29 The result of the test comes out in late May or early June. However, as mentioned above, unlike CET-4 and 6, TEM-8 has no official web site. Students cannot track their scores on line. Every year, after the Higher Education Institution Foreign Language Teaching Supervisory Committee English Group finish scoring, the results will be mailed to every college and university as well as the certifications. Every college and university is in charge of informing students of their results and gives them the certifications. 3.1.3 The Reform of the Test There are some changes between the two syllabuses, mainly focusing on the content of the test (see Table 3). As a whole, there are four main changes between the old syllabus and the new syllabus. Table 3: The Changes between the Old Syllabus and the New Syllabus Sections The old syllabus Section A Talk (5 items) Section B Conversation (5 items) Section C News Broadcast (5 items) Listening Section D Note-taking & Gap-Filling Comprehension (10 items) Time: 40 minutes Percentage of score: 25% The new syllabus Section A Mini-lecture (10 items) Section B Conversation (5 items) Section C News Broadcast (5 items) Section A Careful Reading (15 items) Section B Speed Reading (10 items) Reading Time: 40 minutes Comprehension Percentage of score: 25% Reading Comprehension (20 items) Not included General knowledge Proofreading and Error Correction Translation Writing 10 language disorders Time: 15 minutes Percentage of score: 10% Section A Chinese to English Section B English to Chinese Time: 60 minutes Percentage of score: 20% Controlled composition writing (300 words) Time: 60 minutes Percentage of score: 20% Time: 35 minutes Percentage of score: 20% Time: 30 minutes Percentage of score: 20% 10 items about English knowledge Time: 10 minutes Percentage of score: 10% The same The same Controlled composition writing (400 words) Time: 45 minutes Percentage of score: 20% 30 Sections The old syllabus The new syllabus Percentage of objective and subjective scores Objective: 40% Subjective: 60% Time for test 215 minutes (divided into morning 195 minutes (just morning section) section and afternoon section) The order of the sections Listening Comprehension → Proofreading & Error Correction→ Reading Comprehension → Translation → Writing Objective: 40% Subjective: 60% Listening Comprehension→ Reading Comprehension→ General Knowledge→ Proofreading & Error Correction→ Translation→ Writing (a) The quantity of vocabulary that students should have acquired in the old syllabus is above 10,000 words, while that in the new syllabus is above 13000 words, which means the requirement for students has risen to a higher standard. Students have to work harder in and after classes. (b) The total time for the test is reduced. In the old syllabus, the total time is 215 minutes, and the test is divided into two parts, the morning part and the afternoon part. In the morning part, students are tested with Listening Comprehension, Proofreading and Error Correction, and Reading Comprehension sections, and in the afternoon, they are tested with Translation and Writing. However, in the new syllabus, the total time is reduced to 195 minutes, and all sections are supposed to be finished together in the morning. The total number of the items is not changed, which means in the new syllabus, students have to finish the same number of items within less 20 minutes’ time. The difficulty level of the test has risen to a higher stage. (c) The new syllabus has more requirements for the sections of Listening Comprehension, Reading Comprehension, and Writing. For Listening Comprehension, in the old syllabus, the proportion of selected response (multiple choice) is larger than that of limited response (gap filling), while in the new syllabus, the two proportions are the same, which means students have to commit themselves harder to the test, and answer with more or larger scale of language knowledge. For Reading Comprehension, in the old syllabus, Careful Reading tests students’ ability of reading carefully, and Speed Reading tests students’ ability to read fast, while in the new syllabus, there are just comprehensive tests. The purpose of this is to reflect more truthfully 31 the students’ psychology of reading and encourage students using various ways to read according to different needs in the process of reading. For Writing, in the old syllabus, students have to write 300 words in 60 minutes, while in the new syllabus students have to write 400 words in just 45 minutes. Time is reduced while the quantity is enlarged. This requires students to acquire better writing ability to deal with this section. (d) The new syllabus adds a new section, General Knowledge. According to the Higher Education Institution English Major English Teaching Syllabus (Higher Education Institution Foreign Language Teaching Supervisory Committee English Group 2000: 16), English major should have three aspects of courses, English major skill courses, English major knowledge courses, and relative majors knowledge courses. Among them, English major knowledge courses refer to courses which are related to English language, literature and culture, such as English Linguistics, English Lexicology, English and American literature, English-speaking Countries’ Society and Culture. General Knowledge is added to test this aspect of knowledge. Therefore, the test can assess English major senior students’ professional qualities in a more comprehensive way. For students, this means they have more things to prepare. They need to pay attention to relative knowledge during their daily learning and keep the knowledge in mind. As a whole, compared with the old syllabus, the new syllabus brings forward more and higher requirements for the students. In order to pass the test, students should work harder on all aspects of English knowledge, and of course, they have to do more practice to strengthen their ability so as to get a better score. Conversely, with higher requirement, the professional level of the students will also be higher. 3.2 The Test Usefulness After the discussion and analysis of the system of TEM-8, this essay uses more theoretical ways to analyze and discuss the usefulness of TEM-8 in order to find out the impact of the test on both micro level and macro level. Before its usefulness, the purpose of TEM-8 should be discussed. As mentioned above, TEM-8 is set up to assess and evaluate the actual performance of Higher Education Institution English 32 Major English Teaching Syllabus (Higher Education Institution Foreign Language Teaching Supervisory Committee English Group 2000) for senior English major students. In this way, the test should be considered as an achievement test. However, the test sets a certain standard to measure students’ language ability as well. The content of the test has little to do with the content of classroom teaching, as the test measures abilities and methods rather than the content. Furthermore, for those English majors who get the certification of TEM-8, it is easier to get the jobs requiring the certification. In this way, the test should be considered as a proficiency test. 3.2.1 Reliability As Bachman and Palmer (1996: 135) indicated, “probably the most important consideration in setting a minimum acceptable level of reliability is the purposes for which the test is intended”. Therefore, as a relatively high-stakes test, the test designers of TEM-8 mean to set the minimum acceptable level of reliability very high. To measure the degree of reliability in TEM-8, two components of test reliability should be paid attention to, the performance of candidates from occasion to occasion, and the reliability of the scoring. For the old TEM-8 syllabus, according to Lan and Huang’s (2004: 2) data, from 1997 to 2003, the national pass rates are respectively 55.6%, 53.6%, 54.2%, 55.7%, 58.2%, 59.6%, and 51.8%7. The rates are in a certain range. For the new TEM-8 syllabus, the national pass rate is relatively consistent as well. The Figure 8 shows the national pass rate of TEM-8 in the latest five years (from 20058 to 2009). 7 The enrolment expansion of college and university students began in 1999, therefore, in 2003, the first batch students were in Grade 4 and attended the test. The larger number of students and their different levels of language ability influenced the national pass rate. 8 The national pass rate in 2005 is much higher than other years. Experts explain that since 2005 was the first year the new syllabus for TEM-8 taken into action, students had to adapt themselves to the new item types, therefore, the level of difficulty was a little lower than predetermined. However, in the following years, the level of difficulty has been raised to the standard level, therefore, the national pass rate dropped a little. On the other hand, the enrolment expansion of college and university students since 1999 also affects the result since different students have different levels of language ability. 33 100% 90% 80% 70% 60% 50% 40% 30% 20% 10% 0% TheNation alPass Rat eofTE M-8 20 05 2 006 2007 2008 2009 Figure 8: The National Pass Rate of TEM-8 The rate dropped a little in 2006, 2007, and 2008, however, in 2009, the rate rose up. These five years’ rates are around 45% to 50%. It seems that candidates’ performance in different years is relatively consistent, at least in latest five years. As a whole, the test is relatively high stability consistent, in other words, consistent over time. Both with the old syllabus and with the new syllabus, the reliability estimates are well within the desirable range and substantial. There are two more considerations that need to be kept in mind when setting high minimum acceptable level of reliability – “the way the construct has been defined and the nature of the test tasks” (Bachman & Palmer 1996: 135). That means only when the test focuses on a relatively narrow range of language abilities with relatively uniform test tasks, could the test achieve higher levels of reliability. In the Writing section of the TEM-8 test, a controlled composition is adopted (see Figure 9, the Writing section of the test paper of TEM-8 in 2009) Figure 9: 2009 TEM-8 Test Paper Writing Section (2009 TEM-8 Test Paper 2009) Mandarin, or Putonghua, is the standard service sector language in our country. But recently, employees at a big city’s subway station have been busy learning dialects of other parts of the country. Proponents say that using dialects in the subway is a way to provide better service. But opponents think that encouraging the use of dialects in public counters the national policy to promote Putonghua, what is your opinion? Write an essay of about 400 words on the following topic: Are Dialects Just as Acceptable in Public Places? 34 In the first part of your essay you should state clearly your main argument, and in the second part you should support your argument with appropriate details. In the last part you should bring what you have written to a natural conclusion or make a summary. Marks will be awarded for content, organization, grammar and appropriateness. Failure to follow the above instructions may result in a loss of marks. Write you essay on ANSWER SHEET FOUR. As shown in Table 6, the topic of expected composition is clearly explained. Furthermore, the test paper clearly specifies the composition’s outline and marks instructions, including content, organization, grammar and appropriateness. In addition, the test paper also points out the scoring points of this section. Controlled composition seems to be beneficial to the improvement of scoring consistency. Clear instructions can also contribute to the reliability of a test. In my questionnaire, the Question 2 is about whether the candidates think the instructions are clear. 90% of the candidates show that they think the instructions are clear. For example, in the Writing section mentioned above, students are informed of what the topical knowledge is about, how to deal with this section, and how the examiner will score this section. To take another example, in the Proofreading and Error Correction section, the test paper directly shows the candidates how to use the three ways of correcting in a right way (see Figure 10). Figure 10: The Instruction for Proofreading and Error Correction (2009 TEM-8 Test Paper 2009) Proofread the given passage on ANSWER SHEET TWO as instructed. When ∧art museum wants a new exhibit, (1)____an______ it never buys things in finished form and hangs (2)___forms____ them on the wall. When a natural history museum wants an exhibition, it often build i. (3)____often_____ 35 This instruction is well against vagueness. With the instruction, students will not lose points because of using the wrong forms. On the other hand, being familiar with the format of the test helps test reliable as well. Question 1 in the questionnaire is about whether the candidates are familiar with the format of the test paper. 92% of the students indicate that they are familiar with the format since they have done a lot of practice and many model and previous tests before taking the actual test. The scoring method is another matter that affects the test reliability. In TEM-8, the objective part takes up 40% of the total score while the subjective part takes up 60%. The objective part apparently has fixed answers to the items, and this part is done on an Answer Card, which is scored by a computer. Therefore, the score of the objective part is reliable. The subjective part is not as reliable as the objective part. However, the degree of reliability of this part is never low. There are four sections involved in the subjective part. The first section is Mini-Lecture in Listening Comprehension. The second section is Proofreading and Error Correction. These two parts both have fixed answers, thus they are highly reliable. The third section is Translation and the fourth section is Writing. For both sections, there are two certain grade score descriptions on the internet (Enfamily 2006). The scores are divided into five categories, each with specific descriptions of standard. As a whole, the score reliability is relatively high. The four parts discussed above indicate that TEM-8 has consistency from occasion to occasion, controlled range of tested ability, clear instruction and familiar format, and relatively high score reliability. We can make the conclusion that TEM-8 has relatively high reliability. 3.2.2 Validity To measure the extent of validity of TEM-8, three aspects of evidences should be analyzed – content validity, construct validity, and score validity. As mentioned above, a specification of the skills or structures, etc. that a test is meant to cover is needed for judging whether it has content validity or not. In this essay, the specification of TEM8 is the combination of the framework and the requirements of this test. This essay has analyzed in detail these two parts in 3.1.2 Implementation section. In the framework of TEM-8, there are 36 specific details about the structure of the test, and in the requirements of TEM-8, detailed information about required skills is provided. TEM-8 is to test students’ listening ability, reading ability, as well as writing ability. Different sections of the test can assess different abilities among the three. Listening Comprehension tests all the three abilities since the dictation is to test listening ability, the questions and Mini-Lecture passage are to test reading ability, and the gapfilling in Mini-Lecture is to test writing ability. Reading Comprehension and General Knowledge both test reading ability. Proofreading and Error Correction, Translation, and Writing all test both reading ability and writing ability. Therefore, TEM-8 is relatively content valid. When discussing construct validity, the construct definitions should be analyzed first. There are two kinds of construct definitions, syllabus-based construct definitions and theory-based construct definitions. TEM-8 belongs to syllabus-based construct definitions. The Higher Education Institution English Major English Teaching Syllabus (Higher Education Institution Foreign Language Teaching Supervisory Committee English Group 2000) is specifically set for English major students, and TEM-8 is set relying on this teaching syllabus. The syllabus is useful when we need to obtain detailed information on students’ mastery of specific areas of language ability. There is a sample interpretation of construct validity in the Writing Section as shown in Table 4. Table 4: The Interpretation of Construct Validity in Writing Section9 (Zhou, 2003) The Requirement in Teaching Syllabus Be able to use the basic translation theories to be preliminarily familiar with the comparison of English and Chinese languages, and acquire common translation techniques; Further with the comparison of English and Chinese languages, acquire the theories and techniques in English-Chinese and Chinese-English translations. Testing Ability Testing Points Translation Ability 1. Knowing the different ways and approaches to translation according to different topics, styles, occasions, situations and audience; 2. Expressing their ideas eloquently and expressively; 3. Conveying the information faithfully; 4. Understanding the differences between literal translation and meaningful translation; 5. Knowing the basic techniques in translation; and 6. Understanding the differences in usage, word order, grammatical structure and rhetorical devices between English and the mother tongue. 9 The syllabus used in this table is the old syllabus. However, it can still well interpret the construct validity in TEM8. 37 Since TEM-8 tests multiple abilities in the test tasks, it is unlikely that the measurement of any abilities of the three can be accurate. However, TEM-8 is a test assessing students’ integrative ability, and the score reliability is relatively high, therefore, the inaccurate measurement of a single ability will not affect the validity of the whole test too much. Question 5 in the questionnaire is about whether the candidates think the result of TEM-8 can reflect their language ability or not. 12% of the students chose “absolutely”, the majority, 64% of the students chose “maybe”, 18% of them chose “no” and 6% of them chose “I don’t know”. From this data, we can find that the majority of the students indicate that the result TEM-8 can probably reflect their language ability. Of course, the test scores can never be considered absolutely valid since valid is just a degree of measurement and the test validation is an on-going process. Question 6 in the questionnaire is about which sections the candidates think can measure their English language ability. 80% students chose Listening Comprehension, 64% students chose Reading Comprehension, 10% students chose General Knowledge, 24% students chose Proofreading and Error Correction, 90% students chose Translation, and 70% students chose Writing. It seems the content, construct and score of Listening Comprehension, Reading Comprehension, Translation, and Writing are considered reflecting a great degree of students’ language ability, especially with the Translation section. In combination with the analysis earlier, we may have the conclusion that TEM-8 has great validity. 3.2.3 Authenticity There are three categories in testing authenticity, input (material) authenticity, task authenticity, and layout authenticity. Within input authenticity, it further falls into three aspects, situation authenticity, content authenticity, and language authenticity. In this essay, the focus is on language authenticity and task authenticity. All English major students receive education in English, including their learning material, learning syllabus, homework, classroom teaching, as well as tests. Although they are not native speakers, in the classroom, the teacher teaches in English, explains in English, and the students interactive in English. All the instructions are in English. It seems in their learning process, except for the translation part, Chinese language is never their language of instruction. Therefore, 38 the whole test paper in English, questions as well as introductions, seems to be very authentic when related to English majors’ real learning life. The tasks of TEM-8 have with great authenticity as well. One can relate all the sections in TEM8 to real life. For Listening Comprehension, all the three parts of the task can be found in the real study process. In the academic world, classroom lectures by professors are very common for a non-native English student. Students always write down the main point or the key points of the lectures. This is the same in Mini-Lecture part. The Mini-Lecture part lets students hear the lecture once, and requires the students to take notes. After that, an answer sheet with the main structure of the lecture is handed out. The students are supposed to fill in some important information to complete the structure. The Interview and News Broadcast can also be found easily in real life. When we listen to a piece of interview or news, we hope we can find the main point of it. For Reading Comprehension, since everyone reads books, newspapers, articles and some other material, they are of the same function. The material provides information, and the readers look for the information they want from the material. For Proofreading and Error Correction, it is the most authentic since everyone can make mistakes in writing and they have to be corrected. For General Knowledge, since there are courses about English-speaking countries’ general situation, English literature, American literature, and linguistic knowledge, this section is closely related to the real teaching syllabus. For Translation, all non-native English learners are familiar with the translation between English and their mother language. Furthermore, there are also translation courses according to the Teaching Syllabus. For the final section— Writing, it is also very close to the real learning process. Students are always required to write an essay after reading a book, listening to a lecture, and watching a movie. This is a way they can express their ideas and thoughts in paper, only in English. In the questionnaire, Question 7 is about whether the candidates are familiar with the test items and techniques in the test, and 78% students chose “yes”. The test designers design the test according to the Teaching Syllabus, and at the same time, all our courses and learning material are also designed according to the syllabus, as well as the classroom activities. To sum up, TEM8 has relatively high authenticity. 39 3.2.4 Interactiveness To measure the extent and type of involvement of the test taker’s individual characteristics in accomplishing a test task, the relationship between input and response should be analyzed. Brown and Hudson (1998: 653) divide the language assessment methods into three types— selected-response assessment, constructed-response assessment, and personal-response assessment. Davies (1990:38-39) provides a framework of discrete point—integrative, in which the characteristic of assessment is determined by four elements— input, nature of task, nature of response, and scoring. Bachman and Palmer (1996: 55) classify the test task into three types— reciprocal tasks, non-reciprocal tasks, and adaptive tasks. Combining the three ways of categorizing testing methods or tasks, here comes the framework of testing methods of TEM-8. Table 5: The Framework of Testing Methods of TEM-8 The Title of Task Nature of Nature of Nature of Input Task Response Integrative Constructed Listening Comprehension A Integrative (Mini-Lecture) Scoring Discrete point Interactiveness Reciprocal Listening Comprehension B,C (Interview Integrative Discrete point Selected Integrative Discrete point Selected Integrative Discrete point Selected and News Discrete Non- point reciprocal Discrete Non- point reciprocal Discrete Non- point reciprocal Broadcast) Reading Comprehension General Knowledge Proofreading and Discrete Integrative Integrative Constructed Translation Integrative Integrative Constructed Integrative Reciprocal Writing Integrative Integrative Constructed Integrative Reciprocal Error Correction point Reciprocal 40 All these sections are designed with integrative input. Except for Translation and Writing, all other sections are discrete point scoring. Objective parts are discrete point tasks with nonreciprocal interactiveness that requiring selected response, while subjective parts are integrative tasks with reciprocal interactiveness that requiring constructed response. In this table, we can find that most testing methods are integrative tasks, which can ensure relatively high interactiveness. The characteristics of language test task consist of three aspects, language ability (including language knowledge, metacognitive strategies), topical knowledge, and affective schemata. Since the test takers have similar language ability and affective schemata, this section only focuses on the topical knowledge. Some test tasks that require certain topical knowledge may be easier for those who have that knowledge and more difficult for those who do not. However, for most English major students, they have equal knowledge of subjects outside of their own, such as politics, science, economy, culture and tradition. In the TEM-8 of 2009, the topics of four Reading Comprehension texts are travelling abroad, the baby boom and its impact, working wives, and hill climbing. They have covered politics, science, culture, tradition, and society. 52% of the students show that they are familiar with the topics in the test, while the rest of the students show they are not. This shows many students still lack large-scale topical knowledge. The Higher Education Institution English Major English Teaching Syllabus (Higher Education Institution Foreign Language Teaching Supervisory Committee English Group 2000: 4) has pointed out that the students are supposed to be familiar with working in department of foreign affairs, education, economy, culture, science, and military. It also points out that the students should acquire large scale of knowledge and certain relative major knowledge. It is not only beneficial for better using English in working, but also important to develop integrative students. Therefore, there are certain relative courses set up according to the Teaching Syllabus. Therefore, the topical knowledge of TEM-8 is relatively interactive as well. 3.2.5 Practicality TEM-8 is high in practicality. It is set up by the Ministry of Education, and the Higher Education Institution Foreign Language Teaching Supervisory Committee English Group is responsible for it both academically and organizationally. The group is composed of professors and experts from several top colleges and universities in the country. 41 With the use of computers to score the objective parts in an Answer Card, the examiners have removed certain burdens, and this method well avoids certain man-made mistakes, such as failing to see the wrong answers. It saves time, human resource and money. 3.3 Impact Tem-8 is to assess and evaluate the actual performance of Higher Education Institution English Major English Teaching Syllabus (Higher Education Institution Foreign Language Teaching Supervisory Committee English Group 2000) for senior students. Meanwhile, it assesses the teaching quality as well as the students’ language ability. As a high-stakes test, it has certain impact on the test takers, the teachers, and the society and educational systems. 3.3.1 Impact on Test Takers As mentioned earlier, there are three aspects of testing procedure that affect test takers—the experience of taking and preparing for the test, the feedback about their performance on the test, and the decisions made by the results of the test. The Experience of Taking and Preparing for the Test TEM-8 is a high-stakes nation-wide public examination, used as an achievement test as well as a proficient test to select and label different test takers within different levels. Preparing for the test costs students a lot of time, money, and energy. According to the data of questionnaire (Question 9), 80% of the students took 0-3 months to prepare this test, and the remaining 20% of the students spent 4-6 months on it. Students have to spend several weeks or even months to prepare such high-standard examination. One of the students points out that, in fact, students have been preparing for the test since they begin learning the English language. TEM-8 is to assess students’ language ability, however, the language ability is accumulated day after day, year after year. From the time of preparation for the test, the importance of TEM-8 is apparent. However, the preparation time and the time taking the test affect students’ daily life. As mentioned earlier, 60% of the students state that the test taken in Semester 8 is not suitable. Although it is taken in the beginning of Semester 8, students already have a lot of things to do which are closely related to their future life in Semester 7. In 42 Question 15, 14% of the students think the preparation for TEM-8 influences the completion of their graduation paper (the first whole draft of the graduation paper is supposed to be handed in at the end of Semester 7). 30% of the students think the taking time of TEM-8 influence their hunting for jobs since many good jobs require the certification of TEM-8 when applying, however the late outcome stops students from getting the jobs as early as possible. 24% of the students, most of who are preparing for the Graduate Record Examination (taken in late January) in Semester 7, think the preparation for TEM-8 influence their English language learning. This puts the students under a certain degree of stress. In addition to taking classes, students prepare the test themselves. They always buy material for preparation. As the Supervisory Committee has no official web site, they have no recommended material for students to prepare the test. What students can do is walk into the bookstore and buy some books or material they think may be helpful. There are six sections in the test, thus many publishers publish books or material for each section respectively. According to the data of the questionnaire (Question 10), 4% of the students did not buy any material for respective sections. They just bought model tests and previous tests as practice. 42% of the students bought material for one or two sections, 34% of them bought for three or four sections, and 20% of them bought for five or six sections for preparation. The material is not cheap at all. If a student buys material for all the six sections, adding model tests or previous tests, he will spend hundreds of RMB. After students buy material for preparation, they have to spend time on the material. From the framework of TEM-8, we can see that in a controlled situation with limited time, students have to spend more than three hours on a complete test paper. It is not hard to imagine how much time the students would spend on practice. In addition, to increase the pass rate of TEM-8 and to comply with many students’ request, some colleges and universities set up courses directly related with TEM-8. However, according to the questionnaire (Question 11), only 32% of the students think the courses are useful, and 56% of the students think the courses are not that useful. Some of the students who consider the courses useful state that the courses help them a lot, especially with Listening Comprehension and General Knowledge. Teachers can teach them better technique to do Listening Comprehension, and provide the most important material for General Knowledge so that students do not have to waste their time to find all relative material. 43 For those who think the courses are not that useful, some of them state that the courses just provide more practice rather than instructions, and some others state that the courses are not enough and the time is limited. Although not everyone is required to attend these courses, students who attend these courses spend more energy than those do not on these courses and the after-class homework. The Feedback on Test Taker’s Performance on the Test The feedback on test taker’s performance on the test is the score of the test in other words. In TEM-8, as mentioned earlier, the students get their scores and certifications from their school. There is a weakness that as mentioned earlier, there are no official answers of the test, therefore the students cannot get to know where they have made mistakes. It is not useful if students want to find out their strong points and weak points of language ability. Although only 64% of the students consider the test to reflect their language ability, experts design the test in this way for certain reasons. Since the test has high reliability, validity, authenticity, interactiveness and practicality, we may say that the test reflects students’ language ability to a great degree. Therefore, the feedback on the test taker’s performance on the test is relatively meaningful for the test takers. The Decisions Made on the Basis of the Result of the Test As a high-stakes test which is used as a proficient test, TEM-8 has great influence on students. Although the test in some ways has maleficial impact on students’ daily life, it has some beneficial impact on students as well. In Question 14, about in what aspects TEM-8 helps students, 28% students chose “testing language ability”, 54% chose “finding a job”, 20% chose “learning English language”, 30% chose “becoming more confident in English”. This is especially true with “finding a job”. As mentioned earlier, many good jobs need the certification of TEM-8 when being applied for. After having held the certification, the student adds one more strong point on his curriculum vitae. Since more than half of the test takers around the country cannot pass the examination, one having the certification has advantage when applying for a good job competing with one who does not. However, consequently, the students feel stressed because of this test. TEM-8 is the highest standard for English major students as well as non- English major students. For some students, 44 the certification of TEM-8 is as important as the diploma. Every English major student attend in the examination and wish to pass it. In the Question 13 about the stress students feel, 22% students feel much stressed to take this test, 64% students feel some stress about the test, 12% students feel little stressed, while only 2% students feel no stress with taking the test. There are several sources of the stress. Firstly, as mentioned in 3.1.2 The Implementation of the Test the framework of the test section, some stress is from the procedures of taking the test. Secondly, people always evaluate English majors’ language ability with the certification. Maybe the interviewer of a company who does not really know English thinks that if the interviewee has the certification, he must have high level of English ability and he must be better than those do not have. Therefore, it has a symbolic meaning for English majors. Thirdly, there is some stress from the family. Parents wish their children can pass the highest standard examination of their majors and the parents will be proud of them if they pass the examination. The three sources are all realistic, therefore, when I ask the candidates whether they will take the next year’s examination if they fail in the first year (Question 16), 76% of them chose “yes”. The certification is so important for English major students. 3.3.2 Impact on Teachers As a high-stakes test, teaching to the test is an unavoidable situation. As mentioned earlier, some colleges and universities set up courses directly related to TEM-8. Therefore, some teachers are asked to teach the courses. This essay has interviewed two teachers of a university in China to see teachers’ general opinion of the test as well as the impact of the test on teaching structure. The two teachers are respectively Teacher A and Teacher B. They are from the same university and teach the course directly related to TEM-8. The course is named “English Testing” which is set in Semester 7. The course directly tutors the six sections of the test, and since there are too many sections to prepare for just one teacher, two teachers are in charge of this course. For example, Teacher A is in charge of teaching Listening Comprehension, Reading Comprehension, and Writing, while Teacher B is in charge of teaching General Knowledge, Proofreading and Error Correction, and Translation. Teacher A has been teaching this course for three years while Teacher B has been teaching for four years. 45 The two teachers hold different points of view towards whether it is necessary to set up this course. Teacher A considers it necessary while Teacher B considers it unnecessary. However, both of them state that the practice in class helps students to be familiar with the format and instruction of the test, and students can be more skilful using techniques in different sections. For the sources of material used in class, both teachers state that, they find the material together with another teacher (there are altogether four teachers teaching this course with two classes) from a variety of sources like foreign translations, textbooks, and model and previous test paper. They choose the material mainly for the reason that those kinds of material are typical and may reflect students’ problem in each section. They are helpful for students to find the problems and thus to solve them. Regarding to the timetable of the course, the two teachers also have different opinions. In Semester 7, the course “English Testing” is given to students once a week, with each class lasting for one and a half hour, and each teacher takes half semester. However, the difference of opinions is related to the different sections that the two teachers are in charge of. Teacher A is in charge of Listening Comprehension, Reading Comprehension, and Writing. The reading comprehension and writing practice is handed out as homework at the end of last class. Therefore, the teacher just needs to do listening comprehension in the class and comment on students’ previous work on reading comprehension and writing. Teacher B is in charge of General Knowledge, Proofreading and Error Correction, and Translation. All the three parts are done in the class and commented immediately. Since there is much knowledge and many points of all the three sections should be referred to, an hour and a half of a class is enough for neither teacher nor students. Both teachers state that the students should prepare for the test for about three to six months. However, for some students who are studying English very hard all the time, one month may be enough as what they need is to know what to be tested and how to arrange the time probably. Referring to the validity of the test, both teachers show that they consider the test can really reflect students’ language ability. TEM-8 is a comprehensive test, and generally, the score is a good reflection of the students’ language efficiency. 46 The students consider TEM-8 as very important, and so do teachers, but only for students. Nowadays, it is one of the most important ways to tell the language efficiency of English majors. However, the test is of no importance to teachers. With good results, the teachers will not get reward, and with bad results, the teacher will not be punished either. In converse, the result may have something to do with the university. Good result like high pass rate and Distinction rate may help build up the fame of the university. 3.3.3 Impact on Society and Education Systems Since TEM-8 is the highest standard of test evaluating the language ability of English majors, most people take it as a measurement to see whether an English major student is proficient enough. This kind of thought brings maleficial impact on students that they may feel very stressed, and the thought is not suitable for judging a person’s proficiency just relying on a piece of paper as well. Some students may be good at taking test while some may be not. The test is not the only way to judge a person’s language ability level. For example, in a job interview, in addition to the TEM-8 certification, the interviewer should consider other aspects as well. From the school academic report, the interviewer can find out the student’s daily performance of study, whether he has made progress, whether he has worked hard enough, and what his strong points and weak points are. From the academic activity recording, the interviewer can see whether the student is good at language use. Furthermore, the interviewer can talk with the student directly, getting whatever information he needs to judge the student’s ability. The society should judge a person’s language proficiency with an integrative point of view. The test has impact on the education system as well. On one hand, as mentioned earlier, teaching to the test is an unavoidable situation since TEM-8 is a high-stakes test. In addition to the courses directly related to the test (such as “English Testing”), many other ordinary courses are approaching the test. For example, in course of Listening, news broadcast and interviews are the common material, however, in recent years, the course gradually adds the part of note-taking. Although the note taking may be not the same as the form of Mini-Lecture in TEM-8, they share similarities. In another course, Senior Writing, the teacher asks the students to write compositions according to the requirement of Writing in TEM-8. Although teaching to the test is beneficial for 47 students who are going to take the examination, we hope in the meantime, the courses can really help promote the students’ language ability. Furthermore, testing is a good way to evaluate the quality of teaching. From the test, experts can analyze the actual performance of Teaching Syllabus, and measure students’ language ability. In fact, the test and the Teaching Syllabus interact with each other. A good test can reflect the actual performance of the Teaching Syllabus, and in what aspects the Teaching Syllabus should be improved. Conversely, a better Teaching Syllabus can make better Testing Syllabus, thus to make better test to evaluate the Teaching Syllabus better. Based on the analysis of the test, experts can bring forward new requirement for students to push them onto a higher level of language ability, and new Teaching Syllabus for teachers to give better education to the students. 4. Conclusion TEM-8, Test for English Majors Band 8, is a test to assess and evaluate the actual performance of Higher Education Institution English Major English Teaching Syllabus (Higher Education Institution Foreign Language Teaching Supervisory Committee English Group 2000) for senior students. It measures the quality of teaching and students’ language ability as well. Test usefulness is a good way to measure whether the test is suitable. Consistency from occasion to occasion, controlled range of tested ability, clear instruction and familiar format, and relatively high score reliability indicate that TEM-8 has relatively high reliability. High content validity, construct validity, and score validity indicate that the test has great validity. High language authenticity and tasks authenticity indicate that the test is relatively authentic. Integrative tasks and familiar topical knowledge ensure relative interactiveness in the test. Being easy to be put into practice, and saving time, human resource and money indicate that the test is high in practicality. In this standard, apart from the impact part, we may say that TEM-8 is relatively suitable for English majors. The system of TEM-8 is clear and well organized. The test has the specific purpose, the implementation follows a strict procedure, the framework and requirements are very detailed, and the reform of the test is reasonable and meaningful. However, the system has many aspects of 48 impact on students, both maleficial and beneficial. With no official web site, students find it inconvenient to find some important information; the timetable of the test influences students’ daily life, including hunting for jobs, preparing for the graduation paper, and applying for further education; the testing procedure makes students feel stressed; the high requirements need students to improve their language ability into a higher level; and the reform of the test provides more and higher requirements for students. The maleficial impacts need to be considered by the committee and ameliorated in further study. The impact of the test on students is very great. From preparing for the test, to getting feedback of the performance of taking the test, and last receiving decisions by the result of the test, students get impact on their time, money, energy, reflection of language ability, future life. The test has beneficial as well as maleficial impact on students. With the interaction of the Teaching Syllabus and the test, the improvement of the Teaching Syllabus is supposed to make a better test. Therefore, the test can bring more impacts that are beneficial for students and reduce the maleficial impacts at the same time. In this way, the test can be used better to evaluate the performance of the Teaching Syllabus as well as students’ language ability. 49 References Bachman, L. (1990). Fundamental Considerations in Language Testing. Oxford: Oxford University Press. Bachman, L. & A. Palmer (1996) Language Testing in Practice. Oxford: Oxford University Press. Bo, J. (2007). An Analysis of Authenticity in CET-4 and TEM-8. Sino-US English Teaching. Vol. 4, No. 2, 2007, 28-33. Brown, H. (2004). Language Assessment: Principles and Classroom Practices. The United States of America: Pearson Education, Inc. Brown, J. & Hudson, T (1998). The Alternatives in Language Assessment. TESOL Quarterly. Vol. 32, No. 4, 1998, 653-675. Chang J., Zhang Y. & Wu Y. (2006). A Study of the First Implementation of 2004 Syllabus for TEM-8. Foreign Language and Their Teaching. Serial No. 208, 2006, No.7, 18-21. Davies, A. (1990). Principles of Language Testing. New Jersey: Basil Blackwell Lte. Enfamily (2006-06-29). TEM-8 Grade Score Description. Accessed May 10, 2010. <http://bbs.enfamily.cn/thread-10050-1-1.html > Gennaro, K. (2006). Fairness and Test Use: The Case of the SAT and Writing Placement for ESL Students. Teachers College, Columbia University Working Papers in TESOL & Applied Linguistics. Vol. 6, NO. 2, 2006. Accessed May 8, 2010. <http://journals.tc-library.org/templates/about/editable/pdf/Di%20Gennaro%20Forum.pdf> Higher Education Institution Foreign Language Teaching Supervisory Committee English Group (2000). 《高等学校英语专业英语教学大纲》[Higher Education Institution English Major English Teaching Syllabus]. Beijing: Foreign Language Teaching and Researching Press. Higher Education Institution Foreign Language Teaching Supervisory Committee English Group (2005). 《专八考试新大纲》[The Syllabus of Test for English Majors Band 8]. Accessed May 10, 2010. <http://www.hjenglish.com/subject/tem/page/14872/?page=1> Hughes, A. (2003). Testing for Language Teachers. Cambridge: Cambridge University Press. Lan Y. & Huang X. (2004). TEM-4 & TEM-8 Feedback on English Teaching and Problems Caused by Expanded Enrolment and the Corresponding Solutions. Guangxi Teachers College Journal (Philosophy, Social and Science Edition). 2004, 112-116. Li B., Wang A. & Wang X. (2007). Teaching Feedback Upon the Statistical Analysis of TEM-8 Results. Journal of Taizhou University. Vol. 29, No. 5, 2007,78-86. Messick, S. (1993). Validity. [In] Linn, R. (ed.) Educational Measurement. Phoenix: The Oryx Press. i Miller, M. (1985). Reliability and Validity. Western International University. Accessed May 5, 2010. <http://michaeljmillerphd.com/res600_lecturenotes/Reliability_and_Validity.pdf> OPTISM n.d.. Test Reliability and Validity Defined. Randy, L. Ohio. Accessed May 5, 2010. <http://cc.ysu.edu/~rlhoover/OPTISM/reliability_validity.html> Popham, W. (2002). Classroom Assessment: What Teachers need to know. Boston: Allyn & Bacon. Research Methods Knowledge Base (2006). Reliability & Validity. William M.K. Trochim. Accessed May 5, 2010. <http://www.socialresearchmethods.net/kb/relandval.php> Shohamy, E. (1998). Critical Language Testing and Beyond. Studies in Educational Evaluation. Vol. 24, No. 4, 1998. 331-345. Twenty-four English Website (2009-03-09). 2009 English TEM-8 Test Paper. Accessed May 3, 2010. <http://www.24en.com/tem/dynamic/2009-03-09/106418.html> Weigle, S. (2002). Assessing Writing. Cambridge: Cambridge University Press. Wells, C. & J. Wollack (2003) An Instructor’s Guide to Understanding Test Reliability. University of Wisconsin. Accessed May 4, 2010. <http://testing.wisc.edu/Reliability.pdf> Zhou, S. (2003). The Cohesion of Language Curriculum and Language Test—the Design and Implementation of TEM-8. Foreign Language World. No. 6, 2003 (General Serial No. 98), 71-78. ii Appendices Appendix 1: Questionnaire Questionnaire Hey everyone, I am working on a study about the impact of TEM-8 on English majors, and I know you have just gone through the test in March. So I will appreciate it if you can accomplish this questionnaire. Thank you for your time. 1. Are you familiar with the format of the test? If yes, how? If no, which section’s format of the test is it that you are not familiar with? A. yes B. no _____________________________________________________ 2. Do you think the instructions are clear? If not, in which sections? A. yes B. no _____________________________________________________ 3. Do you think the test is difficult? And why? A. yes B. no _____________________________________________________ 4. What do you think about the time limitation for each section? A. not enough B. just in time C. enough D. whatever Listening Comprehension _______________ Reading Comprehension _______________ General Knowledge _______________ Proofreading & Error Correction ______________ Translation _______________ Writing _______________ 5. Do you think the result of TEM-8 really reflects your language ability? Why? A. absolutely B. maybe C. no D. I don’t know _______________________________________________________ 6. Which part/parts do you think can measure your English ability? (you can choose more than one answer) A. Listening Comprehension B. Reading Comprehension C. General Knowledge E. Translation D. Proofreading & Error Correction F. Writing iii G. None 7. Do you think the test items and techniques (gap filling, multiple choices, writing) are related to how you practice English in class and after class? If no, why? A. yes B. no _______________________________________________ 8. Are you familiar with the topics used in the test? A. yes B. no 9. How many months earlier did you start to prepare TEM-8 before the test? A. 0-3 months B. 3-6 months C. 6-12 months D. more than 12 months 10. There are six sections in the test, how many sections did you buy material for preparation? A. 0 B. 1- 2 C. 3-4 D. 5-6 11. Do you think the courses related to TEM-8 are useful for your preparation? If it is useful, in what aspects are useful? If not, why? A. very useful B. useful C. not that useful D. useless _____________________________________________________ 12. Do you think the test taken in the eighth semester is suitable? And why? A. very suitable B. suitable C. not suitable D. I don’t mind when ____________________________________________________ 13. Do you find it stressful to take this test? And why? A. very much B. some C. little D. none _____________________________________________________ 14. In what aspects do you think TEM-8 helps you? (you can choose more than one answer) A. testing language ability B. finding a job C. learning English language D. becoming more confident in English E. nothing 15. In what aspects do you think preparing for TEM-8 influences your daily life? (you can choose more than one answer, and you can write your own answers as well) A. completing graduation paper B. hunting for job C. learning English language D. nothing Other __________________________________________________ 16. If you fail in this test, do you want to take it next year? And why? _______________________________________________ iv The Data of Questionnaire Percentage Item A B C D 1 92% 8% 2 90% 10% 3 88% 12% L 12% 54% 34% 0% R 74% 18% 6% 2% G 4% 24% 70% 2% P 34% 34% 30% 2% T 10% 42% 44% 4% R 14% 48% 36% 2% 5 12% 64% 18% 6% 6 80% 62% 10% 24% 7 78% 22% 8 52% 48% 9 80% 20% 0% 0% 10 4% 42% 34% 20% 11 6% 32% 56% 6% 12 2% 28% 60% 10% 13 22% 64% 12% 2% 14 28% 54% 20% 30% 15 14% 30% 24% 24% 4 16 E F G 90% 70% 0% 16% 76%(yes) 12%(no) 12%(maybe) v Appendix 2: Interview Questions: 1. How many years have you been teaching this course? 2. What sections of the test are you teaching? 3. Where do you get material for each section for students? And why do you choose them? 4. Do you think the timetable of this course is suitable? Why? 5. Do you think it is necessary to set up this course? Why? 6. For how long do you think the students should prepare for the test? 7. Do you think TEM-8 can really reflect students’ language ability? 8. Do you think TEM-8 is important? In what aspects? (for students/ teachers/ faculty) vi Appendix 3: Specifications for the TEM-8 (Excerpts) The requirement of the test Listening Comprehension (a) Students should be able to follow English dialogues and speeches in company. (b) Students should be able to follow special coverage about politics, economy, culture and education, and technology within foreign media, such as VOA, BBC, and CNN. (c) Students should be able to follow general lectures about politics, economy, history, culture and education, language and literature, and science, as well as the questions and answers after the lectures. Reading Comprehension (a) Students should be able to understand articles about social discussion, politics and book analysis in general English or American newspapers or magazines. They should know the gist and general idea of the article, as well as tell the facts and details within the article. (b) Students should be able to read common biographies and literary works, understanding the literary meaning as well as the implicit meaning. (c) Students should be able to analyze the concept, structure, language skill and rhetorical devices within those types of articles mentioned above. (d) Students should be able to adjust their own reading speed during the test. General Knowledge (a) Students should have a basic idea about main English-speaking countries’ geography, history, current situation, and cultural traditions. (b) Students should have acquired basic English literature knowledge. (c) Students should have acquired basic English linguistic knowledge. Proofreading and Error Correction The students should be able to identify speech disorders in a short paragraph with the language knowledge of grammar, rhetoric and sentence structure. In addition, they should be able to provide the correct alternative. vii Translation (a) The Chinese-to-English part requires students to be able to use the theory and techniques of Chinese-to-English to translate an argumentative essay, a narrative essay, and introductions about national conditions in China’s newspapers and magazines, as well as excerpts of common literary works. The reading speed for this part is about 250 to 300 characters per minute. In addition, the translation should be strictly true to the original version, and the language should be fluent. (b) The English-to-Chinese part requires students to be able to use the theory and techniques of English-to-Chinese to translate articles about politics, economy, history, culture and other aspects in English or American newspaper or magazines, as well as excerpts of original texts of literature. The reading speed for this part is about 250 to 300 words per minute. In addition, the translation should be strictly true to the original version, and the language should be fluent. Writing The students should follow the title and requirements given by the test paper and write an expository essay or argumentative essay with about 400 words. The essay should be written with fluent language, suitable words, reasonable style, and strong persuasion. viii
* Your assessment is very important for improving the work of artificial intelligence, which forms the content of this project
advertisement