Calhoun: The NPS Institutional Archive
Theses and Dissertations, Thesis Collection

1987

Analysis of intelligence and academic scores as a predictor of promotion rate for U.S. Army Noncommissioned Officers.

Warner, Jerry B.

http://hdl.handle.net/10945/22191

NAVAL POSTGRADUATE SCHOOL
Monterey, California

THESIS

ANALYSIS OF INTELLIGENCE AND ACADEMIC SCORES AS A PREDICTOR OF PROMOTION RATE FOR U.S. ARMY NONCOMMISSIONED OFFICERS

by

Jerry B. Warner

June 1987

Thesis Advisor: P. A. W. Lewis

Approved for public release; distribution is unlimited

REPORT DOCUMENTATION PAGE

1a. Report Security Classification: Unclassified
3. Distribution/Availability of Report: Approved for public release; distribution is unlimited
6a. Name of Performing Organization: Naval Postgraduate School
6b. Office Symbol: 55
6c. Address: Monterey, California 93943-5000
7a. Name of Monitoring Organization: Naval Postgraduate School
7b. Address: Monterey, California 93943-5000
11. Title: ANALYSIS OF INTELLIGENCE AND ACADEMIC SCORES AS A PREDICTOR OF PROMOTION RATE FOR U.S. ARMY NONCOMMISSIONED OFFICERS
12. Personal Author(s): WARNER, Jerry B.
13a. Type of Report: Master's Thesis
14. Date of Report: June 1987
18. Subject Terms: Promotion, ASVAB, AFQT
21. Abstract Security Classification: Unclassified
22a. Name of Responsible Individual: P. A. W. Lewis
22b. Telephone: 408-646-2283
22c. Office Symbol: 55Lw

DD FORM 1473, 84 MAR

Approved for public release; distribution is unlimited

Analysis of Intelligence and Academic Scores as a Predictor of Promotion Rate for U.S. Army Noncommissioned Officers

by

Jerry B.
Warner
Captain, United States Army
B.S., United States Military Academy, 1976

Submitted in partial fulfillment of the requirements for the degree of

MASTER OF SCIENCE IN OPERATIONS RESEARCH

from the

NAVAL POSTGRADUATE SCHOOL
June 1987

ABSTRACT

This thesis systematically and comprehensively analyzes available personnel data to determine if a significant relationship exists between measures of intelligence and academic performance, and career promotion rate for Noncommissioned Officers. Forty thousand Noncommissioned Officer (NCO) records were analyzed to determine this, using three approaches. The first approach was a sequential procedure which progressed from analysis of individual variables through multivariate regression models. The second approach focused on analysis of NCOs who scored in the top three percent of promotion rate. The third approach used more advanced statistical techniques, including the use of principal components and factor analysis, to better identify the most influential explanatory variables.

During the analysis, eight measures of intelligence and academic ability were used as explanatory variables. Four control variables were included in the analysis to discriminate between subcategories of NCOs. They were: sex, career field, race, and paygrade.

Throughout the analysis consideration of Army promotion and accession policy was included. Knowledge of these policies resulted in elimination of some special groups which had received promotions under significantly different conditions than the rest of the sample. An example of this was Reserve and National Guard members called to active duty.

This study found that there was significant statistical evidence to show that a high level of Armed Forces Qualification Test (AFQT) score and prior service academic accomplishment will correspond to a higher promotion rate. Also, in-service measures of NCO education and performance testing were good indicators of promotion rate.
However, there was significant variance associated with the explanatory relationship. As a result, a useful predictive model could not be designed using regression methods. Although the model could predict promotion averages for major population subcategories, it was unreliable when used solely with the AFQT variable.

The findings of this study suggest two policy recommendations. The first recommendation was a confirmation of the constraints placed on AFQT category and high school diploma status by the 1984 Defense Authorizations Act. The second recommendation was to require promotion boards to consider NCO schooling level and performance test scores in their proceedings, but to avoid directly tying either score to promotion, in terms of a minimum quota or scaled promotion point scale. Finally, a suggestion was given for further research to investigate the underlying reasons for different attrition patterns observed among racial and ethnic groups.

TABLE OF CONTENTS

I. INTRODUCTION
   A. BACKGROUND
   B. PURPOSE
   C. ORGANIZATION
   D. PRELIMINARY INFORMATION
      1. Intelligence Test Scores
      2. Academic Scores
      3. Promotion Scores
      4. Analytical Tools Used
   E. SUMMARY
II. REVIEW OF PREVIOUS STUDIES
III. OVERVIEW OF THE DATA
   A. INTRODUCTION
   B. DESCRIPTION OF VARIABLES
   C. PREPARATION OF THE DATA
   D. COMPARISON TO TOTAL ARMY STATISTICS
IV. SUCCESSIVE DATA ANALYSIS
   A. INTRODUCTION
   B. UNIVARIATE ANALYSIS
   C. BIVARIATE ANALYSIS
      1. Correlation Matrix
      2. Paired Scatterplots and Simple Regression
      3. 3-D Empirical Density Plots
   D. MULTIVARIATE GRAPHICAL ANALYSIS
   E. LINEAR MODELS
      1. Analysis of Variance
      2. ANCOVA
      3. The Final Model: Multiple Regression
         a. Background
         b. Results
         c. Interpretation
         d. Checking Assumptions
         e. Confirmation of Regression Findings
         f. Testing the Model
         g. Summary of Regression Analysis
   F. SUMMARY OF FINDINGS
V. ANALYSIS OF TOP PERFORMERS
   A. INTRODUCTION
   B. COMPARISON OF MEANS AND VARIANCE
   C. SIGNIFICANCE TESTING
   D. ANALYSIS OF DISTRIBUTIONS
   E. SUMMARY OF FINDINGS
VI. PRINCIPAL COMPONENTS AND FACTOR ANALYSIS
   A. INTRODUCTION
   B. THEORY
   C. RESULTS
   D. SUMMARY OF FINDINGS
VII. CONCLUSION
   A. OVERALL FINDINGS
   B. POLICY RECOMMENDATIONS
   C. SUGGESTIONS FOR FURTHER RESEARCH
APPENDIX A: CAREER MANAGEMENT FIELDS AND FREQUENCIES
APPENDIX B: AFQT TRANSFORMATION EQUIVALENT SCORES
LIST OF REFERENCES
INITIAL DISTRIBUTION LIST

LIST OF TABLES

I. Summary of Variables in Sample
II. Total Army vs. Sample Summary Statistics
III. Comparison of PRA vs Standard Normal Percentiles
IV. Sample Race Percentages
V. Sample Paygrade Percentages
VI. Sample Mental Category Percentages
VII. Sample Highest Year of Education Percentages
VIII. Sample Education Level Percentages
IX. Sample NCO Schooling Percentages
X. Pearson Correlation Coefficients
XI. Most Significant Correlated Variables
XII. Simple Least Squares Summary Data
XIII. One-way ANOVA Summary
XIV. Seven-Way ANOVA with Interaction Summary
XV. ANCOVA with Interaction Summary
XVI. Regression Results
XVII. Net Possible Change by Explanatory
XVIII. Sensitivity of PRA to Explanatory Variables
XIX. Comparison of Regression Data Sets
XX. Comparison of Extreme and Average Predictions
XXI. Comparison of Predicted vs Actual PRA Averages
XXII. Top vs Sample Summary Data
XXIII. Top vs Sample Hypothesis Test Results
XXIV. Principal Component Tabular Results
XXV. Reduced Principal Component Tabular Results

LIST OF FIGURES

3.1 Army versus Sample Paygrade Bar Chart
3.2 Army versus Sample Race Bar Chart
4.1 Raw Promotion Rate Histogram and Statistics
4.2 Variable RATE Histogram and Statistics
4.3 Variable PRA Histogram and Statistics
4.4 Variable CMF Histogram and Percentages
4.5 Variable GTSCR Histogram and Statistics
4.6 Variable AFQTP Histogram and Statistics
4.7 Variable OAFQTP Histogram and Statistics
4.8 Variable EIMCAT Bar Chart of Percentages
4.9 Variables EDLVL and HIYRED Cluster Bar Chart
4.10 Variable NCOE Bar Chart
4.11 Variable PQSCR Histogram and Statistics
4.12 Lowess Scatter Plot of OAFQTP versus PRA
4.13 Lowess Scatter Plot of HIYRED versus RATE
4.14 3-D Empirical Density Plot of OAFQTP by PAYGD
4.15 3-D Empirical Density Plot of OAFQTP by RACETH
4.16 Coded Scatter Plot of PRA versus CMF with SEX
4.17 X-Y Line Plot of ANOVA MEANS
4.18 Regression Residual Histogram
4.19 Regression Residual Scatter Plot
5.1 Cluster Bar Chart TOP vs Sample CMF Changes
5.2 Comparative Histograms of TOP vs Sample NCOE
5.3 Comparative Histograms of TOP vs Sample PAYGD
6.1 Factor Plot
6.2 Factor Plot Reduced Variables

I. INTRODUCTION

A. BACKGROUND

In almost any organization, one hopes that individuals at high levels of authority are gifted with higher intelligence. Correspondingly, one would think that, given equal work effort, a more than average intelligent person will advance more rapidly than his contemporaries in an organization. It is not difficult, however, to find examples which contradict our perceptions of the role of intelligence in career advancement. In almost any field one can remember an individual who was not the most intellectually gifted, but through hard work and persistence, or other less quantifiable traits, advanced equally or better than persons of higher measured mental ability. There is ample room for other influences to overwhelm the value of a person's intelligence in the eyes of a superior. An unattractive personality, an inability to apply that intelligence to the tasks at hand, and a myriad of other flaws can discredit the merit of raw intelligence. The degree to which intelligence impacts on advancement lies in the area of complex interaction between individuals and organizations. It carries with it much of the uncertainty of quantification of human performance.

Despite ample room for exceptions, the general concept of a reward for being more intelligent still seems reasonable. It may be, however, that to see its manifestation clearly requires looking at as large a number of people as possible who have been affected by a similar set of opportunities for advancement. It is the task of this thesis to investigate this relationship within a fairly restricted, but numerically large population. The population is one which has had fundamental raw statistics uniformly obtained, and where policies to promote personnel are unambiguous and well documented.

B. PURPOSE

The purpose of this thesis is to answer a central question: Does a significant relationship exist between measures of intelligence and academic ability, and an individual's promotion rate as a Noncommissioned Officer? Put more simply, does being smarter, as measured by initial test scores, or being better schooled, indicate that a person will perform better and, hence, advance more quickly than his peers? The answer to this question has important implications for Army policies of recruitment, retention, and promotion. It is also a matter of general interest to social scientists.

C.
ORGANIZATION

This thesis is organized fundamentally as a data analysis investigation. Chapters I and II provide preliminary information on the nature of the study variables, and briefly review some related articles which have addressed this topic. The remaining chapters discuss the analysis of approximately forty-thousand Noncommissioned Officer (NCO) records using three related approaches. The first approach is a fairly standard procedure of experimental data analysis. This procedure begins with analysis of fundamental attributes of individual variables, then advances through successive increases in dimensionality and complexity. The second approach views a subset of the population which distinguishes itself by being in the top three percent of the NCO promotion rates. Comparison of these top performers to the remainder of the population identifies attributes which are found to be significantly different, and hence, are possibly associated with the cause for rapid advancement. In the third approach, the statistical methods of principal components and factor analysis are used to provide an alternative method of critical variable selection, as well as to lend credibility to the results of the other two approaches.

D. PRELIMINARY INFORMATION

This section contains an initial discussion of the nature of the data, a general overview of the Army NCO promotion system, and a synopsis of the analytical tools used in this thesis. As previously mentioned, there is a degree of looseness about the effectiveness of measurement for intelligence and academic data, and also some confounding phenomena in Army promotion policy. Early recognition of these problems should set the degree of caution which is needed in reviewing the subsequent chapters of analysis. The section on analytical tools is intended to inform the reader of the conditions under which the data analysis was conducted, and the hardware and software used.

1. Intelligence Test Scores

a. General
The data for intelligence test scores falls into the category sometimes referred to as Defined Measurement. Defined Measurement is where the property being considered cannot be measured directly [Ref. 1:p. 6]. As a result, measurement of a related measure is substituted for the actual property. In this case, the property is intellectual ability, and the presumed related measurements are scores from a particular battery of tests. The efficacy of intelligence tests as a representative measure for intelligence is itself an issue surrounded by controversy. This controversy has been the topic of entire books and studies. The testing done by the Army is the Armed Services Vocational Aptitude Battery, or ASVAB. Although not designed specifically as an intelligence test, the ASVAB does predict general trainability. Additional research has shown that the mathematical and verbal portions of the ASVAB have a high correlation to the ACT, PSAT, and SAT college entrance examinations [Ref. 2]. The ASVAB has been studied, improved, and used for over forty years. A recent article by Jensen in Measurement and Evaluation in Counseling and Development [Ref. 3:p. 35] states:

"To the degree that success in various occupations and training programs requires different levels of general ability (often called intelligence or IQ), an ASVAB composite (it hardly matters which one) will be as validly predictive as any test now on the market. It seems that the new ASVAB-14 is near the limit of refinement, psychometrically..."

Generally then, the ASVAB is a well established military aptitude test. Although the ASVAB does not specifically attempt to determine the academic potential of its candidates, the intelligence portions of the test have shown themselves to be reasonably documented and defined measurements of intelligence.

b. Specific Tests

The ASVAB consists of a battery of ten subtests.
Composites of the subtests of the ASVAB are used to determine the overall acceptability of the individual requesting enlistment, and for which field he or she would best be suited. From the entire battery of tests, two derived scores are taken as measures of intelligence. The first is the GT, or general intelligence score. This score considers the aggregation of three submodules: word knowledge, paragraph comprehension, and arithmetic reasoning. The second derived measure of intelligence is the Armed Forces Qualification Test Score, or AFQT. This score is an aggregate of four submodules: word knowledge, paragraph comprehension, arithmetic reasoning, and numerical operations [Ref. 10:sec. 1-0, p. 1]. An AFQT score is reported as a percentile score representing the examinee's relative standing in reference to a specific population.

There has recently been some additional manipulation of the AFQT score. In October of 1984, the reference population for assignment of an individual's AFQT percentile was shifted from a base reference population of 1944 to that of 1980. The base reference population is designed to represent how the set of raw AFQT scores of the entire American youth population would be distributed. This set of values was originally designed in 1944, and had not been updated until 1980. A transformation of AFQT test percentiles for the records of soldiers who enlisted prior to 1980 was effected by the Defense Manpower Data Center (DMDC), and distributed to the Department of the Army. This thesis utilized the 1980 base reference, and all subsequent AFQT values are expressed as percentiles based on the 1980 reference. A listing of the transformations can be found in APPENDIX B.

GT scores, which are computed based on the sum of the raw test scores, have not been manipulated. However, unlike the case with AFQT scores, soldiers have been allowed to retake tests to increase their original GT scores. Retesting was introduced in 1982 when a minimum GT score of 120 was enforced on eligibility for promotion to NCO rank.
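The effect of changing the reference population on a reported percentile can be illustrated with a minimal sketch. The two reference samples below are made-up numbers, not the 1944 or 1980 norming data, and the percentile convention used (percent of the reference group scoring strictly below the examinee) is an assumption made only for illustration; it is not the DMDC transformation itself.

```python
from bisect import bisect_left


def percentile_rank(raw_score, reference_scores):
    """Percent of the reference group scoring strictly below raw_score."""
    ordered = sorted(reference_scores)
    below = bisect_left(ordered, raw_score)  # number of scores < raw_score
    return 100.0 * below / len(ordered)


# Hypothetical reference samples (NOT actual norming data).
older_reference = [20, 25, 30, 35, 40, 45, 50, 55, 60, 65]
newer_reference = [30, 35, 40, 45, 50, 55, 60, 65, 70, 75]

raw = 50  # the same raw score, judged against each reference group
print(percentile_rank(raw, older_reference))
print(percentile_rank(raw, newer_reference))
```

With these invented samples the same raw score of 50 falls at the 60th percentile of the first group and the 40th percentile of the second, which is why records normed on different base populations must be converted to a common base before they are compared.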
Retesting was introduced in 1982 when a minimum GT score of 120 was enforced on eligibility for promotion to NCO rank. 16 . . 2 Academic Scores General a. The defined data used measurement, intelligence. academic for similar ability is also measures the to years value This the number of independent of the quality of is education, and the grades that any given individual through indicative of may have assumes that continued attendance and This study progression for Specifically, the property of academic ability is being represented by a simple assignment of received. a educational the academic ability. system For example, is inherently a high school graduate has more academic ability than an individual with an eighth grade education. The informational value of academic It is treated in used in the study: scores is thus, not as useful as desired. analysis as only an ordinal scaled variable. Specific b. Three present education Army, and scores academic education level, schooling individuals who is made entry into Because advanced available only to those service records, the military have superior education score carries with upon level since entry. military education professional are additional information it some relative to the performance of the NCO. 3 Promotion Scores Promotion within the Army is somewhat complicated procedure. 17 It a closely supervised and is the product of a . number considerable policies of applied across the population. within rank function of computation which Instead, structure, within years education. of of career individual's an not uniformly are they are applied field, or even as a although Thus, promotion the rate is an easy task, that value may have been influenced by several policies that were peculiar to the individual, General a. Promotion of NCO's is governed by Army Regulatic AR 600-200. This eligibility, and establishes regulation outlines process the requirements for of selection. 
The system views the individual's performance as a whole. This includes a composite score based on performance scores, commander's ratings, service awards, and review by a board of senior NCOs. This composite point value is used as a threshold value for promoting individuals to the next higher paygrade, as slots become available. The Department of the Army slots are accounted for by career management field, and as such, the minimum threshold for a combat soldier to be promoted may be different than that of a support soldier. A general observation is that career fields with a more technical orientation have higher promotion point thresholds, and subsequently, longer times to advancement than those in the larger and less technically oriented career fields.

AR 600-200 also sets minimum times of service and grade which an individual must have served to be considered for promotion. Unless superseded by a special policy, the shortest period for promotion to E-5 is two years, and four years to E-6. This rate includes waivers for both time in service and time in grade. Promotion to E-6 in four years requires that the individual be advanced to E-5 in two years.

b. Specific

Because of the lack of uniformity of promotion within the Army population, in this thesis we have taken considerable care to identify and address discontinuities which would confound promotion based on merit. This includes the elimination of some data, and the computation of three different promotion rate scores. The governing principle for manipulation or restriction of data was to produce a sample population in which each individual started from the same point in the rank structure, and had equal opportunity for advancement by merit. Chapter III, Overview of the Data, discusses in detail the problems identified and what corrective action was taken.

4. Analytical Tools Used

This section briefly identifies the hardware and software used in analysis.

a.
Hardware

Computational resources used for analysis included an IBM 3033 System 370 mainframe computer running the MVS batch system. Additionally, analysis was done for small data sets using a standard IBM microcomputer.

b. Software

Two software packages were used for the majority of the data analysis. SAS Version 5 was used predominantly for analysis resulting in tabular output, such as principal components and factor analysis [Ref. 4,5]. Grafstat, an unreleased IBM mainframe data analysis and plotting program, was utilized for analysis requiring graphical output and for confirmation of SAS tabular results [Ref. 6,7].

E. SUMMARY

The objective of this introduction has been to adequately frame the scope of the topic, and to present sufficient background to the reader so that he or she is alerted to some of the difficulties inherent in a topic of this nature. Also, this will establish a reference for some of the tools used to conduct the analysis. The length of this section is indicative of the degree of preparation required to analyze a relationship which has significant complications in both dependent and independent variables. Although such a list of assumptions and the stripping of aberrant data makes one cautious about the reality of the study, each event should be considered on its ability to uncover the answer to the central question of this thesis.

The central question again is whether or not a significant relationship exists between measures of intelligence and academic ability, and an individual's promotion rate as a Noncommissioned Officer. It is important to learn whether measures of intelligence and academic ability are important indicators of promotion in the Army, and if so, how strong that relationship is. If sufficiently reliable and believable relationships can be determined, then policies could be designed to better identify and develop capable individuals for positions of leadership.

The analysis of this thesis reduced the effects of confounding policies, such as discriminatory promotion and accession programs. It also used a sufficiently large sample size, which allowed the averages to outweigh the exceptions. It drew on data from standard personnel records, and made the most effective use of that information.

II. REVIEW OF PREVIOUS STUDIES

The topic of relating intelligence to some aspect of performance is an extensive and rich area of study. It is a topic of particular interest to social scientists and military manpower specialists. As a demonstration of the quantity of work done in this area, a simple cross-referencing of the words intelligence and performance test produced a list of 237 citations from Lockheed's DIALOG online information files. Restriction of the available references to those utilizing military intelligence test scores and statistical analysis of those tests relative to some performance measure still results in a large number of citations. Within this restriction there is a variety of methodologies. The study can originate from an in-house military analysis, a contracted study done by a commercial analytical institute, or an academic institution making use of military data as a source media for analysis. The nature of the data is also varied. Several studies readministered the ASVAB tests to a selected test population, and other studies used other intelligence measures in addition to the ASVAB. The performance side of the relationship had an extensive number of dependent variables. Examples of performance measures were: results of written military exams, skills test results, minority advancement, and comparison to collegiate ACT, PSAT, and SAT tests.

This chapter will review four of the most closely related studies, concentrating for each one on:

1. The objective of the study.
2. The methodology used in analysis.
3. The conclusion reached.

The first analysis is from Are Smart Tankers Better? AFQT and Military Productivity [Ref. 8]. This study is essentially an in-house military analysis, the authors being Army officers assigned to the Office of Economic and Manpower Analysis, at West Point, New York. As described in the title, the paper presents the results of an investigation in which the crews of tanks were scored on their ability to destroy targets on live fire ranges. The AFQT score of the tank gunner and commander was one of several explanatory variables, having the tank scores as the dependent variable. The analysis methodology used was a log-log production model with ordinary least squares regression. The result of their analysis is best summarized in this paragraph from the study:

"That there exists a positive, statistically significant relationship between AFQT and performance, is a powerful result. The coefficients on the model means that if we move, for example, from the AFQT score for an average Category IV TC to the AFQT score for an average Category IIIA TC (a 200% increase), we will increase the performance on Table 8 (the tank scoring exercise) by approximately 20.3%."

In this study then, AFQT was found, by means of least squares regression, to have a definitive relationship to a well-defined skill measure, the conduct of tank firing.

The second study is an analysis done at the University of Iowa by the Cada Research Group titled: On Predicting Success in Training for Males and Females; Marine Corps Clerical Specialties and ASVAB Forms 6 and 7 [Ref. 9]. This report uses the ASVAB score as an explanatory variable for the success of recruits in training. The methodology used is primarily regression; however, the scope of the study concentrates on sex differences, with regressions performed for each sex category and the discussion identifying differences between male and female performance. The implicit result of the study's regressions is that the ASVAB score was of useful predictive value.
An interesting note about this study was that the inclusion of high school completion reduces the difference between the male and female regression coefficients.

The third study is a section of articles used in the Report to the House and Senate Committees on Armed Services, Defense Manpower Quality, Volume II, Army Submission [Ref. 10]. The section of interest to this thesis was a study done by the U.S. Army Training and Doctrine Command (TRADOC) Systems Analysis Activity (TRASANA). The study uses AFQT, as well as education level, sex, paygrade, time in service, time in Military Occupational Specialty (MOS), and a dummy variable reflecting General Equivalency Diploma (GED) completion as explanatory variables. GED is a rating given to individuals who did not graduate from high school, but who have taken examinations to be rated as equivalent to a high school graduate. A battery of tests given under controlled conditions resulted in a net score which was made the dependent variable. The battery of tests was designed so as to represent how proficient a soldier was in his specific career field. The test included a written, as well as hands-on proficiency test. The analysis method used was linear regression, with the inclusion of a Durbin Instrument as a correction tool for AFQT. The results are again best summarized from the report:

"The most important result is that AFQT Category I-IIIA soldiers performed approximately 10% better overall than IIIB soldiers. Furthermore, AFQT was a much more important influence on performance in virtually all instances than either education or experience, whether measured in terms of time in service, MOS, or unit. Thus, these results strongly support the validity of AFQT as a predictor of performance in these military occupational specialties."
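A regression of this general kind, with dummy variables standing in for categorical attributes, can be sketched as follows. All numbers are invented so that the true coefficients are known in advance; they are not TRASANA data, the variable names are hypothetical, and the sketch uses plain ordinary least squares without the instrument correction described above.

```python
def ordinary_least_squares(X, y):
    """Solve the normal equations (X'X) b = X'y by Gauss-Jordan elimination."""
    n, k = len(X), len(X[0])
    # Augmented matrix [X'X | X'y], one row per coefficient.
    A = [[sum(X[r][i] * X[r][j] for r in range(n)) for j in range(k)]
         + [sum(X[r][i] * y[r] for r in range(n))] for i in range(k)]
    for col in range(k):
        # Partial pivoting: bring the largest remaining entry onto the diagonal.
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(k):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]


# Hypothetical soldiers: (upper AFQT category dummy, GED dummy, years in service).
soldiers = [(1, 0, 2), (1, 1, 3), (0, 0, 4), (0, 1, 2),
            (1, 0, 5), (0, 0, 3), (0, 1, 5), (1, 1, 4)]

# Invented, noise-free test scores: 60 + 6*upper - 2*ged + 1*years.
scores = [60 + 6 * u - 2 * g + t for u, g, t in soldiers]

# Design matrix with a leading column of ones for the intercept.
design = [[1.0, u, g, t] for u, g, t in soldiers]
coef = ordinary_least_squares(design, scores)
print([round(c, 6) for c in coef])
```

Because the invented scores contain no noise, the fit should recover the intercept and the three coefficients used to build them. The coefficient on the category dummy is the estimated score difference between the two AFQT groups with the other variables held fixed, which is the kind of quantity a statement such as "approximately 10% better" summarizes.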
This report then, is very similar in conclusion to the tank gunnery report, in which AFQT was shown through regression to have a significant and measurable effect on soldier performance in skill related tasks.

The last study reviewed is also from the collection found in the Defense Manpower Study [Ref. 11]. The topic for this study was the estimation of promotion rate. It is presently the most similar study to the central theme of this thesis. Using AFQT as one of the independent variables, a duration model is applied to estimate the expected speed of promotion. This model was applied within two categories, the paygrade and the career field of the NCOs. This study approaches promotion estimation in a different manner as well. Specifically, by aggregation of the data evaluating the possibility of promotion for each individual over a series of years, the dimension of time was entered into the analysis. A significant advantage of including the time dimension was that changes in the categorical levels of the population, such as race or sex, could be accounted for.

The methodology used in the promotion estimation study is considerably more complex than in the previous studies. Rather than using standard regression models, the study uses the Generalized Linear Model form. Specifically, the form of the predictive model is a log likelihood function using the Weibull shape parameter. The explanatory variables include education, AFQT, marital status, race, number of dependants, time in service, sex, and high school completion status. Additionally, by using the Weibull model, there are no requirements for assumptions of normality for the residuals, and the application of explanatory variables which are not continuous, such as sex, marital status, and high school completion status, is more proper. There is, therefore, less subjectivity to the appropriateness of the model with respect to the independent variables.
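As an illustration of what a Weibull log likelihood looks like, the following is a minimal sketch in Python. The reviewed study's exact parameterization is not given here, so the log-linear link between the covariates and the Weibull scale, the absence of censoring, and the names `weibull_loglik`, `beta` and `shape` are all assumptions made for illustration only.

```python
import math


def weibull_loglik(durations, covariates, beta, shape):
    """Log likelihood of fully observed Weibull durations.

    Assumption (not from the reviewed study): the Weibull scale of each
    individual is exp(x . beta), i.e. a log-linear link to the covariates.
    """
    total = 0.0
    for t, x in zip(durations, covariates):
        scale = math.exp(sum(b * xi for b, xi in zip(beta, x)))
        z = t / scale
        # log of the density (shape/scale) * z**(shape-1) * exp(-z**shape)
        total += (math.log(shape) - math.log(scale)
                  + (shape - 1.0) * math.log(z) - z ** shape)
    return total


# Hypothetical usage: one duration of 2 time units, intercept-only model.
example = weibull_loglik([2.0], [[1.0]], beta=[0.0], shape=1.0)
```

With a shape of one the Weibull density reduces to the exponential density, which gives a simple check on the function. Categorical covariates such as sex or diploma status would enter `covariates` as 0/1 indicators, which is the sense in which the model places no normality requirement on them.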
The promotion estimation method, however, does not consider any in-service information and was calculated only for very specific CMF and Paygrade combinations. The results are summarized as follows:

"A review of these promotion results reveals two trends. First, even after controlling for high school diploma status, AFQT Category I-IIIA soldiers are promoted approximately 10% more rapidly than IIIB soldiers. Second, high school completion is less important than AFQT score in determining promotion rates. The remarkable aspect of this last result is that educational attainment is an explicit part of the Army's promotion point system, while AFQT scores are not. These trends are true for both promotion to E-5 and promotion to E-6."

As considerable attention has already been given to the topic of relating measures of intelligence to performance, and since positive results have generally been the result, one might wonder why another study should be undertaken. First, this thesis is in response to a request by the Office of the Deputy Chief of Staff for Personnel (ODCSPER) for further research in the relationship of AFQT to success in the Army. Secondly, this thesis will be different in its approach and analytical procedures. Following is a list of the unique characteristics of this thesis:

1. The perspective of this thesis is that the results will be used as a management tool, or as an explanatory method for active duty Army personnel. In that light, the study utilizes information collected from the individual's in-service record, such as his Skill Qualification Scores and his NCO Schooling levels. Similar to accession related studies, this analysis includes intelligence, academic, and categorical information as potential explanatory variables. However, the intent is not to justify accession of high quality soldiers, but to investigate the trends of promotion for active duty personnel as a function of available personnel data.
2. This study conducts significant investigation into the data to identify and correct anomalies which would confound the relationship in question.

3. Statistical analysis is done from the bottom up, rather than by direct movement into regression models. This approach finds that strict parametric models are subject to error due to the inability of some data variables to meet distributional assumptions necessary for parametric analysis. The study then moves to nonparametric means to approach the issue.

4. For regression models, given the cautions on their use, an additional sample population is tested using the model. Thus, the results from the initial model can be considered to have more believability and fidelity than a model based on analysis of a single population sample.

5. The use of a large data set.*

6. Several explanatory variables have been made available from the DMDC data base which have not been used in previous studies. They include the initial education at time of entry, NCO education level, and a race variable with six categories.

7. The choice of promotion as the dependent variable rather than a set of performance tests. Although prone to more uncertainty than results of performance tests, promotion is in many ways an ultimate performance measure. The service, like any other organization, recognizes superior performance by promoting and advancing individuals to higher positions of authority. As such, promotion rate, despite its problems, has a strength of recognition well beyond that of technical performance.

8. This study uses graphical methods for depiction of many of the methods of analysis.

* Study number four from the Defense Manpower Study uses both large data sets and promotion as an independent variable.

III. OVERVIEW OF THE DATA

A. INTRODUCTION

A critical aspect of this thesis was the selection and screening of data. Two general guidelines were applied in creating the data set. First, the data set had to demonstrate a level of homogeneity, in that the NCO's considered would all have served under similar enlistment and advancement policies. Secondly, the selection of individual records needed to be random and without unintentional bias to meet the requirements for a representative sample set. Section III C. describes in detail the measures taken to insure that the above two attributes were established in the study data set.

Recoding of data values into numerical equivalents was required for several personnel record fields. As an example, the level of Military Schooling, which is the NCO's in-service schooling level, was recorded as mixed alpha-numeric characters. Transformation involved rank ordering the available levels of schooling in ascending hierarchical order and substituting a numeric value for the alpha-numeric value. Chapter IV discusses in detail the background of each variable.

Finally, as a check on the effects of manipulating and restricting the sample data set, Section III D. provides a comparison of statistics for the entire U.S. Army NCO database versus the sample data set used in this thesis.

B. DESCRIPTION OF THE VARIABLES

The data variables used in this study fall into three categories: control variables, intelligence variables, and promotion variables. The first two categories, control and intelligence, were used as explanatory variables, while the promotion variables were used as the dependent variables. A brief description of each variable is tabulated in Table I.

TABLE I
Summary of Variables in Sample

Variable  Category   Meaning                                         Value     Scale

Dependent
PRATE     Promotion  Raw Promotion Rate: number of promotions        .041-.21  Ratio
                     per month to most recent promotion
RATE      Promotion  Promotion rate difference from average for      2.2-9.4   Ratio
                     that paygrade (normalized)
PRA       Promotion  Promotion rate difference from average for      3.4-8.    Ratio
                     that paygrade and CMF (normalized)

Explanatory
SEX       Control    Male/Female                                     0/1       Nominal
CMF       Control    Career Management Field                         11-99     Nominal
RACETH    Control    Race/Ethnic group                               1-5       Nominal
PAYGD     Control    Paygrade                                        5-7       Ordinal
GTSCR     Intell     General Intelligence Score                      0-160     Ordinal
AFQTP     Intell     Armed Forces Qualification Test Score           1-100     Ordinal
                     Percentile
OAFQTP    Intell     Same as AFQTP, referenced on 1980 population    1-100     Ordinal
EIMCAT    Intell     Mental Category; based on OAFQTP                1-8       Ordinal
HIYRED    Intell     Highest Year of Education upon entry into Army  1-12      Ordinal
EDLVL     Intell     Present Education Level                         1-12      Ordinal
NCOE      Intell     Military Education Level Attained               0-13      Ordinal
PQSCR     Intell     Army Proficiency Test                           0-100     Ratio

A more detailed description of each of the study variables will be given in the first part of Chapter IV, Successive Analysis.

C. PREPARATION OF THE DATA

Preparation of the data began with acquiring fifty thousand records from the U.S. Army Military Personnel Center in Alexandria, Virginia. Initial restrictions on the data were established to allow inclusion of only NCO's with a date of entry after January 1, 1976. Further, NCO's selected had to be members of the Regular Army, and not of the Reserve or National Guard forces. These restrictions provided for observation of only those NCO's who were recruited a reasonable time period following the ending of the Viet Nam War, and following the establishment of the All-Volunteer Force. Restricting the NCO's to Regular forces alone focused the study on the standing Army soldiers and avoided confounding as a result of different promotion and accession policies in the Reserve and Guard Forces.

The records requested were randomly drawn by taking every fifth individual from an estimated population of 250,000 meeting the above restrictions. The fifty thousand MILPERCEN records were then matched and merged with a similar personnel database from the Defense Management Data Center (DMDC), Monterey, California. The DMDC database holds additional information, including: the ability to distinguish high school equivalent certificate holders from actual graduates, the highest year of education of the soldier at time of enlistment, and AFQTP and EIMCAT scores renormed for a 1980 population.

After the merging of the data, records which had missing values in any of the critical variable fields were dropped. There were approximately ten thousand records missing critical data. Following initial analysis of promotion rates, two additional restrictions were applied against the remaining records.

First, a grouping of promotion rates showed that several hundred individuals had been promoted to the rank of E-5 at rates which were as high as one promotion per month. Cross referencing of service numbers identified this sub-group as NCO's who had served in Reserve or Guard units and who, for a variety of reasons, had been called for active duty. As such, they were allowed by regulation to carry with them their former rank, in effect an accelerated promotion. Subsequently, a serial number match and elimination was done for all NCO's with recent listing as Reserve or Guard status.

A second source of unusual promotion rates at the E-5 level became apparent in some of the more technically oriented career management fields, the medical field in particular. Research into Army special recruitment policy indicated that during the early 1980's special provisions were made to allow persons with special background ability in certain technical fields to enter the Army and be promoted to NCO status within six months, or in certain cases to receive NCO status immediately following basic training.(1) To correct for these anomalies, all promotion rates which fell outside the maximum time periods considering application of both waivers were discarded.

(1) MSG Knopp, NCOIC, Defense Management Data Center, West, El Estero Drive, Monterey CA 93946.

D. COMPARISON TO TOTAL ARMY STATISTICS

In this section, selected attributes of the sample data set and the complete U.S. Army database are briefly compared, with the intent of checking the representativeness of the sample set. Population attributes such as distribution of sex, Career Management Fields, and paygrade were obtained from the complete U.S. Army database records consisting of over 250,000 NCO's. As described in paragraph 3.B, the sample data set of 50,000 selected records had been filtered to contain only personnel who entered the Army after 1976. Screening of those 50,000 records for completeness of data and uniformity of promotion policy reduced the number in the final sample set to approximately 38,000. It was prudent then, to check the sample set to see if it retained its representative character as a random sample. It should be noted, however, that this comparison will not occur for all study variables. Reasons for this include non-availability of records from the MILPERCEN database, and cases where the statistic was produced through computation by the author, promotion rates being the principal example.

1. Comparison of Army versus Sample Summary Statistics

Formal hypothesis testing for means or distributions with ANOVA was unavailable due to computational and software restrictions. However, since the intent of this section was simply to identify any population shifts, and the magnitude of those shifts, observation of summary statistics is assumed to be sufficient. Specifically, the means and the standard deviations of four variables were obtained from both the entire NCO population data set and the thesis sample data set. The percent difference between the variable means was computed and expressed relative to the thesis sample data. A table of comparative statistics and the percent difference is shown in Table II.
TABLE II
Total Army vs Sample Summary Statistics

            Sample (37,854)     Total Army (250,000)
Variable    Mean     Std Dev    Mean     Std Dev       Percent Difference
AFQTP       53.4     20.9       48.3     25.2          Sample > 10%
SEX         1.12     .328       1.09     .283          Sample > 2.7%
RACETH      1.65     .942       1.63     .991          Sample > 1.2%
PAYGD       5.27     .464       5.75     .597          Sample < 5.2%

The three variables AFQTP, SEX, and PAYGD have noticeable changes between the Sample and the Total Army, while the variable RACETH doesn't appear to have been affected much by sampling. A closer look at the discrete distributions, and an overall conclusion about differences in the two data sets, follows.

2. Discrete Distributions

Figures 3.1 and 3.2 illustrate differences in the discrete distributions for paygrade and race respectively. Both plots are Clustered Bar Charts, in which the percentage of each level of the discrete variable for both the Total Army and the Sample were plotted next to each other.

[Figure 3.1: Army vs Sample Paygrade Percentages. Clustered bar chart, Total Army and Sample, by PAYGRADE values E-5, E-6, E-7.]

[Figure 3.2: Army vs Sample Race Percentages. Clustered bar chart, Total Army and Sample, by RACETH values White, Black, Hispanic, Indian, Asian, Other.]

Observation of the tabular data and bar charts shows that there are some differences between the two populations. Specifically, the sample contains slightly more women, more lower ranking personnel, and significantly higher AFQTP scores. The race related make-up of the sample appears to be similar.

The restriction of random sampling to only those persons entering the service after 1976 can directly or indirectly explain these differences. First, the lower average paygrade is a direct result of promotion policy, in which it is impossible to achieve a rank above E-7 in less than ten years. Hence, the sample population should demonstrate a lower average paygrade. Secondly, the slight increase in the proportion of women might be explained by a general opening up of the services to women in the late seventies and early eighties. Thirdly, the higher AFQTP is a direct result of policy restrictions begun in Fiscal Year 1981, and formalized by the 1984 Defense Authorization Act. This placed constraints on AFQT Category and high school diploma status. [Ref. 10: sec 1-0, p.1] Whether this improvement in AFQT quality resulted from these restrictions, or from the general improvement in social acceptance of the military services, is a question which would require significant study in itself.

In short then, the sample is different in several ways from the total NCO population. It should be noted, however, that these results are intentional. The shifts caused by restricting the sample to soldiers who were accessed after 1976 are felt to be less dangerous to the study than the alternative of including soldiers who were accessed during the era of the draft and Viet Nam War policies. Finally, it is only a matter of time, unless significant changes in accession and promotion policy occur, before the character demonstrated by the sample data set will constitute the norm for all NCOs. Thus, it is concluded that the study sample is satisfactory.

IV. SUCCESSIVE DATA ANALYSIS

A. INTRODUCTION

In this chapter the results of the data analysis will be reported. The analysis followed a systematic method. This method of analysis is a format which is described by Chambers in Graphical Methods for Data Analysis [Ref. 12]. This procedure develops an understanding of the data, beginning with simple univariate descriptive procedures, then progressing through several increases in dimensionality of variables, and finally into the more complex inferential procedures of model building and multivariate regression. An abbreviated outline of this procedure is shown below.

1. Analysis of single variables.
2. Comparison of variable distributions.
3. Analysis of paired variables.
4. Multivariate graphical analysis.
5. Linear Models including:
   a. Simple Regression
   b. Multivariate Models

In addition to these several steps, this procedure will be supplemented with non-graphical measures, such as ANOVA, ANCOVA, and several tabular nonparametric methods. It should be noted that this chapter reports only those procedures which are considered an essential step in the investigation, or whose results provided an observation of merit. Many available procedures have not been used in this analysis as a consequence of the data failing to meet distributional assumptions, and for other reasons which would make such analysis inappropriate. During the development of this chapter, the results of each level of analysis will specify why the next set of analysis procedures was pursued. Alternatively, if a popular class of procedures is disregarded, the logic for disregarding it is explained. The objective of detailing this procedure is to present a thorough depiction of the nature of the variables, and to explain the development of resulting inferences and models.

B. UNIVARIATE ANALYSIS

1. Dependent Variables

a. PRATE

(1) General. The variable PRATE represents the raw promotion rate of a particular individual. Numerically, it is the total number of promotions per month up to the most recent promotion.

(2) Value. The variable PRATE was computed using data obtained from the DMDC database. The time to most recent promotion in months was found by subtracting the basic pay entry date from the date of latest award of rank. This number then became the denominator of a ratio having the individual's rank, or equivalently, the total number of promotions the individual has received, as the numerator:

$$\text{PRATE} = \frac{\text{Individual's Latest Rank}}{(\text{Award Date of Latest Rank}) - (\text{Date of Entry in Army})}$$

Ranks were numerically represented with a score of 5 for an E-5 Sergeant, and with 6 and 7 for the next two ranks. The resulting units of measurement for the variable PRATE were units of promotion per month of service.

(3) Attributes of the Variable. The variable PRATE qualifies as a continuous variable with a ratio scale. The continuous nature of the variable relies on the fact that the number of service months combined with three rank structures yields sufficient combinations of values, 190 in all, to use as measures.

There are actually some problems inherent with the raw PRATE score, since promotion policies are in effect which set minimum time thresholds for promotion. Thus, the promotion rate of an individual who is presently an E-5 will be incomparable to the promotion rate of an E-7 whose three promotions have been affected by the minimum time in service policy. Generally, the minimum time between promotions grows as rank increases, and more senior soldiers will normally have lower raw promotion rates.

A second source of bias potentially found in the variable PRATE is the Career Management Field (CMF) of the soldier. Army promotion policy is based on a system of minimum performance points to be attained within a CMF in order to be considered for promotion. Generally, the more technical fields will have higher promotion point thresholds than non-technical fields.

The distribution of the variable PRATE and its summary statistics are shown in Figure 4.1. The shape of the histogram is positively skewed, demonstrating a steep ascending slope in the first block of partitions, then a generally flat shape until just past the median value. After the median value, a gradual downward sloping tail occurs. A rough interpretation of this shape is that there appear to be a few individuals who are promoted at very fast rates, followed by a block of average promotion rates, then a diminishing tail of individual promotion rates which fall to the right of the seventy-fifth percentile.
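The PRATE ratio defined in paragraph (2) above can be sketched in a few lines of Python. This is an illustrative sketch only: the thesis does not state how partial months were handled, so counting whole calendar months is an assumption, and the function names `months_between` and `prate` are hypothetical.

```python
from datetime import date


def months_between(start, end):
    """Whole calendar months from start to end (day of month ignored; an assumption)."""
    return (end.year - start.year) * 12 + (end.month - start.month)


def prate(paygrade, entry_date, latest_rank_date):
    """Raw promotion rate: numeric rank (5, 6 or 7) per month of service
    up to the most recent promotion."""
    months = months_between(entry_date, latest_rank_date)
    if months <= 0:
        raise ValueError("latest rank date must be after the entry date")
    return paygrade / months


# Hypothetical soldier: entered January 1980, awarded E-5 in January 1984.
example_rate = prate(5, date(1980, 1, 1), date(1984, 1, 1))  # 5 promotions in 48 months
```

Under this definition an E-5 awarded after 24 months scores 5/24, while an E-5 awarded after 120 months scores 5/120, which shows how strongly time in service drives the raw score.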
Figure 4.1 PRATE Histogram and Statistics (N=37854)

    Selection           ALL
    X label             PRATE
    No. of elements     37854
    Mean                0.10946
    Std. deviation      0.036322
    Skewness            0.59367
    Kurtosis            2.5854
    5-percentile        0.051225
    25-percentile       0.08
    Median              0.10204
    75-percentile       0.13514
    95-percentile       0.17857
    Minimum             0.041667
    Maximum             0.20833

Distribution transformation of this variable was not attempted, primarily because its usefulness in testing or modelling is limited by the problems associated with the bias factors described above.

b. RATE

(1) General. The variable RATE is a re-expression of the variable PRATE. It has bias due to individual rank removed by normalizing each individual score relative to his or her paygrade.

(2) Values. To compute the variable RATE, the average PRATE value for each paygrade was calculated, as well as the standard deviation for that paygrade. Individual scores were then normalized by the transformation:

$$\text{RATE}_i = \frac{\text{PRATE}_i - \text{AVERAGE for that Rank}}{\text{STANDARD DEVIATION for that Rank}}$$

(3) Attributes of the Variable. The variable RATE is also a continuous ratio scale variable, as it is a transformation of PRATE. The removal of influence due to rank was confirmed by computing the correlation coefficient between the variables RATE and PAYGD. As seen in Table X, a value of near zero resulted where the previous correlation coefficient for PRATE and PAYGD had been -.495. Thus, the transformation from PRATE to RATE results in a variable independent of PAYGD.

The shape of the RATE histogram, shown in Figure 4.2, appears slightly non-normal, but a check of the quantiles in the summary statistics shows that they correspond closely to the standard normal quantiles. Thus, the assumption of normality for procedures using this variable is still reasonable, based on observation of the distribution shape and the close agreement of quantile values.
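The within-paygrade normalization and the correlation check described for RATE can be sketched as follows. This is a minimal illustration, not the author's program: the data are made up, the choice of the population standard deviation (`pstdev`) is an assumption since the thesis does not say which estimator was used, and the helper names are hypothetical.

```python
from collections import defaultdict
from statistics import mean, pstdev


def normalize_by_group(scores, groups):
    """Subtract the group mean and divide by the group standard deviation."""
    by_group = defaultdict(list)
    for s, g in zip(scores, groups):
        by_group[g].append(s)
    stats = {g: (mean(v), pstdev(v)) for g, v in by_group.items()}
    return [(s - stats[g][0]) / stats[g][1] for s, g in zip(scores, groups)]


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = (sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y)) ** 0.5
    return num / den


# Made-up PRATE values for three E-5s and three E-6s.
prates = [0.10, 0.14, 0.12, 0.08, 0.06, 0.07]
paygd = [5, 5, 5, 6, 6, 6]
rate = normalize_by_group(prates, paygd)

r_before = pearson(prates, paygd)  # negative: higher paygrade, lower raw rate
r_after = pearson(rate, paygd)     # zero up to rounding, by construction
```

Because every group is centered on its own mean, the normalized scores cannot be linearly correlated with the grouping variable. The same helper accepts any hashable key, so a (paygrade, CMF) pair could be used where normalization within both categories is wanted.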
Figure 4.2 presents a histogram and summary statistics for the RATE variable.

Figure 4.2 RATE Histogram and Statistics (N=37854)

    Selection           ALL
    X label             RATE
    No. of elements     37854
    Mean                -1.555E-…
    Std. deviation      0.99997
    Skewness            0.21408
    Kurtosis            2.3767
    5-percentile        -1.5476
    25-percentile       -0.77573
    Median              -0.03757
    75-percentile       0.70754
    95-percentile       1.5234
    Minimum             -2.2681
    Maximum             3.6685

c. PRA

(1) General. The variable PRA is another recomputation of the raw promotion rate. PRA controls for the career management field as well as paygrade. It is a set of normalized promotion scores, which are independent of PAYGD and CMF. The independence of PRA from these variables was also confirmed by checking correlation coefficients. Both variables CMF and PAYGD had near zero values of correlation with PRA.

(2) Values. Computing the variable PRA was done in the same manner as for RATE; however, a mean and standard deviation for each CMF and PAYGD combination was computed and used in the normalization equation.

(3) Attributes. PRA is a continuous variable with a ratio scale. The distribution of PRA appears normal, with the quantile values very close to the standard normal.

Figure 4.3 PRA Histogram and Statistics (N=37854)

    Selection           ALL
    X label             PRA
    No. of elements     37854
    Mean                7.41E-9
    Std. deviation      0.99881
    Skewness            0.21406
    Kurtosis            2.6552
    5-percentile        -1.5518
    25-percentile       -0.75252
    Median              -0.04146
    75-percentile       0.69604
    95-percentile       1.7086
    Minimum             -3.4988
    Maximum             4.5374

A comparison of percentiles for the PRA distribution versus the standard normal distribution is shown in Table III. Specifically, the PRA percentile values are listed with the corresponding standard normal percentile values for the same data point. For example, -1.5510 is the PRA five percentile, while -1.5510 indexed in a standard normal table results in a six percent value.

TABLE III
Comparison of PRA vs Standard Normal Percentiles

    PRA       Standard Normal
    5%        6%
    25%       22.6%
    50%       48.4%
    75%       75.7%
    95%       96.3%

Normality for this variable will be assumed based on the general distribution shape and the close correspondence of the data percentiles to the standard normal percentiles.

2. Control Variables

d. SEX

The variable SEX is discrete and nominal. Males are represented by a numerical value of one, and females are represented with a two. In the study sample, 12.29 percent of the sample was female, and 87.71 percent was male.

e. CMF

Career Management Field (CMF) is a discrete variable with nominal scale. Thirty three CMF's are represented in the sample. Each Career Management Field is assigned a numerical value; for example, the Infantry branch is designated as CMF 11. These assignments are a Department of the Army numbering system, and can be reviewed along with the CMF percentage and frequency table in Appendix A.

There is some ordinal information in the numbering system. For instance, low CMF numbers are indicative of a combat branch, such as Infantry or Armor. Center CMF values are indicative of combat support branches, such as Signal and Chemical. Upper CMF values are from the combat service support branches, such as Medical and Language Specialist. The CMF histogram, Figure 4.4, does reflect the distribution of the three general groupings of CMF densities: combat, combat support, and combat service support. The combat and combat support values have roughly equivalent representation, while the upper numbered combat service support CMF's are about two thirds the size of the other groups.

[Figure 4.4: CMF Histogram (N=37854). Groupings marked Combat, Combat Spt, Combat Svc Spt; x-axis CMF, 0 to 100.]

f. RACETH

The race-ethnic variable is a discrete, nominal variable. The values represented and their percentages are shown in Table IV.

TABLE IV
Sample Race Percentages

    Value   Race                              Percent   Cumulative Percent
    1       White                             52.43     52.43
    2       Black                             38.59     91.02
    3       Hispanic                          5.58      96.6
    4       American Indian/Alaskan Native    .26       96.86
    5       Asian/Pacific Islander            1.15      98.01
    6       Other/Unknown                     1.99      100.00

g. PAYGD

Paygrade is a discrete, nominal variable. The selection of personnel enlisting after 1976 resulted in representation of NCO rank by paygrades E-5 through E-7 only. The distribution of PAYGD is shown in Table V.

TABLE V
Sample Paygrade Percentages

    Value   Rank                  Percent   Cumulative Percent
    5       Sgt E-5               73.29     73.29
    6       Staff Sergeant E-6    25.89     99.19
    7       SFC E-7               0.81      100.00

The 0.81 percent for E-7 results in only 307 SFC's in the sample. Despite the preponderance of representation by the other ranks, a sample size of 307 for the E-7 rank still allows for adequate representation of that subcategory.

3. Intelligence and Academic Scores

h. GTSCR

The General Intelligence Test Score (GTSCR) of the individual is a continuous variable with at least an ordinal scale. The range of values runs from 50 through 160. The lower value of 50 represents the corresponding minimum score of ASVAB modules that would allow for enlistment in the Army. The histogram of the GTSCR variable, shown in Figure 4.5, is approximately normal. Checking the quantiles shows a larger density in the distribution to the left of the mean, with slightly lower values for quantiles right of the mean.

Figure 4.5 GTSCR Histogram and Statistics (N=37854)

    Selection           ALL
    X label             GTSCR
    No. of elements     37854
    Mean                108.23
    Std. deviation      14.275
    Skewness            0.129
    Kurtosis            3.3632
    5-percentile        84
    25-percentile       99
    Median              109
    75-percentile       117
    95-percentile       130
    Minimum             54
    Maximum             156

i. AFQTP
The Armed Forces Qualification Test Percentile is a continuous represents the variable with relative standing score referenced ordinal scale. of Its value individual's test an against a 1944 population. This means that an individual's raw AFQT score is compared against a standard was developed values that table of of raw AFQT test the entire 1944 American youth scores for resulting population. Hence, simply corresponding the represent the distribution designed to values from 1944 was a This table of in 1944. individual percentile AFQT score is of the individual raw AFQAT score relative to the entire 1944 population AFQT test distribution The histogram in Figure 4.6. about the and summary statistics for AFQTP are shown The density mean. of AFQTP is partially symmetric lower five percent quartile is at a The value of 21, demonstrating the restriction and VI study is primarily for comparative reasons. any developed reference population subsequent chapters, model since has AFQT CAT V Use of the AFQT score for this personnel since 1980. used in applied to ceased. was cannot be scoring against the 1944 As will be seen in discarded anyway when OAFQT proves to a better explanatory variable. 48 AFQT AFOTP HISTOGRAM A.ND STATISTICS (N=37854) HISTOGRAM TABLE AFOTP ALL SELECTION AFOTP X LABEL 37854 NO. OF ELEMENTS 53.^19 X MEAN 20.965 STD. DEVIATION 0.29913 SKEWNESS 2.2128 KURTOSIS 21 5-PERCENTILE 37 25-PERCENT I LE 50 MEDIAN 75-PERCENT I LE :68 X : o a. Iso O 20 40 60 80 •100 95-f'ERCENTILE X MIN. X MAX. :91 :10 :99 AFOTP Figure 4.6 OAFQTP j. The OAFQTP variable is ordinal scale. is It continuous variable with a fundamentally the same as the AFQTP variable, excepting the reference for measurement, which is 1980 population. a The distribution for OAFQTP is considerably lower values more dense in the Explanation of than AFQTP. this shift can be seen by reviewing the transformation tables in Appendix scores. A points. 
1944-based scores to 1980 The transformations for values below 80 result in a 1944 based score to amount of converting for be reduced reduction varies, Only when the but it scores increasing transformations. 49 every case. in almost go can be above 85 The as much as four are there any HISTOGRAM TABLE OAFQTP SELECTION ALL X LABEL OAFQT NO. OF ELEMENTS 37854 X MEAN 45.319 STD. DEVIATION 24.779 SKEWNE5S 0.53139 KURT05IS 2.1725 S-FERCENTILE 14 25-PERCENT I LE 25 MEDIAN 41 75-PERCENT I LE 64 9 5-PERCENT I LE 92 OAFQT HISTOGRAM AND STATISTICS (N=37854) X 5 o ^ 8 O O 50 60 40 20 X MIN, 1 X MAX. 99 100 OAFQT Figure 4.7 k. EIMCAT EIMCAT based on EIMCAT the is is discrete a ordinal and assignment of categories is and is population reference 1980 a mental category of an individual the a scale AFQT test score. variable. Department of Defense standard/ common reference for all services. The breakdown of values is as follows: TABLE VI Value 1 2 3 4 5 6 7 8 Sample Men tal Category Percentages Category AFQT Cat Cat Cat Cat Cat Cat Cat Cat 01-09 10-15 16-20 21-30 31-49 50-64 65-92 93-99 V IV C IV B IV A III B III A II I The Percent .33 6.736 9.788 19.187 26.116 13.053 19.99 4.8 50 Cumulative Percent .33 7.067 16.854 36.041 62.157 75.21 95.2 100.000 8 histogram of the EIMCAT values follows in Figure 4.8.- A SAMPLE EIMCAT DISTRIBL/TION BAR CHART OF PERCENT ~ 25 - 20 _ V/ 7/ PERCENTAGE - T/ V) 5 - v? - V n 3 4 5 6 EIMCAJ (MENTAL CATEGORY) Figure Observation clearly the of the 4 . above figures demonstrates fact that categorization into EIMCAT category is not evenly distributed across the scale of OAFQT scores. center EIMCAT, example, the points, while EIMCAT point eight EIMCAT scores. discrete more scale For value five, spans almost twenty contains only the upper seven does make available an established, measurement representing intelligence test scores for use in appropriate statistical procedures. 1. 
l. HIYRED

HIYRED is the highest year of education held by the individual upon entry into the Army. It is a discrete and ordinal scale variable. The values and distribution percentages are shown in Table VII.

TABLE VII
Sample Highest Year of Education Percent

    Value   Category                          Percent   Cumulative Percent
    1       1-7 Years                           0.018        .018
    2       8 Years                             0.153        .172
    3       1 Year High School                  1.397       1.569
    4       2 Years High School                 4.7         6.269
    5       3-4 Years HS (no diploma)           6.935      13.203
    5.5     High School GED                     4.813      18.017
    6       High School Diploma                71.274      89.29
    7       1 Year College                      3.305      92.595
    8       2 Years College                     3.453      96.048
    9       3-4 Years College (no degree)       1.337      97.385
    10      College Graduate                    2.560      99.945
    11      Masters or Equivalent               0.05       99.995
    12      Doctorate or Equivalent             0.005     100.000

m. EDLVL

EDLVL is the present level of education for the individual. These scores are related to HIYRED, in that any education taken by the individual subsequent to enlistment is recorded in this variable. A GED equivalency is included as a value of six for high school completion.

TABLE VIII
Sample Education Level Percentages

    Value   Category                          Percent   Cumulative Percent
    1       1-7 Years                           0.042       0.042
    2       8 Years                             0.011       0.053
    3       1 Year High School                  0.198       0.251
    4       2 Years High School                 0.793       1.043
    5       3-4 Years HS (no diploma)           1.503       2.547
    6       High School Diploma                80.443      82.99
    7       1 Year College                      6.089      89.079
    8       2 Years College                     5.828      94.907
    9       3-4 Years College (no degree)       2.037      96.944
    10      College Graduate                    2.948      99.892
    11      Masters or Equivalent               0.1        99.992
    12      Doctors or Equivalent               0.008     100.000

Observation of Figure 4.9, or of the percentages in Tables VII and VIII, shows an upward shift in level of education after enlistment. This is possible, and encouraged, with official continuing education and high school completion programs.

[Figure 4.9  HIYRED and EDLVL Percentages, Cluster Bar (x axis: years education; series: HIYRED, EDLVL)]

n. NCOE

NCOE, the Noncommissioned Officer Education variable, is a discrete and ordinal scale variable. It reports the level of military schooling accomplished by the individual. Military schooling categories are generally organized in three ascending levels: primary, basic and advanced. At the two lower levels, primary and basic, there are separate courses for combat and non-combat CMF's. In some cases, there has been an award of an On-The-Job Training (OJT) qualification. The OJT award is used to give credit to an NCO who can achieve technical competence in advance of being eligible for promotion to the next higher paygrade.

As previously mentioned, attendance at military schools is sometimes associated with an individual being previously identified as a superior performer. This is true mostly in the advanced level schools, where selection for attendance is through Department of the Army Selection Boards. At the primary level, local commanders have authority to establish selection procedures and often will make primary school attendance a locally mandatory requirement for junior NCOs. Table IX and Figure 4.10 demonstrate categories and the distribution of NCOE.

TABLE IX
Sample NCOE Percentages

[Value codes are not legible in the source scan; rows are listed in ascending code order.]

    Category                                  Percent   Cumulative Percent
    Nonparticipant                             21.19        21.19
    Primary NCO Course (CBT CMF)                4.46        25.65
    Primary Leadership Graduate                39.36        65.25
    On-The-Job Credit for E-5 skills            5.38        70.63
    Primary Technical Course Graduate           2.82        73.45
    On-The-Job Credit for E-6 skills            0.00        73.45
    Basic Technical Course Graduate             5.11        78.56
    Basic NCO Course (CBT CMF)                 15.99        94.55
    On-The-Job Credit for E-7 skills             .01        94.56
    Advanced NCO Course Selectee                2.28        96.84
    Advanced NCO Course Graduate                3.06        99.89
    Advanced NCO nongraduate, OJT                .01        99.9
    On-The-Job Credit for E-8 skills             .06       100.00

Figure 4.10 presents a histogram of NCOE discrete levels.

[Figure 4.10  Sample NCOE Schooling Percentages, Bar Chart (x axis: NCOE education; y axis: percentage)]

o. PQSCR
PQSCR is a report of the Primary Military Occupation Skill Qualification Test Score (SQT) of the individual. It is a continuous and ratio-valued variable. The SQT is a service related test which is used to determine the technical competence of the soldier. The SQT score has been used by promotion boards as a qualitative measure for promotion. The numerical value represents the percent of correct answers on a written and hands-on evaluation. Separate SQT tests are written for each CMF, although the structure of the tests is similar.

The distribution of PQSCR, shown in Figure 4.11, is more dense in the upper values, with an abnormally long left tail extending to a lower bound of 21. An explanation for the shape of the PQSCR distribution is an involved topic, and has itself been the subject of study. A general observation is that PQSCR has previously been used in a manner where individual soldier scores were often aggregated as a means of comparison of the parent units of the soldiers. [Ref. 11:p. 43] Thus, significant unit and individual training emphasis has been focused on SQT testing in previous years, and pressure to perform well was influenced by the parent organizations. As a result, a negatively skewed distribution (long left tail), rather than a normal distribution, is understandable.

[Figure 4.11  PQSCR Histogram and Statistics (N=37854)]

    PQSCR summary statistics (? marks a digit illegible in the source scan)
    No. of elements    37854
    Mean               78.384
    Std. deviation     1?.609
    Skewness           -0.70832
    Kurtosis           3.5739
    5-percentile       57
    25-percentile      71
    Median             80
    75-percentile      87
    95-percentile      95
    Minimum            21
    Maximum            100

3. Summary

The fifteen variables used in this study demonstrate a wide variety of characteristics. All of the dependent variable choices were continuous, with only two, RATE and PRA, showing slight departures from normality. The other continuous variables did not have identifiable distributions, and could not be transformed to normality using power or log transformations. Nor is it entirely clear that one would need to use a transformed variable in subsequent analysis.

The independent variables comprise a mixture of continuous and discrete values, with both ordinal and ratio scales. Within the independent variables there are two principal sets of related variables. The intelligence test scores, AFQTP, OAFQTP, EIMCAT, and to a lesser extent GTSCR, are all derived from the ASVAB. These variables differ from one another in varying degrees, and are either a re-expression, a transformation, or a similarly derived set of scores. The second set is the academic performance measures, EDLVL and HIYRED. The two are related, in that EDLVL is simply HIYRED with the addition of any schooling since entry into the Army.

Despite the similarities within these two sets of variables, it is felt that sufficient differences in informational value are present in each expression. Further, since the variables used are all standard data collection items for the DMDC database, each variable expression will be studied. The relative merit of any single or combined variable from this study may be useful to managers seeking appropriate data sources for other studies.

An important result of the observation of the variables is that many of the assumptions necessary for standard analysis will not be met. These include assumptions about the form of the distribution as well as the scale of the variable. This study will initially seek to use standard parametric methods, such as Analysis Of Variance (ANOVA) and possibly regression, for hypothesis testing. However, if results of the analysis are sensitive to distributional assumptions, those assumptions will be checked. If there is a nonparametric test of similar efficiency, or if examination of assumption requirements fails, nonparametric tests will be conducted as a replacement or as a confirmatory procedure.

C. BIVARIATE ANALYSIS

This section will concentrate on identifying relationships between pairs of variables, and on identifying shifts in distribution of variables as a function of categorical, or effects, variables. Three methods of analysis will be used in this section. The first method will be analysis of association using a matrix of Pearson product-moment correlations. This will provide initial information as to the strength of association between any two variables, and the direction of that relationship, being either positively or negatively correlated. The second method will be analysis of scatterplots of pairs of variables, using the techniques of LOWESS and Jittering to better view any trends in the variables. This method will give initial information on what type of relationship exists between dependent and independent variables. Of significant interest will be whether the relationship is fundamentally linear, or whether it is possibly polynomial or curvilinear, and hence what mathematical line should be fitted. The third and final method used will be analysis of three-dimensional empirical distribution plots. This will demonstrate some shifts in distribution within several of the effects variables.

1. Correlation Matrix

As earlier mentioned, the purpose of reviewing the Pearson product-moment correlation matrix is to identify pairs of variables which have a strong linear association with each other. The measurement of linear association is the correlation coefficient, rho. The range of rho is from -1 to +1, and a value of zero indicates that the variables have no linear relationship. A value of +1 indicates an exact direct linear relationship, while a -1 indicates an exact inverse association. This tool is not completely indicative of dependency, and is only a preliminary tool to identify candidate variables for testing and subsequent inferential statistics.

Remembering the central question of this thesis, the most important pairs of variables will be any of the intelligence and academic scores paired with the promotion rate variables. Of almost equal interest will be any interval scale effects variables demonstrating a strong linear relationship with the promotion variables.

The strength of the linear relationship between two variables, or its level of significance, is based on how much variance there is in the estimated value of rho. Further, the variance of rho is dependent on the sample size being considered. For example, if the sample size were small, and the standard deviation of rho had a value of .3, then a large positive or negative value of rho would be needed to effectively demonstrate significance. Conversely, for a large sample set with a very small standard deviation for rho, a much smaller value of rho could be considered significant. An estimate for the standard deviation of rho can be found by computing the inverse of the square root of the sample size. Considering the thesis sample size of 37,854, the resulting estimate of the standard deviation of rho is .005139. Thus, a value of rho different from zero by plus or minus .01 (roughly two standard deviations) could be considered significant.

In Table X the complete Pearson product-moment correlation matrix for the study variables is given. This is the preferred method since we are primarily interested in correlations with either the RATE or PRA variable as one of the pair of variables. The Pearson product-moment computation is a parametric method and assumes pairs of normal and continuous variables. Additionally, it is possible, using the Spearman nonparametric method, to compute a correlation value rho for pairs of ordinal, or higher scale, variables. [Ref. 13:pp. 251-253] The Spearman method is a distribution free method providing correlations based on the ranks of the variables. The last column on the second part of Table X lists correlations computed using the Spearman method. Comparison of Spearman versus Pearson values showed that there was an acceptable correspondence between the two methods, and Pearson values are used exclusively to simplify analysis.

Even with application of both the Spearman and Pearson methods there remained several pairs of variables which did not meet the distributional characteristics assumed for correct interpretation of the rho value. These variables are the discrete, nominal variables SEX, RACETH, and possibly CMF. Their results are included in Table X, but any interpretation of the rho value would be ineffective. The most important rho values in Table X are located under the PRA column and are underlined.

TABLE X
Pearson Product-Moment Correlation Coefficients

    Variable    PRATE     RATE      PRA
    PRATE       1.000     .822     .790
    RATE         .822    1.000     .951
    PRA          .790     .951    1.000
    GTSCR        .035     .118     .107
    AFQTP        .100     .155     .133
    OAFQTP       .177     .209     .177
    EIMCAT       .174     .200     .170
    HIYRED       .156     .168     .177
    EDLVL        .085     .139     .162
    NCOE        -.200     .047     .006
    CMF         -.074    -.143     .000
    RACETH      -.064    -.084    -.057
    PAYGD       -.495     .000     .000
    PQSCR        .039     .101     .094

[The SEX row, the remaining columns (GTSCR through PQSCR), and the Spearman column of Table X are not reliably recoverable from the source scan.]

The most significant observations from the tables are summarized as follows:

For the variable RATE there is zero correlation with the PAYGD variable. Thus, the transformation of PRATE to RATE did remove the influence of paygrade on promotion rate. Similarly, for the variable PRA, both PAYGD and CMF have zero correlation.

As expected, the three promotion rate variables are all highly correlated in a positive direction.

With two exceptions, the correlation values for the effects and independent variables have similar magnitudes and signs across all three expressions of promotion rate. The first exception is the NCOE variable. Under PRATE it is negatively correlated with a value of -0.2, and it is positively correlated, with lower values, for RATE and PRA. This result makes sense when one considers that NCOE is highly correlated with PAYGD (0.565). Specifically, raw promotion rates are lower for higher grade NCO's due to time in service and time in grade requirements (-.495). Hence, NCOE, which is highly correlated with PAYGD, will also reflect that inverse relationship. When the influence of paygrade is eliminated, as it is in RATE and PRA, this negative correlation is incidentally removed.

The second exception is for the variable SEX, where it is positive signed for PRATE and PRA, but negatively signed for RATE. The magnitudes of all three values are close to zero. An explanation for the difference in sign between PRA and RATE will be presented in the analysis of empirical distributions and coded scatterplots.

Groups of closely related variables have generally the same correlation across the three promotion variables. Specifically, AFQTP, OAFQTP, EIMCAT, and to a lesser extent, GTSCR, all demonstrate a strong positive correlation against each other, and show the same trend when compared against the promotion rate variables. The academic variables HIYRED and EDLVL demonstrate similar characteristics; however, EDLVL is weaker than HIYRED with respect to the promotion rate variables.

Considering RATE and PRA as the better promotion variables to model with, and allowing for only one variable from each of the related groups, the six most significant correlated variables were selected. These variables, listed in descending absolute value of rho, are shown in Table XI.

TABLE XI
Most Significant Correlated Variables
Considering both RATE and PRA

    Variable    Rho Value
    HIYRED      approx  0.17
    OAFQTP      approx  0.14
    GTSCR       approx  0.10
    PQSCR       approx  0.09
    RACETH      approx -0.06
    NCOE        approx  0.006

These variables, paired with either RATE or PRA, were used as the starting basis for multivariate regression analysis. The effects variable SEX was included for subcategory analysis in an effort to detect any influence it might have on the primary relationships.

2. Paired Scatter Plots and Simple Regression

Plots of paired independent and dependent variables were implemented to accomplish two purposes. The first purpose was to visually search for any dominant patterns. Since the rho values found in the previous section are designed to detect only linearity, it is quite possible that nonlinear relationships could exist between the explanatory and dependent variables. For example, if the X-Y relationship was strictly Y = X^2, with X values distributed symmetrically about zero, a computed rho value should be zero. Thus, if one relied only on correlation coefficients to detect relationships, he would be misled into thinking that no relationship existed between the two variables. Simply plotting X-Y scatterplots of the explanatory variables with the promotion variables did not require specification of the response of the dependent variable. Visual observation could then be relied upon to detect dominant patterns of any form. These scatterplots used two special procedures, LOWESS and Jittering, which will be described in analysis of Figures 4.12 and 4.13.

Secondly, simple least squares regression was performed for all variables which had been previously found to be significantly correlated. The simple least squares regression procedure yielded a value called the Coefficient of Determination, or R2 (R-square). R2 is mathematically related to rho, and in the one variable case, the square of rho is equal to R2. Thus, R2 can also be used to qualitatively interpret the strength of linearity for a simple linear model. The advantage of producing R2 values was that R2 directly represents the proportion of variance accounted for by the assumption of a linear model. The results for each of the regressions and an explanation of R2 will be discussed in analysis of Table XII.

a. Paired Scatterplots

Since interpretation of the correlation coefficients assumes linearity, visual analysis of pairwise scatterplots was used to search for observable patterns, linear or otherwise. This visual approach did not require interpretation of single derived parameters to identify any patterns. In producing the scatterplots the LOWESS procedure was used. LOWESS, which stands for Locally Weighted Regression Scatter Plot Smoothing, [Ref. 12:pp. 94-95] is a nonparametric smoothing procedure which is designed to estimate functional relationships between Y and X. In particular, no linear or quadratic relationship is assumed. For scatterplots of discrete variables against the continuous promotion rate variables, the discrete variables were Jittered to overcome repeated plotting of points. Jittering involves generating small random increments, which are then added to the X values. As a result, when the X-Y plot is performed fewer X values are repeatedly plotted in the same location, and a better visual interpretation can be made of the quantity of X values at a discrete level.

The overall results of the LOWESS plots showed that the predominant pattern was indeed linear. Further, the linear pattern was demonstrated most clearly between pairs of highly correlated variables. Figures 4.12 and 4.13 demonstrate the linearity and the LOWESS and Jittering techniques respectively. As a result, linear modelling techniques were considered to be the best choice for subsequent analysis.

[Figures 4.12 and 4.13  LOWESS Plot of PRA vs OAFQTP (N=2000); LOWESS Scatterplot of Jittered HIYRED vs PRA]

b. Simple Regression

For pairs of significantly correlated variables, a simple least squares regression using PRA as the dependent variable was accomplished. The simple least squares regression for pairs yields quantitative results in terms of slope values, intercept values, T tests of the slope and intercept values, and the R2 value. The R2 value represents what proportion of total variance was explained by the simple linear model. As such, R2 values range from zero to one. An R2 value of zero would indicate that a linear model does not account for any variance of the dependent values. Correspondingly, a value of zero would be the estimate of the slope of the line.

To determine the significance of R2, results of the T test for the slope of the model are checked. The significance of R2, like rho, is related to sample size. If the T statistic is large and the probability of a greater T value small, the null hypothesis of a slope of zero is strongly rejected. Thus, we can be confident of the linearity of the model and the derived slope estimate. Sample size is considered in this test because the value of the T statistic is computed as a function of sample size. Thus, even with a small R2 value, if the T test for the slope value were significant, R2 would necessarily be held as significant. The only qualification for a low R2 would be that there exists considerable 'noise' or unaccounted variance in the response of the dependent variable. A summary of results is shown in Table XII.

TABLE XII
Simple Least Squares Summary Data
using PRA as Dependent Variable

    Variable   Intercept  (Std Err)    Slope   (Std Err)    R2       T
    GTSCR       -0.856    (0.0061)     0.008   (5.6E-04)   .013*    13.8
    AFQTP       -0.338    (0.014)      0.006   (0.0002)    .018*    26.1
    OAFQTP      -0.336    (1.6E-02)    0.007   (3.2E-04)   .033*    22.5
    EIMCAT       0.004    (0.027)     -0.003   (0.005)     .000     -.5
    HIYRED      -0.005    (0.047)     -0.001   (0.008)     .000     -.2
    EDLVL        0.011    (0.054)     -0.003   (0.008)     .000     -.02
    NCOE        -0.020    (0.021)      0.003   (0.003)     .000     1.1
    SEX          0.011    (0.028)     -0.018   (0.024)     .000     -.7
    CMF         -0.023    (1.6E-02)    0.000   (2.6E-04)   .000      .9
    RACETH      -0.009    (0.018)     -0.001   (0.010)     .000     -.1
    PAYGD       -0.045    (0.093)      0.007   (0.018)     .000      .3
    PQSCR       -0.059    (5.4E-02)    0.007   (6.9E-04)   .008*    10.6

Important observations from the simple paired regression analysis are summarized in the following paragraphs.

Very few sets of pairs result in a significant R2 value. Those that do are: GTSCR, OAFQTP, and PQSCR. All three of these variables have a positive slope. Analysis of residuals for these pairs did show reasonable normality of residuals and did not demonstrate any lack of homoscedasticity.

The remaining variables have a low R2 value and a positive or negative slope. For each of these variables, the upper or lower value of the 95% Confidence Interval for the slope shows the slope to be either positive or negative. Thus, no observable ascending or descending relationship can be claimed.

Using the variable RATE as the dependent variable in the simple regressions results in the variables EIMCAT and AFQTP having measurable R2 values and positive slopes.

As expected, the simple regression analysis results coincide with the observations taken from the correlation table. When considered one at a time, there appear to be only a handful of variables demonstrating a reportable relationship with the promotion variables. The low R2 value for each regression indicates either a large proportion of pure error, or significant unexplained variance due to other explanatory variables not being included.

3. 3-D Empirical Density Plots

Three dimensional empirical density plots were used to visually check for distribution changes in the continuous variables within the subcategories of SEX, PAYGD and RACETH. Two such plots will be discussed because they depict visually data characteristics identified in earlier tabular results. These characteristics were: the application of AFQT restrictions by congressional mandate in 1980, and the differences in OAFQT scores across racial groups.

The AFQT restriction is depicted in Figure 4.14, where empirical densities for OAFQT are plotted for each paygrade. Observing the three densities shows that only the E-7 paygrade distribution contains scores less than twenty. This makes sense, considering that all the E-7 enlistments were prior to 1980. Another interesting observation from this plot is that high OAFQT scores become more dominant as paygrade increases. This is most apparent in comparing the E-7 density to either the E-5 or E-6. This shift in density of OAFQT across the three paygrades suggests that attrition tends to manifest itself in the lower AFQT categories, but that a low AFQT score is, in itself, not prohibitive in achieving senior enlisted rank.

The second 3-D empirical density plot, Figure 4.15, shows the differences in distribution of renormed AFQT scores across racial subcategories. A large discrepancy between the white and the black or hispanic races is easily seen, although Indians have a similar AFQT to that of whites. This observation coincides with the occurrence of different promotion rates between different racial categories as well. However, to make inferences about promotion policy among different races would require further research. As pointed out by Daula, [Ref. 11:pp. 7-10] the attrition pattern among racial groups shifts the averages for both promotion rate and AFQT among the races over time. Since the purpose of this thesis is one of prediction, it is more important to identify the effect and account for it in the model. An explanation as to the cause of this phenomenon does not appear to be easily obtained from the thesis data. What is important about this plot is that it visually demonstrates the correlation between RACETH and OAFQT. If OAFQT is a significant determiner of promotion rate, then RACETH will be an important covariate.

[Figure 4.14  3-D Empirical Density Plot, OAFQT by PAYGD]

[Figure 4.15  3-D Empirical Density Plot, OAFQT by RACETH]

D. MULTIVARIATE GRAPHICAL ANALYSIS

Multivariate graphical analysis consisted of the use of Draftsman Plots and Coded Scatter Plots to look for relationships when more than two dimensions were under consideration. [Ref. 12:pp. 135-139] One of these procedures, the Coded Scatterplot, will be utilized to demonstrate a significant data characteristic, the distribution of SEX correspondent to CMF and PRA, in Figure 4.16. Coded Scatterplots involved delineating one of the effects variables as a third dimension, while plotting an independent variable against a dependent promotion variable. In Figure 4.16, CMF values were Jittered and plotted against the PRA variable, and the plot points were coded as periods for males and the letter F for females.

[Figure 4.16  Coded Scatterplot, PRA vs CMF with SEX]

Figure 4.16 demonstrates the higher density of female personnel in the upper CMF range, which contains the more technically oriented management career fields. This corresponds to the CMF-SEX correlation coefficient of 0.258 found in Table X. Likewise, the distributions of both the female and male PRA scores are symmetric about the zero line. This corresponds to the zero value for the PRA-SEX correlation coefficient also found in Table X.

E. LINEAR MODELS

1. Analysis of Variance

One Way ANOVA was used in this thesis as an intermediate step in defining a final inference model. ANOVA's usefulness has been as an investigative tool to detect differences in means among classes of explanatory variables. For example, using PRA as the dependent variable and EIMCAT as the independent variable, One-Way ANOVA will compare and test the equality of the average PRA score across the eight levels of EIMCAT, i.e., mental categories one through eight. In the testing, the null hypothesis is that the PRA means for all eight mental categories are equal, while the alternate hypothesis is that they are not. The test statistic used to reject or accept the null hypothesis is the F statistic. As such, a large F value, and subsequent rejection of the null hypothesis, would indicate that there exist significant differences between the means of the promotion scores for some of the eight mental categories.
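The F test described above can be sketched in a few lines. This is a minimal illustration of the textbook One-Way ANOVA computation on made-up numbers, not the SAS procedure used in the thesis; the function name `one_way_anova` and the toy groups are hypothetical.

```python
def one_way_anova(groups):
    """Return (F, df_between, df_within) for a list of groups of scores.

    Each inner list holds the dependent-variable scores (for example PRA)
    observed at one level of the explanatory variable (for example EIMCAT).
    """
    k = len(groups)                       # number of levels
    n = sum(len(g) for g in groups)       # total number of observations
    grand_mean = sum(sum(g) for g in groups) / n
    means = [sum(g) / len(g) for g in groups]

    # Between-level sum of squares: how far level means sit from the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Within-level sum of squares: scatter of scores about their own level mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)

    df_between = k - 1
    df_within = n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Toy data only: three levels with three scores each.
toy_groups = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]
f_stat, df_b, df_w = one_way_anova(toy_groups)
```

For the toy data the between-level sum of squares is 26 and the within-level sum of squares is 6, so F = (26/2)/(6/6) = 13 on 2 and 6 degrees of freedom. A large F of this kind is what leads to rejection of the hypothesis of equal level means.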
In general, a large F value can be considered to be any computed F statistic greater than 3.8, the asymptotic 95 percent point for a one degree of freedom model. The nature of these differences could be a large discrepancy between a simple pair of categories, small discrepancies between all eight categories, or any combination of difference conditions. Thus, ANOVA has limited value in discerning the location and magnitude of the differences between category means, but it does identify if differences exist and how strong those differences are.

Table XIII tabulates a twelve by three matrix of results for separate One-Way ANOVA's. The twelve rows are the explanatory variables and the columns are the three promotion variables. Using all three promotion measures as the dependent variable allowed for a check of ANOVA values and trends across those measures. In addition to the results of the F test, R2 is reported. This R2 value is different than that reported in the simple linear regression model. This is because the One-Way ANOVA procedure considers the independent variable as a set of levels, rather than a single continuous variable. With ANOVA, all variables had some level of R2 reported. Further, because of the increased informational value of variable categories, and hence, more degrees of freedom for computation, the values of R2 increased above the simple regression reported values.

It should be noted that technically, when the continuous variables were put into ANOVA, their values were then grouped, and the variables were treated as if they were defined discrete. Because the SAS software and computational resources used could handle all the integer values for the score ranges of AFQTP and the other continuous variables, it was possible to gain insight into the existence of differences between individual score cells.

Additionally, nonparametric procedures were used to evaluate the relationships. [Ref. 13:pp. 250-255] The nonparametric ANOVAs utilized the ranks of the variables and also yielded the F statistic for testing the hypothesis of equal level means. Having agreement between the parametric and nonparametric values removed the need of having to pursue confirmation of assumptions for parametric ANOVA. It will also allow analysis of results to focus on the resultant values of F and R2 tabulated in Table XIII.

TABLE XIII
One-Way ANOVA Summary

                     PRATE                RATE                 PRA
    Variable       F       R2          F       R2          F       R2
    SEX (1)        5.9    .00016      13.3    .00351      48.4    .00128
    CMF (2)       35.     .02788      93.3    .07415       0.0    .00000
    RACETH        90.     .01177     165.0    .02133      80.0    .01049
    PAYGD (3)   6292.     .24953       0.0    .00000       0.0    .00000
    GTSCR         18.     .04250      13.4    .03184      10.9    .02636
    AFQTP         32.     .07046      20.6    .04623      17.3    .03908
    OAFQTP        36.     .08441      25.3    .06101      19.     .04657
    EIMCAT        37.     .01076      71.5    .02035      96.9    .02739
    HIYRED        96.     .02950     106.0    .03272     117.     .03590
    EDLVL         37.     .01076      71.5    .02035      96.9    .02739
    NCOE         156.     .05097      76.4    .02499      46.8    .01583
    PQSCR          1.9    .00375       6.6    .01341       5.8    .01181

    (1) The Pr>F (level of rejection of the null hypothesis of no difference in means) was .0145 for PRATE, .0003 for RATE and .0001 for PRA.
    (2) The Pr>F for PRA is 1.0.
    (3) Pr>F for RATE is 1.0, and for PRA is 1.0.
    The values of Pr>F for the remainder of the table were .0001.

Review of Table XIII demonstrates some anticipated results, which are summarized in the following paragraphs.

Since the variables PAYGD and CMF were controlled for in the derivation of PRA, there is correspondingly no relationship between those variables and the PRA promotion variable. Likewise, the variable PAYGD was controlled for in the derivation of RATE, and there was no demonstrated linear relationship for that pair. The zero values for the F statistic and R2 for those variable combinations document this fact.
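The F and R2 columns of a One-Way ANOVA summary are tied together by a standard identity: with F = (SSB/df_b)/(SSW/df_w) and R2 = SSB/(SSB + SSW), it follows that R2 = df_b F / (df_b F + df_w). The sketch below applies that identity to two reported F values. It assumes, as an illustration only, that all 37,854 records entered each ANOVA, that SEX has two levels, and that PAYGD has three levels (E-5, E-6, E-7); the helper name `r_squared_from_f` is hypothetical.

```python
def r_squared_from_f(f_stat, df_between, df_within):
    """Recover the One-Way ANOVA R2 from its F statistic and degrees of freedom."""
    return df_between * f_stat / (df_between * f_stat + df_within)


N = 37854  # thesis sample size; assumed here to apply to every ANOVA

# SEX against PRA: two levels, so 1 and N - 2 degrees of freedom.
r2_sex_pra = r_squared_from_f(48.4, 1, N - 2)

# PAYGD against PRATE: three levels assumed, so 2 and N - 3 degrees of freedom.
r2_paygd_prate = r_squared_from_f(6292.0, 2, N - 3)
```

Under these assumptions the two results round to about .00128 and .2495, in line with the tabulated R2 values of .00128 and .24953. The same identity shows why a variable with many levels can report a modest F alongside a comparatively large R2.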
Using RATE or PRA as the dependent variable, and allowing for only one, most significant variable to be selected from each of the intelligence and academic groups, results in the same set of explanatory variables as found in correlation analysis. These variables were: OAFQTP, HIYRED, GTSCR, PQSCR, NCOE, RACETH, and SEX. The most significant variables were the ones which had the larger F statistic and R2 value. This set is not ordered, however, since there are differences in order between the PRA and RATE models.

Another interesting development from ANOVA results when the explanatory variable mean and variance for each level are plotted against the promotion variable. This is not a standard analytical plot, but it does provide some visual information on the size, direction, and dispersion about the center line of an independent discrete variable. This plot is most similar to a strip box plot for continuous variables. An example plot where each individual's PRA score was plotted against the sum of his EIMCAT and HIYRED score is shown in Figure 4.17. In Figure 4.17 the two center lines plotted represent the sum of scores for EIMCAT and HIYRED separated between the GED qualified personnel and High School Diploma qualified personnel. The outside two lines trace the lower and upper bounds one standard deviation from the computed means.

[Figure 4.17. X-Y Plot of Means and Variances, PRA vs HIYRED + EIMCAT. Horizontal axis: EIMCAT + HIYRED; outer curves: upper bound and lower bound.]

By plotting a separate line for each high school diploma category it can be seen that while both groups have a similar increase in promotion rate as the combined level of EIMCAT and HIYRED increased, the GED qualified personnel were consistently a fixed level lower than a fully qualified high school graduate. Thus, the additional merit of an actual high school diploma did manifest itself in promotion rate.

A final look at ANOVA involves specifying a model using the set of the seven most significant independent variables, and then checking for interactions among them. Table XIV gives the results of the Seven-Way ANOVA using this model:

RATE = 7 Main Effects + Two Way Interactions

Table XIV depicts the seven most significant variables individually in the Main Effects rows, and the interaction terms in the Interactions rows. The advantage of this Seven-Way ANOVA is that inclusion of all of the explanatory variables simultaneously allows for comparison of the significance of each of the explanatory variables relative to the others. Additionally, specifying combinations of two-way interactions checks to see if any two of the explanatory variables are significantly related to one another. An example of an interaction would be a SEX and CMF term. As has been previously shown, female personnel tend to be associated with higher CMF values. If the ANOVA model for promotion included a term which was the product of the two values, SEX*CMF, then the two attributes would be jointly considered in the ANOVA model. If the interaction term was found to be significant, then the entries for the two individual variables CMF and SEX would be removed and only the interaction term retained.

An additional consideration in the Seven-Way ANOVA was that the model was unbalanced. Unbalanced means that there were some combinations of the factor levels which did not have any entries in the ANOVA cells. An example of this can be seen in the SEX*OAFQT term. Specifically, there are only 76 degrees of freedom for the interaction term, while the individual degrees of freedom for SEX and OAFQT are 1 and 79 respectively. Thus, the SEX*OAFQT term had three combinations without entries. As a result, the F statistic computed will be only approximate. Since the purpose of this step in analysis was exploratory, the F statistic estimates were considered adequate. Table XIV presents the results of a Seven-Way ANOVA using RATE as the dependent variable.
Similar results were obtained using PRA as the dependent variable.

TABLE XIV
7-Way Analysis of Variance with Interaction

DEPENDENT VARIABLE: RATE

SOURCE             DF       SSQ         MEAN SQUARE   F VALUE   PR > F   R2        ROOT MSE
MODEL              14966    18869.39    1.260818      1.52      0.0001   0.49852   0.91069421
ERROR              22887    18981.65    0.829364
CORRECTED TOTAL    37853    37851.04

SOURCE             DF       ANOVA SS    F VALUE   PR > F
Main Effects
RACETH             5        807.35      194.69    0.0001
SEX                1        13.28       16.02     0.0001
OAFQT              79       1670.54     25.50     0.0001
HIYRED             12       1238.25     124.42    0.0001
GTSCR              93       1205.22     15.63     0.0001
NCOE               13       945.89      87.73     0.0001
PQSCR              78       507.52      7.85      0.0001
Interactions
RACETH*SEX         5        0.00        0.00      1.0000
SEX*OAFQT*         76       440.59      6.99      0.0001
SEX*HIYRED*        9        66.03       8.85      0.0001
SEX*GTSCR          72       72.80       1.22      0.0999
SEX*NCOE*          11       57.76       6.33      0.0001
SEX*PQSCR          70       53.06       0.91      0.6795
RACETH*OAFQT       335      0.00        0.00      1.0000
RACETH*HIYRED*     46       107.84      2.83      0.0001
RACETH*GTSCR       326      0.00        0.00      1.0000
RACETH*NCOE        46       8.41        0.22      1.0000
RACETH*PQSCR       288      104.24      0.44      1.0000
OAFQT*HIYRED       593      112.62      0.23      1.0000
OAFQT*GTSCR        2864     2418.55     1.02      0.2570
OAFQT*NCOE*        614      954.24      1.87      0.0001
OAFQT*PQSCR        3631     3182.33     1.06      0.0137
HIYRED*GTSCR       564      130.88      0.28      1.0000
HIYRED*NCOE*       88       276.98      3.80      0.0001
HIYRED*PQSCR       518      484.13      1.13      0.0251
GTSCR*NCOE*        604      718.86      1.44      0.0001
GTSCR*PQSCR        3383     2997.93     1.07      0.0051
NCOE*PQSCR         542      504.44      1.12      0.0268

Three important observations can be obtained from Table XIV. The first observation is that there are few significant interaction terms. Only those terms marked with an asterisk demonstrated statistical significance at the PR > F level .0001. Of these, only three had F values greater than 3.8. These interaction terms were OAFQTP, HIYRED, and NCOE, all interacting with SEX. The presence of interaction in the Seven-Way ANOVA model was previously seen in the correlation matrix, Table X, where SEX was observed positively correlated with HIYRED and OAFQTP (0.05 and 0.131, respectively), and negatively correlated with NCOE (-0.081). The implication of having significant interaction terms is that they would need to be included in any predictive model. Thus, identification of interactions using ANOVA was critical. Secondly, all the main effects variables continue to be significant, even when used simultaneously by the model. Lastly, selecting the single most significant explanatory variable from the academic and education groups yields the same unordered best set as did the One-Way ANOVA: OAFQTP, HIYRED, GTSCR, NCOE, RACETH, and SEX.

In summary, the fundamental result of ANOVA was the confirmation that there are differences in the level means of promotion scores due to several explanatory variables, and an agreement as to which independent variables were the best explanatory variables when considered separately or simultaneously. Also, plotting the means and variances of the sum of EIMCAT and HIYRED versus PRA demonstrated that there was a good linear trend of the level means increasing with PRA. However, there was considerable variance within each level. The choice of EIMCAT and HIYRED as the explanatory variables was important because those variables are both discrete class representatives from the academic aptitude and education groups.

2. ANCOVA

The use of One-Way Analysis of Variance in the previous section was primarily to confirm the existence of significant differences among the levels of the independent variables. Beyond acknowledging that there are some independent variables available to explain promotion rates, Seven-Way ANOVA did not provide any numerical measure of the contribution of the variable to the structural form of the model. [Ref. 14:p. 10] In addition, given the nature of the analysis of continuous independent variables, the continuous variable was changed to represent a discrete valued variable.

Incorporating continuous variables into ANOVA was achieved through the intermediate method of ANCOVA. ANCOVA utilizes metric continuous variables as well as nonmetric qualitative values. The result of ANCOVA was an improved multivariate model with the inclusion of continuous variables in their proper linear form. The ANCOVA provided estimates of the coefficients for the continuous variables, and reported on the proportion of variance accounted for by each of the categorical variables as well. [Ref. 15:pp. 343-349] These results provided the basis for further removal of variables or interactions from the set previously identified.

The model considered was based on the results of the previous chapters and consisted of the following form:

Promotion = f(OAFQTP, PQSCR, GTSCR, HIYRED, NCOE, RACETH, SEX, plus interaction terms SEX*HIYRED, SEX*GTSCR, SEX*OAFQTP)

The variables OAFQT, PQSCR and GTSCR are continuous and metric, HIYRED and NCOE are discrete and metric, and RACETH and SEX are discrete and nonmetric. A representation of the model using notation consisted of the following form:

$$Y_i = B_0 + B_1 X_1 + B_2 X_2 + B_3 X_3 + D_1 + D_2 + D_3 + D_4 + I_1 + I_2 + I_3$$

In the above notation, $Y_i$ is the promotion variable PRA, $B_0$ is the intercept, and $B_1$ through $B_3$ are the linear coefficients of the continuous variables OAFQT, GTSCR and PQSCR. The coefficients $B_1$ through $B_3$ are assumed to be the same for all levels of the other variables. $D_1$ through $D_4$ represent the discrete variables RACETH, SEX, HIYRED, and NCOE. $I_1$ through $I_3$ are the interaction terms OAFQT*SEX, HIYRED*SEX, and NCOE*SEX. This model is also unbalanced and the F statistics are estimates. The results of the ANCOVA using this model are shown in Table XV.
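The degrees-of-freedom bookkeeping behind the "unbalanced" remark can be sketched as follows, using the SEX*OAFQT figures quoted for the Seven-Way ANOVA (1 and 79 main-effect degrees of freedom, 76 for the interaction). The function names are hypothetical, and the formulas assume the filled cells still form a connected two-way layout; they are offered as an illustration of the arithmetic, not as the procedure SAS applied.

```python
def interaction_df(filled_cells, levels_a, levels_b):
    """Degrees of freedom available to an A*B interaction term.

    Assumes a connected two-way layout in which only `filled_cells`
    of the levels_a * levels_b cells contain observations.
    """
    return filled_cells - levels_a - levels_b + 1


def empty_cells(df_a, df_b, observed_interaction_df):
    """Number of empty cells implied by a reduced interaction df."""
    return df_a * df_b - observed_interaction_df


# SEX has 1 df (2 levels); OAFQT has 79 df (80 levels).
full_cells = 2 * 80
print(interaction_df(full_cells, 2, 80))      # balanced case
print(interaction_df(full_cells - 3, 2, 80))  # three empty cells
print(empty_cells(1, 79, 76))                 # cells missing for SEX*OAFQT
```

With every cell filled the interaction would carry 79 degrees of freedom; the reported 76 corresponds to three combinations of SEX and OAFQT score with no entries.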
TABLE XV
ANCOVA with Interactions

DEPENDENT VARIABLE: PRA

SOURCE        DF       SSQ         MEAN SQUARE   F VALUE   PR > F   R2       ROOT MSE
MODEL         55       2423.68     44.07         47.13     0.0001   0.0642   0.966
ERROR         37798    35339.29    0.934
CORR TOTAL    37853    37762.98

SOURCE         DF    TYPE III SS     F VALUE   PR > F
Main Effects
OAFQT          1     12.89440024     13.79     0.0002
RACETH         5     152.10095609    32.54     0.0001
SEX            1     5.31950192      5.69      0.0171
HIYRED         12    517.91751116    46.16     0.0001
GTSCR          1     3.65772995      3.91      0.0479
NCOE           13    132.83314221    10.93     0.0001
PQSCR          1     80.15632971     85.73     0.0001
Interactions
OAFQT*SEX      1     4.03387863      4.31      0.0378
SEX*HIYRED     9     10.16825209     1.21      0.2844
SEX*NCOE       11    18.42527136     1.79      0.0496

PARAMETER    ESTIMATE       T FOR H0: PARAMETER=0   PR > |T|   STD ERROR OF ESTIMATE
INTERCEPT    0.25501        0.31                    0.7592     0.83191986
OAFQT        0.00094        1.26                    0.2077     0.00074544
GTSCR        -0.00104897    -1.98                   0.0479     0.00053034
PQSCR        0.00422902     9.26                    0.0001     0.00045674

There are three important observations from Table XV. First, the main effects variables, with the exception of GTSCR, are still significant in their ability to account for variance in the model. Secondly, no interaction terms are significant. The PR > F for these terms are much greater than .0001 and each has a small F value. Thus, the effect of the interaction terms will be assumed to be negligible. Lastly, the bottom portion of the ANCOVA table lists estimates of regression coefficients for the continuous variables. These estimates were tested, using the T statistic, to see if they were significantly different from a hypothesized value of zero. If the estimate was not significantly different from zero, then the explanatory variable did not possess sufficient predictive ability. The PQSCR coefficient has a value of 0.0042, and is significantly different from zero. The OAFQT variable has a slope with the correct sign but small magnitude, and it is not significantly different from zero. The GTSCR variable demonstrates a negative slope and again is not significantly different from zero. The negative estimate value, combined with the knowledge that GTSCR is strongly correlated with OAFQT, indicated a condition of multicollinearity between the two variables. Multicollinearity implies that one variable may be simply a surrogate predictor for the other, with little or no effect as a predictor coincident with OAFQT. [Ref. 15:p. 415] Thus, the inclusion of GTSCR was considered detrimental to the development of a regression model, and it was dropped from subsequent analysis.

In summary, the ANCOVA resulted in the elimination of the interaction terms from consideration in a predictive model. The estimated values of OAFQT and GTSCR demonstrated a condition of multicollinearity in the model, and the weaker variable, GTSCR, was eliminated. The remaining variables to be considered in subsequent analysis were: OAFQT, PQSCR, HIYRED, NCOE, RACETH, and SEX. These results were considered satisfactory, in that the remaining variable set contains single measures of academic aptitude, education, military performance testing, and professional education, as well as the two categorical variables: SEX and RACETH.

3. The Final Model; Multiple Regression (ANCOVA)

a. Background

Regression was the final step in successive data analyses. The important result of this analysis was a reduced set of independent variables with estimated coefficient values, which allow a set of qualitative statements about the numerical influence of each of the independent explanatory variables. Of specific importance was the influence of OAFQT and HIYRED in predicting an individual promotion rate. In the development of the regression model this section will:

1. Review the pertinent results which led to the regression model definition.
2. Compare the model using the three promotion rate variables.
3. Select a single promotion variable for the model.
4. Interpret the resulting regression estimates and conduct sensitivity analysis.
5. Check model assumptions and confirm the model using an alternate data set and nonparametric procedures.
6. Test the model by comparing actual versus predicted promotion rates for population subcategories.

Previous results are reviewed in the following paragraphs. ANOVA and ANCOVA demonstrated that significant differences exist between internal levels of the explanatory variables as a function of average promotion rates. Paired scatterplots utilizing smoothing techniques, and plots of the level means found in ANOVA, consistently demonstrated an ascending linear pattern when plotted against promotion variables. ANOVA and ANCOVA models, using interactions, resulted in the elimination of variables which did not demonstrate sufficient additive linear effect to be included in the model. Further, this analysis confirmed that there was no significant interaction among the remaining variables. Correlation analysis, combined with in-depth univariate analysis as to the nature of the individual variables, identified unique scoring groups of variables. In subsequent analysis, these groups were then restricted to allow for only the strongest variable to be entered into the model. The final set of variables for entry into the model are the following:

Promotion = f(OAFQT, PQSCR, HIYRED, NCOE, RACETH, SEX)

This model is a mixed scale and variable type model, including both discrete and continuous variables. Two of the input variables have nominal scale, RACETH and SEX. To allow for their entry into the model, these values were transformed into dummy variables. Specifically, the variable SEX was recoded as a 0/1 variable, while RACETH was represented with five 0/1 dummy variables: D1 through D5. For example, the dummy variable D1 was coded with a 1 for every entry with the RACETH score of 1, and a zero for all others.
This procedure was applied for the next four levels, while the score of 6 was left as a 0/0 entry. [Ref. 15:pp. 332-341] After application of the recoding just described, the regression model can be defined with the notation:

$$Y_i = B_0 + B_1 X_1 + B_2 X_2 + B_3 X_3 + B_4 X_4 + D_1 + \cdots + D_5 + D_6$$

In the above notation, $Y_i$ is one of the promotion variables, $B_0$ is the intercept, and $B_1$ and $B_2$ are the coefficients for the continuous variables OAFQT and PQSCR. $B_3$ and $B_4$ are linear coefficients for the discrete and ordinal variables HIYRED and NCOE. $D_1$ through $D_5$ represent the dummy variables for RACETH, and $D_6$ represents the dummy variable for SEX.

The data set of 37,854 records was randomly split into two separate data files for regression analysis. This provided a different data set to confirm the regression coefficients from analysis of the first set. Paragraph e.1. of this section compares the resulting regression coefficients of the model using the second data set.

b. Results

Table XVI lists the regression results of the basic model variables. When computing models for PRATE and RATE, the effects variables CMF and PAYGD, and CMF, respectively, were then reintroduced into the model. This allowed for comparison of coefficients and strength of explanatory variables as the R2 value changes as the set of the dependent variable became more restricted. In Table XVI the top paragraph shows the ANOVA results of the model and reports the F and R2 statistic. Each column then gives the regression results of each promotion rate model, including a Pr>T value as a measure of rejection for the null hypothesis of zero for the estimate value. Values of Pr>T less than .05 are considered acceptable for consideration of that variable.
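The 0/1 recoding of the two nominal variables described in the Background paragraph can be sketched as below. This is a minimal illustration only: the function name is hypothetical, and the assumption that SEX is stored as 1 or 2 with level 1 as the reference level is not stated in the text.

```python
def dummy_code(raceth: int, sex: int) -> list[int]:
    """Return [D1, D2, D3, D4, D5, D6] for one record.

    D1..D5 flag RACETH levels 1..5; RACETH level 6 is the reference
    level and is left as all zeros. D6 flags SEX. Assumption: SEX is
    coded 1 or 2 and level 1 is treated as the reference level.
    """
    if raceth not in range(1, 7):
        raise ValueError("RACETH is expected to take values 1 through 6")
    if sex not in (1, 2):
        raise ValueError("SEX is assumed to take values 1 or 2")
    race_dummies = [1 if raceth == level else 0 for level in range(1, 6)]
    sex_dummy = 1 if sex == 2 else 0
    return race_dummies + [sex_dummy]


print(dummy_code(1, 1))  # RACETH level 1, SEX reference level
print(dummy_code(6, 2))  # RACETH reference level, SEX flagged
```

Leaving one level of each nominal variable uncoded avoids a redundant column alongside the intercept, so each dummy coefficient is read as a shift relative to the reference level.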
TABLE XVI
Regression Results

                    PRATE            RATE             PRA
Added Variables     CMF, PAYGD       CMF              None
ANOVA F             1317.4           360.3            218.5
Pr>F                .0001            .0001            .0001
R2                  .3116            .0948            .0546

Intercept           0.022222         -1.03692         -1.28822
  (std error)       (.002558)        (.055368)        (.05600)
  Pr>T              .0001            .0001            .0001
OAFQT               .0001355         .0058817         .0042608
  (std error)       (.00000871)      (.0002444)       (.0002492)
  Pr>T              .0001            .0001            .0001
HIYRED              .0005341         .148352          .139484
  (std error)       (.000152)        (.004851)        (.0049298)
  Pr>T              .0001            .0001            .0001
PQSCR               .000089          .001608          .00327211
  (std error)       (.000014)        (.000449)        (.0004583)
  Pr>T              .0001            .0001            .0001
SEX                 -.0008582        .022904          .0564079
  (std error)       (.00050325)      (.01562)         (.0155310)
  Pr>T              .088*            .1427*           .0003
NCOE                .00008839        .012688          .0073740
  (std error)       (.00000625)      (.0017808)       (.0017949)
  Pr>T              .1573*           .0001            .0001
D1 (RACETH)         .0026347         .053088          .01497054
  (std error)       (.0011286)       (.035653)        (.0363905)
  Pr>T              .0196            .1365*           .6808*
D2 (RACETH)         .0037888         -.096320         -0.0898693
  (std error)       (.0011266)       (.035570)        (.0363089)
  Pr>T              .0008            .0068            .0013
D3 (RACETH)         .0009404         -.0239592        .0417668
  (std error)       (.001279)        (.040383)        (.04122033)
  Pr>T              .4623*           .5530*           .3109*
D4 (RACETH)         .00028892        .089059          .01007473
  (std error)       (.0032534)       (.102707)        (.1048355)
  Pr>T              .3745*           .3859*           .9234*
D5 (RACETH)         -.000224         -.021530         .0138649
  (std error)       (.0018127)       (.0572261)       (.058409)
  Pr>T              .9016*           .7067*           .8124*
CMF                 -.000147         -.0053672        NA
  (std error)       (.0000052)       (.0001654)
  Pr>T              .0001            .0001
D7 (PAYGD)          .060127          NA               NA
  (std error)       (.0017904)
  Pr>T              .0001
D8 (PAYGD)          .017999          NA               NA
  (std error)       (.001774)
  Pr>T              .0001

Observations from the regression table are summarized in the following paragraphs. The input variables OAFQT, HIYRED, and PQSCR all maintained a positive and statistically significant coefficient value across all three dependent variables. The inclusion of PAYGD with the PRATE model significantly increased the R2 value of the model. Conversely, the influence of OAFQT, HIYRED, PQSCR, and the other explanatory variables was severely diminished.
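The Pr>T entries in Table XVI can be checked approximately from each estimate and its standard error. The sketch below is an illustration under a stated assumption: it uses the standard normal distribution in place of Student's t, which is reasonable only because the error degrees of freedom run to many thousands. The function name is hypothetical, and the two example rows are the SEX and D5 entries of the PRA column.

```python
from statistics import NormalDist


def two_sided_p(estimate, std_error):
    """Return (t, p) for H0: coefficient = 0.

    Uses a normal approximation to the t distribution (an assumption
    suited to very large samples only).
    """
    t = estimate / std_error
    p = 2.0 * (1.0 - NormalDist().cdf(abs(t)))
    return t, p


# PRA column of Table XVI: SEX and D5 (RACETH).
for name, est, se in [("SEX", 0.0564079, 0.0155310),
                      ("D5", 0.0138649, 0.058409)]:
    t, p = two_sided_p(est, se)
    print(f"{name}: t = {t:.2f}, Pr>|T| = {p:.4f}")
```

A coefficient whose ratio to its standard error is small, as with D5, returns a large Pr>T and falls outside the .05 screening level used in the table.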
The RATE model is very similar to the PRA model, and has generally larger estimate values and a higher R2. However, the estimates for RACETH and SEX did not have significant T values. The PRA model, although having a lower R2 value and generally smaller estimate values, had an acceptable T test result for SEX. Additionally, the PRA model contained one less nominal explanatory variable, CMF. The PRA model, then, has fewer and more reliable nominal explanatory variables. Since the objective of the study was to focus on academic and educational measures as predictors of promotion, the PRA model was chosen as the most effective predictive model. Subsequent analysis of the regression coefficient results was conducted with the PRA model.

c. Interpretation

Interpretation of the regression coefficients will include two points. First, the explanatory variables which can effect the greatest amount of change in the dependent variable will be identified. Secondly, an example will demonstrate the amount of change in a given explanatory variable required to achieve a five percent shift in the PRA estimate.

The amount of change in PRA caused by a change of one unit of an explanatory variable can be read directly from the regression coefficients. However, the total amount of change that an explanatory variable can cause in PRA depends on the range of the explanatory variable. Table XVII gives an ordered listing of the explanatory variables, excluding categorical variables, from most to least total influence as measured by Net Possible Change. The net possible change is simply the number of units in the range of the explanatory variable multiplied by the coefficient estimate.
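The Net Possible Change calculation is a single multiplication per variable, sketched below with the coefficient estimates listed in Table XVII. The unit counts are an assumption: they were back-calculated so that the products reproduce the tabulated values, and the dictionary and function names are hypothetical.

```python
# Coefficient estimates as listed in Table XVII.
estimates = {
    "HIYRED": 0.13948378,
    "OAFQT": 0.00426083,
    "PQSCR": 0.00327212,
    "NCOE": 0.00737408,
}

# Assumed number of units in each range (back-calculated, not stated
# in the text): HIYRED 1-12, OAFQT 1-99, PQSCR 21-100, NCOE 0-14.
units_in_range = {"HIYRED": 12, "OAFQT": 99, "PQSCR": 79, "NCOE": 15}


def net_possible_change(estimate, units):
    """Largest shift in PRA the variable can produce over its range."""
    return estimate * units


ranked = sorted(
    ((name, net_possible_change(estimates[name], units_in_range[name]))
     for name in estimates),
    key=lambda row: row[1],
    reverse=True,
)
for name, change in ranked:
    print(f"{name:7s}{change:8.4f}")
```

Ranking by this product rather than by the raw coefficient is what places OAFQT, with a small per-unit coefficient but a wide range, ahead of NCOE.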
TABLE XVII
Net Possible Change by Explanatory Variable

Variable    Range     Estimate     Net Possible Change
HIYRED      1-12      .13948378    1.6738
OAFQT       1-99      .00426083    0.4218
PQSCR       21-100    .00327212    0.2585
NCOE        0-14      .00737408    0.1106

In a qualitative sense, the sensitivity of PRA to each explanatory variable can be demonstrated by deriving the number of explanatory variable units needed to move from the median PRA value up five percent. To compute the average value for PRA, the population average for each explanatory variable was entered into the regression model. The resulting PRA value was 0.0185, which, using the normal approximation, lies at the 50.7 percentile of the PRA distribution. An upward shift of 5 percent would then require the PRA value to lie at the 55.7 percentile. Using the standard normal distribution tables, the approximate PRA value corresponding to the 55.7 percentile was 0.1434. Checking the sensitivity of each explanatory variable consisted of changing a single explanatory variable a sufficient number of units to result in a PRA value of 0.1434, while holding all other explanatory variables at the population average. Table XVIII tabulates the amount of explanatory variable units necessary to produce a 5 percent upward shift in PRA percentile. Alternatively, if the increase required to reach the 55.7 percentile was not possible within the range of the input variable, the maximum amount of available change was listed.

TABLE XVIII
Sensitivity of PRA to Explanatory Variables

Variable    Average Value    Change to    PRA %
HIYRED      6.01             7.0          55.9
OAFQT       45.3             74.0         55.7
NCOE        3.06             14.0*        54.0
PQSCR       78.4             99.0*        53.4

*max value

Interpretation of the coefficient values demonstrates that HIYRED is clearly the most important explanatory variable. This observation is understandable since the structure of the variable is discrete, and changes to adjacent values represent major distinctions in educational background. The example of shifting from a value of six to a value of seven represents the difference of having a high school degree versus having gone to one year of college. In percentages of HIYRED, that constitutes moving from the large center group of high school qualified NCO's to the upper ninety percent of the HIYRED distribution. OAFQT is the second most significant explanatory variable. A shift of roughly one quarter of its range, i.e. 45 to 75, can change PRA plus or minus five percent. The other explanatory variables NCOE and PQSCR have considerably less influence on the dependent variable.

d. Checking of Assumptions

To verify the requirements for the regression model, residual analysis was performed using the Grafstat program. Representative plots of the OAFQT residual are shown in Figures 4.18 and 4.19.

[Figure 4.18. Regression Residual Histogram]

[Figure 4.19. Regression Residual Scatter Plot (N=500). Horizontal axis: OAFQTP.]

The histogram of residuals, shown in Figure 4.18, demonstrates that the residual distribution is approximately normal. Homoscedasticity is checked in Figure 4.19, in which the residuals have been plotted against the OAFQT variable. There do not appear to be any patterns in the plots of the residuals, and the uniform pattern was considered sufficient to justify the assumption of homoscedasticity. Lastly, since each observation represents a different person, the independence of each observation from one another is assumed true.

e. Confirmation of Regression Findings

(1) Second Data Set. Regression analysis was conducted on the second partition of the data set. A comparison of those results with the first data set is shown in Table XIX.
TABLE XIX
Comparison of Regression Data Sets

Independent Variable: PRA

                 1st Set                 2nd Set
Estimator     Coeff       Std Err     Coeff       Std Err
OAFQT         .004260     (.00025)    .004729     (.00032)
HIYRED        .139483     (.00493)    .131559     (.00636)
PQSCR         .003272     (.00046)    .003197     (.00060)

The above results are felt to be sufficiently comparable to accept the original model coefficient scores.

(2) Nonparametric Regression. Since the model contained an ordinal variable, HIYRED, a regression result using nonparametric terms was included as a confirmatory measure. Nonparametric regression used the same linear least squares approximation for the model estimates, so the regression coefficient for HIYRED was still 0.1395. However, nonparametric regression used the Spearman rank correlation coefficient for the test for the acceptance of the value of the regression coefficient. The estimate for HIYRED was tested using this procedure. First, for each value of PRA and HIYRED a predicted value U was found by computing

U = PRA - (0.1395 * HIYRED).

Then, the Spearman rank correlation coefficient, rho, was computed, based on the ranks of HIYRED and the ranks of U. It was found to be 0.02482 with a Pr > |R| of 0.0001. In this test the null hypothesis was that the regression coefficient was equal to 0.1395, the value found in the regression. [Ref. 13:pp. 265-271] To test the null hypothesis, that the regression coefficient estimate is correct, rho was compared against a rejection region computed using the two tailed Spearman Quantile, with a normal approximation. The rejection regions for this Spearman Correlation parameter were values less than 0.0085 or greater than 0.9915. Since the value of rho did not fall inside either rejection region, the null hypothesis could not be rejected, and a HIYRED regression coefficient of .1395 was acceptable.

f. Testing the Model

The model coefficients found by regression were tested in two ways. First, a predicted promotion rate value was computed for the extremes and average of the model input variables. The extreme values used the minimum or maximum values for the input variables. The average promotion rate was computed using sample averages for all input variables. The resulting predictions were then compared against the actual distribution percentiles. Secondly, subsets of the sample population had average promotion rates predicted using categorical values and sample population averages. The resulting predictions are compared against the actual sample values for PRA. Again percentile values were found by using a standard normal table approximation.

TABLE XX
Comparison of Extreme and Average Predictions

                        Model                               Data Sample
                        PRA Value          Percentile       PRA Value    Percentile
Minimum Prediction      -1.0009 (.1000)    15.7% (3.5%)     -1.558       5%
Average Prediction      0.01839 (0.223)    50.7% (8.5%)     -0.04146     50%
Maximum Prediction      1.23029 (.4098)    89.1% (9.9%)     1.7866       95%

The model predictions were very accurate at the average level, but this accuracy diminished at the extremes.

The second test for the model was one where specific population subcategories had their average PRA value predicted. The subcategories represented were four combinations of SEX and the black and white RACETH variables. Additionally, predictions were made to check the average promotion rate of all NCO's with a HIYRED value of 10, and all NCO's with an OAFQT of 85. As in the previous table, unless the input variable is being used as a subcategory, its value was set to the overall population average. Table XXI shows the results of the predictions.
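The conversion between PRA values and percentiles by a standard normal approximation, used for Table XX and for the earlier five percent sensitivity example, can be sketched with the Python standard library. The function names are hypothetical, and treating PRA as a standard normal variable follows the approximation described in the text rather than any property verified here.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def pra_to_percentile(pra):
    """Percentile (0-100) of a PRA value under the normal approximation."""
    return 100.0 * STD_NORMAL.cdf(pra)


def percentile_to_pra(percentile):
    """PRA value lying at the given percentile (0-100)."""
    return STD_NORMAL.inv_cdf(percentile / 100.0)


def units_needed(target_pra, current_pra, coefficient):
    """Change in one explanatory variable needed to reach target_pra,
    holding all other explanatory variables fixed."""
    return (target_pra - current_pra) / coefficient


average_pra = 0.0185
target = percentile_to_pra(pra_to_percentile(average_pra) + 5.0)
print(round(pra_to_percentile(average_pra), 1))  # percentile of average PRA
print(round(target, 4))                          # PRA five points higher
print(round(units_needed(target, average_pra, 0.0042608), 1))  # OAFQT units
```

The same two functions reproduce the model percentiles in Table XX from the predicted PRA values to within rounding, for example the maximum prediction of 1.23029.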
TABLE XXI
Comparison of Predicted vs Actual PRA Averages

Subcategory      Predicted % (Lower-Upper)    Sample %    Sample Size
Male/White       55.1 (45.7-64.2)             53.1        18,003
Male/Black       49.5 (40.3-58.9)             44.3        12,121
Female/Black     47.3 (37.7-56.1)             47.7        2,485
Female/White     52.9 (44.1-61.5)             59.5        1,842
HIYRED=10        71.7 (63.5-79.3)             75.7        969
OAFQT=85*        57.4 (44.7-69.4)             60.2        2,129

*The sample data point estimate was averaged over a range of OAFQT 80 to 90.

Testing of the regression model indicates that it was reasonably effective if used with input changes of the nominal variables, such as SEX and RACETH. Changes in the value of HIYRED produce reliable estimates, and demonstrated the considerable contribution of this variable as a predictor of PRA. The continuous variable OAFQT is difficult to test; since it is a continuous variable, the sample estimate was taken over a range of values. Predicted results are close to the sample value, but the variance of the model estimate still spans the median. OAFQT does move the predicted values of PRA in the right direction, but its effectiveness is severely hampered by its variance and diminishing ability to provide an accurate prediction as the value approaches either extreme. Other prediction estimates were attempted using OAFQT and their results demonstrated the same lack of predictive ability away from the center percentiles.

g. Summary of Regression Analysis

Regression analysis provided estimates of the independent contribution of several key variables to predicting PRA as a promotion rate. They include a measure of intelligence aptitude, OAFQTP, a measure of academic ability, HIYRED, two measures of military performance, PQSCR and NCOE, and two nominal values, SEX and RACETH. Testing of these estimates shows that the predictive ability of the model is limited to those variables which have very distinct abilities to subcategorize the sample population.
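These variables are the SEX, RACETH, and HIYRED variables. The continuous variables for OAFQT, PQSCR, cannot be relied upon to independently yield estimates of PRA, but can affect limited shifts of the PRA distribution within a subcategory.

E. SUMMARY OF FINDINGS

Chapter IV was the principal analytical exercise in this study. It progressed through ascending stages of analysis and resulted in an inferential model with a restricted and independent set of explanatory variables. These explanatory variables did, in fact, rely on levels of intelligence tests and academic background as values to predict promotion. The model, however, demonstrated only limited utility as a predictive equation. It could only match the sample data when it was describing an average promotion rate among a large population subcategory. This would occur where the change in the explanatory variable had a significant partitioning effect on the population. The next two chapters will investigate the relationship of intelligence and academic ability as a predictor of promotion rate but through different procedures.

V. ANALYSIS OF TOP PERFORMERS

A. INTRODUCTION

This chapter took an ad hoc approach to identify any trends which distinguish top performers, on the basis of promotion rate, from their peers. Top performers consist of the top three percent of the population, or 1,047 individuals, according to PRA scores. This data set was referred to as the TOP data set, while the remainder were referred to as the SAMPLE data set.

Analysis consists of three sections. The first section is a comparative tabulation of means and variances. Results shown in this section confirmed the majority of sample characteristics predicted in Chapter IV, such as higher EIMCAT and OAFQT scores. There were, however, discrepancies in TOP distribution values with respect to RACETH, NCOE and PAYGD. Those discrepancies are investigated in later sections of this chapter. The second section reports the results of formal hypothesis testing for differences in means of the explanatory variables between each data set. The last section investigates the discrepancies associated with RACETH, NCOE, and PAYGD. Through a presentation of graphics demonstrating internal shifts of those variable distributions, an effect which appears to interrelate the three distributional discrepancies is identified.

B. COMPARISON OF MEANS AND VARIANCE

The tabulated means and variances of the study variables for the top three percent and for the remainder of the entire sample are presented in Table XXII. The last column in the table shows the percentage and direction that the TOP data set differed from the SAMPLE.

TABLE XXII
Top vs Sample Summary Data

                   Top 3%                Sample
Variable/Type      Mean      Std Dev     Mean     Std Dev    Comment
Promotion
RATE               2.06      .392        0.00     1.00
PRATE              .178      .037        .109     .036
PRA                2.33      .350        0.00     1.00
Intelligence
AFQTP              64.69     22.01       53.4     20.9       Top > 17.5%
OAFQTP             61.60     23.24       45.3     24.7       Top > 26.4%
EIMCAT             6.11      1.31        5.07     1.28       Top > 17.0%
GTSCR              113.17    14.70       108.3    14.2       Top > 4.1%
HIYRED             6.88      1.59        6.01     1.07       Top > 12.6%
EDLVL              7.12      1.55        6.32     .97        Top > 11.2%
PQSCR              80.57     11.31       78.4     1.6        Top > 2.6%
NCOE               2.31      2.50        3.06     2.81       Top < 33%
Effects
SEX                1.18      .390        1.12     .328       Top > 5%
CMF                62.09     27.146      51.9     31.3       Top > 16%
RACETH             1.58      .975        1.65     .942       Top < 4%
PAYGD              5.19      .405        5.27     .464       Top < 3%

Observations derived from the data in Table XXII can be summarized as follows:

The four aptitude test variables, GTSCR, AFQTP, OAFQTP and EIMCAT, all demonstrate a strong positive difference between the TOP and SAMPLE scores. The AFQT related scores are about twenty percent greater, with GTSCR greater by four percent. The variables EDLVL and HIYRED were both positive, with HIYRED slightly larger at twelve percent. PQSCR increased slightly. The effects variables SEX and CMF both increased, with CMF demonstrating a significant increase. The change in CMF was an unexpected result of the subsetting to the top three percent. The PRA variable was designed to be independent of CMF, and it should not have been affected as significantly as it was.

The only variables which decreased in proportion between SAMPLE and TOP were NCOE, RACETH, and PAYGD. Of the three, the change in NCOE was the largest. The NCOE result was also an unexpected result. Regression analysis indicated that NCOE had a positive influence on PRA. To have NCOE decrease with top performers is the reverse result. Paragraph D of this section will attempt to explain the reason for this anomaly.

C. SIGNIFICANCE TESTING

Significance testing for differences in means of the explanatory variables between the TOP and SAMPLE data set was included as a formal statistical confirmation of differences between the two data sets. Testing using nonparametric methods was utilized since the study variables were either discrete, or if continuous, did not meet the Kolmogorov-Smirnov one-sample test for a normal distribution. The type of nonparametric test used is dependent on the type scale of the variable and whether it was continuous or discrete.

TABLE XXIII
Top vs Sample Hypothesis Test Results

Variable        Test Used                       Results         Results
Intelligence
GTSCR           Kruskal-Wallis Test(1)          Chisq = 671     Strongly reject H0
AFQTP           Kruskal-Wallis Test             Chisq = 1165    Strongly reject H0
OAFQTP          Kruskal-Wallis Test             Chisq = 1418    Strongly reject H0
EIMCAT          2 x C Contingency Table*        Chisq = 503     Strongly reject H0
HIYRED          2 x C Contingency Table         Chisq = 931     Strongly reject H0
EDLVL           2 x C Contingency Table         Chisq = 700     Strongly reject H0
PQSCR           Kruskal-Wallis Test             Chisq = 26.1    Reject H0
NCOE            2 x C Contingency Table         Chisq =         Strongly reject H0
Effects
SEX             2 x C Contingency Table         Chisq =         Reject H0
CMF             2 x C Contingency Table         Chisq =         Strongly reject H0
RACETH          2 x C Contingency Table         Chisq =
PAYGD           2 x C Contingency Table         Chisq =

(1) For this nonparametric test the null hypothesis is that the populations are identical. The alternate hypothesis is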
The change in CMF was an unexpected result of subsetting to the top three percent. The PRA variable was designed to be independent of CMF, and it should not have been affected as significantly as it was.

The only variables which decreased in proportion between SAMPLE and TOP were NCOE, RACETH, and PAYGD. Of the three, NCOE was the largest. The change in NCOE was also an unexpected result. Regression analysis indicated that NCOE had a positive influence on PRA. To have NCOE decrease with top performers is the reverse result. Paragraph D of this section will attempt to explain the reason for this anomaly.

C. SIGNIFICANCE TESTING

Significance testing for means of the explanatory variables between the TOP and SAMPLE data set was included as a formal statistical confirmation of differences between the two data sets. Testing using nonparametric methods was utilized since the study variables were either discrete or, if continuous, did not meet the Kolmogorov-Smirnov one-sample test for a normal distribution. The type of nonparametric test used is dependent on the type of scale of the variable and whether it was continuous or discrete.

TABLE XXIII
Top vs Sample Hypothesis Test Results

Variable       Test Used                      Results         Conclusion
Intelligence
  GTSCR        Kruskal-Wallis Test (1)        Chisq = 671     Strongly reject H0
  AFQTP        Kruskal-Wallis Test            Chisq = 1165    Strongly reject H0
  OAFQTP       Kruskal-Wallis Test            Chisq = 1418    Strongly reject H0
  EIMCAT       2 x C Contingency Table (2)    Chisq = 503     Strongly reject H0
  HIYRED       2 x C Contingency Table        Chisq = 931     Strongly reject H0
  EDLVL        2 x C Contingency Table        Chisq = 700     Strongly reject H0
  PQSCR        Kruskal-Wallis Test            Chisq = 26.1    Reject H0
  NCOE         2 x C Contingency Table        [illegible]     Strongly reject H0
Effects
  SEX          2 x C Contingency Table        [illegible]     [illegible]
  CMF          2 x C Contingency Table        [illegible]     [illegible]
  RACETH       2 x C Contingency Table        [illegible]     [illegible]
  PAYGD        2 x C Contingency Table        [illegible]     [illegible]

(For the effects variables the source shows only the conclusions "Reject H0" and "Strongly reject H0", without a legible row assignment.)

(1) For this nonparametric test the null hypothesis is that the populations are identical. The alternate hypothesis is
that one of the populations yields larger observations. With two populations this is equivalent to a Mann-Whitney test. At a .95 level the critical Chi-square value for rejection is Chisq > 3.84.

(2) For this nonparametric test the null hypothesis is that the two populations have the same distribution as measured by the probability of falling into one of the discrete variable classifications. The alternate hypothesis is that the distributions are different. The contingency table is set for the two rows to be the classification of PRA > 1.93 and PRA < 1.93; the C represents the number of discrete levels in the variable being tested. The Chi-square test statistic is also used for this test, with a rejection of H0 when Chisq is larger than 3.84 at a .95 level.

Hypothesis testing confirms the simple observations made on the means and variances of the study variables. The strength of difference can be interpreted by the magnitude of the Chi-square statistic.

D. ANALYSIS OF DISTRIBUTIONS

This section further investigates the shifts in distributions for those variables which conflicted with the relationships derived in regression and correlation analysis. Those variables were NCOE, CMF, and PAYGD. Again, the conflicts which arose were two-fold. First, neither CMF nor PAYGD should have been affected by subsetting of the PRA variable. The PRA scores are normalized differences from the average score for every paygrade and CMF combination. Assuming a uniform application of promotion policy, then, no one CMF or paygrade should have dominated as a result of subsetting to the top three percent. Secondly, NCOE should have increased slightly rather than decreased significantly by subsetting to the top three percent.

The three inconsistencies appear to be linked in their distributional change. Observation of the three Figures 5.1, 5.2, and 5.3 demonstrates this.
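The normalization described in this section can be sketched in a few lines. This is only an illustration, assuming that "normalized difference" means a z-score of each promotion rate within its paygrade and CMF cell. The records, the tuple layout, and the names `normalize_pra` and `top_fraction` are hypothetical and are not taken from the thesis data.

```python
from collections import defaultdict
from statistics import mean, pstdev


def normalize_pra(records):
    """Return a z-score of promotion rate within each (paygrade, CMF) cell.

    records: list of (paygrade, cmf, rate) tuples (hypothetical layout).
    """
    cells = defaultdict(list)
    for paygrade, cmf, rate in records:
        cells[(paygrade, cmf)].append(rate)
    # Mean and population standard deviation of each cell.
    stats = {key: (mean(vals), pstdev(vals)) for key, vals in cells.items()}
    scores = []
    for paygrade, cmf, rate in records:
        mu, sigma = stats[(paygrade, cmf)]
        scores.append((rate - mu) / sigma if sigma > 0 else 0.0)
    return scores


def top_fraction(scores, fraction=0.03):
    """Return the indices of the highest scores (at least one index)."""
    k = max(1, round(len(scores) * fraction))
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return order[:k]


# Made-up records: (paygrade, CMF, promotion rate).
records = [
    (5, 11, 0.10), (5, 11, 0.12), (5, 11, 0.14),
    (6, 71, 0.08), (6, 71, 0.10), (6, 71, 0.12),
]
pra = normalize_pra(records)
top = top_fraction(pra, fraction=0.34)  # large fraction only because the toy set is tiny
```

Under this construction every paygrade and CMF cell is centered at zero with unit spread, which is the basis for the expectation stated above that no single CMF or paygrade should dominate the top group.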
[Figure 5.1. Top versus sample CMF changes, in percent, by CMF]

Figure 5.1 demonstrates a clearly defined redistribution of CMF percentages away from combat arms MOS's to the combat service support MOS's. In total, Infantry, Artillery, and Armor MOS's lost 15.5 percent, while the Administrative Specialists (CMF 71) in particular gained almost 9 percent.

[Figure 5.2. Top vs sample NCOE cluster bar: percent by NCOE level (1-11), TOP and SAMPLE]

Figure 5.2 demonstrates transfer of a large percentage of the sample density away from the NCOE 7 level to the NCOE 3 level. This was consistent with the observations in Figure 5.1, since only combat arms NCO's qualify for level 7, the Combat Arms Primary Leadership course.

[Figure 5.3. Top vs sample PAYGD cluster bar: percent by paygrade, TOP and SAMPLE]

The last figure, 5.3, shows a displacement of percentage from the E-6 to the E-5 paygrade as a result of extracting only the top three percent by measure of promotion rate.

To offer an explanation of these discrepancies is difficult. Some discrepancy may well be explained by mathematical error, in that the normalizing of the PRA scores may not be entirely adequate. However, it was noted that the interrelationships of the observed effects do act consistently, so the underlying reason for this discrepancy can be simple. Specifically, the reduction in paygrade and the removal of combat MOS's both combine to significantly reduce the NCOE level. As such, it is more likely that the change in NCOE occurred coincident with the changes in the two variables PAYGD and CMF. The effect being demonstrated was one where junior combat service support NCO's were dominating promotion achievement.

E. SUMMARY OF FINDINGS

Comparing the changes in averages for the top performers to the regression coefficients found in Chapter IV shows very substantial agreement.
Specifically, OAFQT was the most significant intelligence test variable, while HIYRED was the most significant academic variable. Although the percent change in OAFQTP is greater than that of HIYRED, it still has considerably more variance than HIYRED. Thus, the predictive ability of HIYRED in regression should be more pronounced than that of OAFQT. The less significant variables of PQSCR, SEX, and RACETH each shifted a small, significant amount in the appropriate direction.

The only discrepancy between the two procedures is the change in the variable NCOE. This change is felt to have been induced by changes in the CMF and PAYGD distributions. The effect is one where junior combat service support NCO's replace NCO's from the combat MOS's.

An important observation from analysis of the top three percent was that the increase in the value of any explanatory variable was not extreme. In fact, the largest increase was only twenty-five percent. As an inference, it appears that NCO's who do a little better in a combination of areas, rather than much better in a single area, are more likely recipients of faster promotion rates.

VI. PRINCIPAL COMPONENTS AND FACTOR ANALYSIS

A. INTRODUCTION

In this chapter more advanced statistical procedures are implemented to improve and/or simplify the cause-effect model. Principal components and factor analysis are two closely related procedures which are normally used in investigating the mutual relationships and communalities of variables. By identifying redundant variables, and by constructing composite variables which summarize a large number of the originals, it is possible to reduce the number of independent explanatory variables to only those which are significant and unique.

B. THEORY
Principal components and factor analysis each use matrix algebra to operate on a P by P matrix of correlation or covariance coefficients and produce a system of eigenvectors of the form:

$$Y_j = a_{1j} X_1 + a_{2j} X_2 + \cdots + a_{pj} X_p + E$$

In the notation, $Y_j$ represents the resultant composite variable and $a_j$ is the linear combination of the loading coefficients. These loading coefficients multiply each of the original variables $X_n$, $n = 1..p$. $E$ represents the amount of residual error not accounted for by the linear model. [Ref. 5:p. 328] The resulting eigenvectors represent a set of orthogonal components which are jointly perpendicular in the space of the original variables. [Ref. 15:p. 424] These components are jointly uncorrelated and individually account for levels of variance, where the first principal component accounts for the largest proportion, and the last principal component accounts for the smallest.

A resulting component may be representative of some aggregate characteristic of the original variables. For example, a resulting eigenvector which has strong factor loadings for original input variables of physical strength and endurance could be called a factor of stamina as an aggregate measure.

Principal components and factor analysis differ in that principal components assume and require that the number of components needed to account for the total variance is equal to the dimension of the initial number of variables. In contrast, the factor method assumes that there exists a set of composites in a dimension smaller than the original number of variables which will suffice. [Ref. 5:p. 622]

An additional aspect of factor analysis is that it allows for rotation of the factors by applying nonsingular linear transformations with the intent of developing a more unique solution with well-defined components. For example, if there are five variables which have intermediate factor loadings in the range of .2 to .4, a rotation of common factors may result in a pattern matrix in which the loadings are either zero or close to one. The end result is easier to interpret than the factor pattern with numerous mixed elements. Graphical measures are useful with the rotation procedure and allow the analyst to see the relative uniqueness of the input variables.

C. RESULTS

The SAS procedure for performing factor analysis was used with the method of factor determination being the principal component method. As such, basic principal component analysis was conducted, but limits were applied on the number of factors retained so that only the most significant composite factors would be kept. The first set of input variables included all of the twelve study variables. Table XXIV shows the resulting factor solution. The original input variables which contributed most to each factor have been underlined. Appended below each component is an interpretation explaining what the aggregate factor may represent. Following Table XXIV is a factor plot, Figure 6.1, where each of the variables is coded by a letter. By observing the plot, any lack of uniqueness for a group of variables can be noted where the coded letters are close to one another.

TABLE XXIV
Principal Components Tabular Results

Input: Matrix of correlation coefficients
PRIOR COMMUNALITY ESTIMATES: ONE

Factor    EIGENVALUE    DIFFERENCE    PROPORTION    CUMULATIVE
  1         4.0052        2.2717        0.3338        0.3338
  2         1.7334        0.2355        0.1445        0.4782
  3         1.4979        0.4344        0.1248        0.6031
  4         1.0634        0.2138        0.0886        0.6910
  5         0.8496        0.0468        0.0708        0.7625
  6         0.8028        0.0486        0.0669        0.8294
  7         0.7542        0.2149        0.0628        0.8922
  8         0.5392        0.1892        0.0449        0.9372
  9         0.3500        0.0690        0.0292        0.9663
 10         0.2809        0.1613        0.0234        0.9897
 11         0.1196        0.1161        0.0100        0.9997
 12         0.0034                      0.0003        1.0000

7 FACTORS WILL BE RETAINED BY THE NFACTOR CRITERION

FACTOR PATTERN

            FACT1     FACT2     FACT3     FACT4     FACT5     FACT6     FACT7
EDLVL       .4302     .5861     .5024    -.2544    -.0624    -.0693    -.029
AFQTP       .9515    -.1133    -.1195     .0637     .0075     .1548    -.024
EIMCAT      .9060    -.1220    -.1652    -.0598    -.0096     .1478     .011
NCOE       -.0085    -.4507     .6668     .2527    -.0398     .0084    -.134
HIYRED      .3834     .6410     .4176    -.3281    -.0637    -.0830    -.124
SEX         .1735     .4212    -.1113     .6516     .1857    -.0736    -.550
OAFQT       .9518    -.1046    -.1156     .0590    -.0092     .1535    -.023
GTSCR       .8238    -.1128     .0090     .0331    -.0464     .1350     .132
PQSCR       .4001    -.2413     .1205    -.1150    -.7312    -.4527     .115
CMF         .1677     .5200    -.1449     .4985    -.1171    -.2587     .561
PAYGD       .1216    -.3467     .6770     .3367    -.1816    -.0495     .151
RACETH     -.3590     .3130     .2547     .1229     .4708     .6507     .216

Factor:     Intell    Acad      Career    Sex       PQSCR     RACE      CMF
            Tests               Status

FINAL COMMUNALITY ESTIMATES: TOTAL = 10.706622

[Figure 6.1. Plot of factor pattern for FACTOR1 and FACTOR3. Legend: EDLVL=A, AFQTP=B, EIMCAT=C, NCOE=D, HIYRED=E, SEX=F, OAFQT=G, GTSCR=H, PQSCR=I, CMF=J, PAYGD=K, RACETH=L]

The results appear to be quite reasonable, where the most significant factor is a composite of all the mental aptitude measures: AFQTP, OAFQTP, GTSCR, and EIMCAT. The second factor consists primarily of the academic performance measures EDLVL and HIYRED. The third factor is composed of NCOE and PAYGD and reflects two closely related measures of paygrade. The fourth factor is dominated by SEX and two other predominantly nominal variables, CMF and PAYGD. The fifth, sixth and seventh factors all appear to be dominated by single variables, PQSCR, RACE, and CMF respectively.
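The PROPORTION and CUMULATIVE entries of Table XXIV can be checked directly from the eigenvalues. The sketch below assumes, as stated in the table, that the input is a correlation matrix, so the total variance equals the number of input variables (twelve). The function name `variance_summary` is illustrative and is not part of the SAS procedure.

```python
# Eigenvalues as printed in Table XXIV (12 input variables).
eigenvalues = [4.0052, 1.7334, 1.4979, 1.0634, 0.8496, 0.8028,
               0.7542, 0.5392, 0.3500, 0.2809, 0.1196, 0.0034]


def variance_summary(eigs):
    """Return (proportion, cumulative) lists for a correlation-matrix input."""
    total = len(eigs)  # the trace of a correlation matrix equals the number of variables
    proportions = [e / total for e in eigs]
    cumulative = []
    running = 0.0
    for p in proportions:
        running += p
        cumulative.append(running)
    return proportions, cumulative


proportions, cumulative = variance_summary(eigenvalues)
for i, (e, p, c) in enumerate(zip(eigenvalues, proportions, cumulative), 1):
    print(f"factor {i:2d}: eigenvalue {e:6.4f}  proportion {p:6.4f}  cumulative {c:6.4f}")
```

The eigenvalues sum to very nearly twelve, and the first seven sum to about 10.7065, which agrees with the final communality total of 10.706622 reported for the seven retained factors.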
In short, each of the twelve variables is represented in some measure in the first five factors, the first five factors accounting for over seventy five percent of the variance. By observing the entry for PROPORTION one can see that the subsequent seven factors each contributed between .0028 and .0668 of the original variance and as such are not major contributors.

Using the results of the first solution, a second analysis was conducted with a reduced number of input variables. In each of the factors of the initial solution, the single variable having the largest loading factor was selected and the other related variables were eliminated. Table XXV shows the results of that solution, and Figure 6.2 shows the Factor Plot.

TABLE XXV
Reduced Principal Components Tabular Results

Input: Matrix of correlation coefficients
PRIOR COMMUNALITY ESTIMATES: ONE

Factor    EIGENVALUE    DIFFERENCE    PROPORTION    CUMULATIVE
  1         2.1666        0.9602        0.3095        0.3095
  2         1.2063        0.2044        0.1723        0.4819
  3         1.0019        0.1315        0.1431        0.6250
  4         0.8703        0.0654        0.1243        0.7493
  5         0.8049        0.0967        0.1150        0.8643
  6         0.7081        0.4665        0.1012        0.9655
  7         0.2416                      0.0345        1.0000

7 FACTORS WILL BE RETAINED BY THE NFACTOR CRITERION

FACTOR PATTERN

            FACT1     FACT7
NCOE        .0221     .018
HIYRED      .3659    -.004
SEX         .1803    -.051
OAFQT       .8945    -.328
GTSCR       .8592    -.328
PQSCR       .5069    -.022
RACETH     -.4521     .037

(The loadings for FACT2 through FACT6 cannot be realigned from the source. In the order printed they are: -.5422, -.3801, -.1071, .6941, .2656, -.5162, -.2443, -.4001, .5302, .3135, .6532, .1514, .6993, .0899, -.1346, -.0412, -.0668, .0404, .0502, .2462, -.0374, -.0492, -.1259, .0154, .3664, -.0613, -.3707, .2537, .7141, -.2648, -.1589, .3275, .5799, .2487, .5031. The factor interpretations are printed as: Intell Tests, Acad, NCOE, SEX, PQSCR, Race.)

FINAL COMMUNALITY ESTIMATES: TOTAL = 7.000000

NCOE      HIYRED    SEX       OAFQT     GTSCR     PQSCR     RACETH
1.0000    1.0000    1.0000    1.0000    1.0000    1.0000    1.0000
[Figure 6.2. Factor Plot: plot of factor pattern for FACTOR1 and FACTOR2. Legend: NCOE=A, HIYRED=B, SEX=C, OAFQT=D, GTSCR=E, PQSCR=F, RACETH=G]

Restricting the input to the strongest unique variables results in an almost complete separation into single factors. The only exception is the grouping of GTSCR and OAFQT (E and D). This is not surprising considering the composition of both scores from the same set of tests in the ASVAB. Thus, the decision to eliminate GTSCR from the earlier regression models makes sense from the Factor Analysis perspective as well.

E. SUMMARY OF FINDINGS

The application of principal components and factor analysis confirmed many of the patterns of dependency and redundancy with the study variables. It confirmed the choices for unique variables in the regression as developed in Chapter IV, and gave a good second opinion for deciding which variables could be set aside with little effect on the model.

VII. CONCLUSION

A. OVERALL FINDINGS

There is strong statistical evidence to support the proposition that success in the Army, as measured by promotion rate, is related to the individual's intelligence test scores and academic background. The explanatory variables of the 1980 normed AFQT score and the individual's highest year of education at time of entry are important indicators for future promotion rate. The highest year of education at the time of entry is the more important measure, and represents very substantial changes in promotion rate by discrete scale changes in academic background. OAFQT is not nearly as important as HIYRED and can independently affect the predicted promotion rate only up to ten percent. While in service, how well the individual scores on his Performance Qualification Test Scores and his attendance at NCO schooling will be indicative of a faster promotion rate.

The statistical evidence for these observations can be argued by showing the existence of significantly increasing averages in promotion rate across ascending levels of the explanatory measures in ANOVA and ANCOVA analysis. This argument can be supplemented, and those differences seen more concretely, by a simpler comparison of top performers versus the sample averages.

Considerable variance of promotion rate exists across any of the levels of the discrete explanatory variables, and within any of the categorical variables. There is a dilemma in designing an effective dependent variable. While controlling the effects of categorical variables such as CMF and Paygrade, other variables become more apparent and significant. However, the ability of the model to explain variance is significantly diminished.

Selecting the most important set of explanatory variables was achieved via two methods. A successive, increasing dimension procedure distilled a set of unique explanatory variables. This method relied upon developing a detailed familiarity with each variable. In the process, hypothesis testing was used to eliminate insignificant contributors and to identify the most important variable from a group of related variables. This restricted set of unique explanatory variables was confirmed with the use of principal components, a method which uses a mathematical approach to identify orthogonal and unique variables.

When using regression procedures, the resulting model met inferential assumptions, both parametrically and nonparametrically. Further, the model estimates are reproducible with an alternate data set. Although the model is technically acceptable, it is only accurate in predicting promotion values for population subcategories. The low R2 value and high mean square error terms found during regression were manifested in model testing. When making predictions based on incremental changes in AFQT, the resulting predictions were close to the sample data values, but the upper and lower bounds were so large that the predictions were not useful.

The poor predictive performance of the model can be attributed to two possible reasons. First, that there exists some unspecified significant predictor variable which could be used to better account for variance. Or secondly, there exists the chance occurrence of an inexplicable promotion rate for any given individual.

In the case of the first reason, it should be observed that the number of available entries held on a given individual at either DMDC or MILPERCEN is limited. Of the one hundred and forty data fields, this study considered all entries which were felt to have potential merit as an explanatory variable. Overall, the final number of considered variables was reduced to twelve significant predictors. This included several versions expressing the same fundamental quality, and only six unique measures. There are few entries available, and to discover additional significant variables to use as explanatory variables would require establishment of new personnel data elements in those data bases. Potential candidates include evaluation report averages or, possibly, the results of a personality composite test. Alternatively, the quality of information on academic performance could be increased, such as by inclusion of grade averages from the high school attendance periods. The utility of this additional data would then have to be evaluated in a manner similar to this thesis.

The second reason given for error is a more probable explanation. The subject matter of this study is people, and not the resolution of a deterministic physical phenomenon. The cause effect relationship is more subtle and more difficult to verify. Although this condition does not have a mathematical remedy, the judgement of whether or not a small, highly variable measure of trend is sufficient still lies with the analyst and his ability to present that judgement to decision makers.

B. POLICY RECOMMENDATIONS

The first question that must be answered in this section is whether or not having a predictive model is necessary to make policy decisions regarding promotion or accession. The answer offered in this document is that it is not. There is sufficiently reliable information resulting from hypothesis testing and subpopulation analysis to make cogent observations and decisions with.

From the results of this investigation, accession policy makers should closely manage the two attributes of OAFQT and HIYRED. This recommendation is more a confirmation than a proposal. The 1984 Defense Authorization Act already places constraints on AFQT category and high school diploma status.

The two in-service attributes that should be managed are the Performance Qualification Score and attendance at NCO schooling. To directly tie scores on these attributes in the form of promotion points or a minimum threshold scale would be one approach. Unfortunately, this may artificially force NCO's of less potential into the categories with the more competent individuals. The result may be a lessening of the discriminatory effectiveness of the two measures.

If the scores were allowed to measure independent of promotion policy, the ability of these scores to discriminate would be better. However, not directly tying either of these variables to promotion point values or thresholds should not mean that the individual aggressiveness to achieve his or her score and pursue in-service education would be unused. A policy where promotion boards were still instructed to review an individual's scores, inclusive with notification of this review policy to the NCO population, allows for self selection by the more ambitious individuals.

C.
SUGGESTIONS FOR FURTHER RESEARCH

One disturbing observation of this study was the apparent disparity among race and ethnic groups in terms of AFQT and promotion rates. As pointed out by Daula (1985), the explanation of this disparity cannot be seen in an aggregate approach, but rather with a duration model approach of promotion over time for a data set of individual soldiers. [Ref. 11:pp. 7-9] His paper reports that this disparity is a result of attrition. Specifically, the shifting of subcategory group promotion averages is a result of different retention patterns among race and ethnic groups, and not due to a racially sensitive promotion system. A study to determine the magnitude and underlying reasons for the different retention patterns, and to test this hypothesis, would have considerable merit.

APPENDIX A
CAREER MANAGEMENT FIELDS AND FREQUENCIES

                                             CUMULATIVE    CUMULATIVE
MOSNAME          CMF   FREQUENCY   PERCENT   FREQUENCY     PERCENT
Infantry          11      4320       11.4       4320         11.4
Cbt Engineer      12      1030        2.7       5350         14.1
Artillery         13      2780        7.3       8130         21.5
Air Defense       16       851        2.2       8981         23.7
Special Ops       18       244        0.6       9225         24.4
Armor             19      2434        6.4      11659         30.8
Hawk Missile      23       187        0.5      11846         31.3
Nike Missile      27       352        0.9      12198         32.2
Tac Radar         28        40        0.1      12238         32.3
Tac Radar         29       625        1.7      12863         34.0
Communication     31      3265        8.6      16128         42.6
Elect Warfare     33        30        0.1      16158         42.7
Tech Drafter      51       619        1.6      16777         44.3
Chem Warfare      54       529        1.4      17306         45.7
Explosive Ord     55       400        1.1      17706         46.8
Repair            63      3766        9.9      21472         56.7
Cargo Spec        64      1041        2.8      22513         59.5
A/C Repair        67      1090        2.9      23603         62.4
Admin Spec        71      3020        8.0      26623         70.3
Programmer        74       423        1.1      27046         71.4
Supply            76      2677        7.1      29723         78.5
Recruiter         79       106        0.3      29829         78.8
Topo Eng          81        65        0.2      29894         79.0
AV Spec           84       157        0.4      30051         79.4
Medical           91      2498        6.6      32549         86.0
Lab Spec          92       444        1.2      32993         87.2
Air Traffic       93       175        0.5      33168         87.6
Food SVC          94       919        2.4      34087         90.0
Mil Police        95      1674        4.4      35761         94.5
Intelligence      96       789        2.1      36550         96.6
Musician          97       176        0.5      36726         97.0
EW/SIGINT         98      1125        3.0      37851        100.0
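The PERCENT and CUMULATIVE columns of Appendix A follow from the FREQUENCY column alone, which gives a quick consistency check on the table. The sketch below assumes that percentages are rounded to one decimal place. The variable names are illustrative.

```python
from itertools import accumulate

# FREQUENCY column of Appendix A, in CMF order (11, 12, 13, ..., 98).
frequencies = [
    4320, 1030, 2780, 851, 244, 2434, 187, 352, 40, 625, 3265,
    30, 619, 529, 400, 3766, 1041, 1090, 3020, 423, 2677, 106,
    65, 157, 2498, 444, 175, 919, 1674, 789, 176, 1125,
]

total = sum(frequencies)                   # number of NCO records in the table
cumulative = list(accumulate(frequencies))  # running total by CMF
percent = [round(100 * f / total, 1) for f in frequencies]
cumulative_percent = [round(100 * c / total, 1) for c in cumulative]

print(total, percent[0], cumulative[-1], cumulative_percent[-1])
```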
APPENDIX B
AFQT TRANSFORMATION EQUIVALENT SCORES

Armed Forces Qualification Test (AFQT) Equivalent Percentile Scores for 1944 Mobilization Population and 1980 Youth Population

1944   1980        1944   1980        1944   1980
  1      1          34     33          67     66
  2      1          35     34          68     67
  3      2          36     35          69     68
  4      2          37     35          70     69
  5      3          38     36          71     70
  6      4          39     37          72     71
  7      5          40     38          73     72
  8      6          41     38          74     73
  9      6          42     39          75     74
 10      8          43     40          76     75
 11      8          44     41          77     76
 12     10          45     42          78     77
 13     11          46     42          79     78
 14     12          47     43          80     79
 15     14          48     44          81     80
 16     15          49     46          82     81
 17     16          50     47          83     83
 18     17          51     48          84     84
 19     18          52     49          85     85
 20     19          53     49          86     87
 21     21          54     50          87     89
 22     22          55     51          88     91
 23     23          56     52          89     92
 24     24          57     53          90     93
 25     25          58     54          91     94
 26     26          59     56          92     95
 27     26          60     57          93     95
 28     27          61     58          94     97
 29     28          62     59          95     98
 30     29          63     60          96     98
 31     30          64     62          97     99
 32     31          65     63          98     99
 33     32          66     65          99     99

LIST OF REFERENCES

1. Torgerson, W.S., Theory and Methods of Scaling, John Wiley & Sons, Inc., 1958.

2. Douglas, B.A., An Analysis of the Academic Composites of the Armed Services Vocational Aptitude Battery (ASVAB) and the Math and Verbal Sections of the Preliminary Scholastic Aptitude Test (PSAT), the Scholastic Aptitude Test (SAT), and the American College Test (ACT): A Correlation Study, Ph.D. Dissertation, Southern Illinois University at Carbondale, 1986.

3. Jenson, A.R., Test Reviews, ASVAB, in Measurement and Evaluation in Counseling and Development, University of California, Berkeley, April 1985.

4. SAS Institute Inc., SAS User's Guide: Basics, Version 5 Edition, SAS Institute Inc., 1985.

5. SAS Institute Inc., SAS User's Guide: Statistics, Version 5 Edition, SAS Institute Inc., 1985.

6. IBM Research, GRAFSTAT Introductory Manual, Yorktown Heights, June 1986.

7. Schatzoff, M., and others, Regression Analysis in GRAFSTAT, IBM Research Center, Yorktown, June 1986.

8. Scribner, B.L., and others, Are Smart Tankers Better? AFQT and Military Productivity, Armed Forces & Society, Vol. 12, No. 2, pp. 193-206, Winter 1986.

9. Dunbar, S.B., and others,
On Predicting Success in Training for Males and Females: Marine Corps Clerical Specialties and ASVAB Forms 6 and 7, The Cada Research Group, University of Iowa, February 1985.

10. Department of Defense, Office of the Assistant Secretary of Defense (Manpower, Installations, and Logistics), Report to the House and Senate Committees on Armed Services, Defense Manpower Quality, Volume II, Army Submission, May 1985.

11. Daula, T., Estimating Time to Promotion for Enlisted Soldiers, paper presented at The Information Management/Operations Research Society of America Conference, Boston, MA, July 1985.

12. Chambers, J.M., and others, Graphical Methods for Data Analysis, Wadsworth, 1983.

13. Conover, W.J., Practical Nonparametric Statistics, John Wiley & Sons, Inc., 1971.

14. Baldwin, R.H., Documenting Personnel Quality Requirements with Statistical Analysis, United States Military Academy, December 1985.

15. Berenson, M.L., and others, Intermediate Statistical Methods and Applications: A Computer Package Approach, Prentice-Hall, Inc., 1983.

INITIAL DISTRIBUTION LIST

                                                        No. Copies
1. Defense Technical Information Center                     2
   Cameron Station
   Alexandria, Virginia 22304-6145

2. Library, Code 0142                                       2
   Naval Postgraduate School
   Monterey, California 93943-5002

3. Superintendent, Code 55Lw                                1
   Attn: Prof. P.A.W. Lewis
   Naval Postgraduate School
   Monterey, California 93943-5000

4. Superintendent, Code 55La                                1
   Attn: Prof. Larson
   Naval Postgraduate School
   Monterey, California 93943-5000

5. HQDA, ODCSPER                                            2
   ATTN: DAPC-ZXP (Ltc. Helmick)
   The Pentagon
   Washington, D.C. 30310-0300

6. Cpt Jim Lewis                                            1
   SMC 2096
   Naval Postgraduate School
   Monterey, California 93943-5000

7. Cpt. Jerry B. Warner                                     1
   6724 Danforth St.
   McLean, Virginia 22101
