Version 12
Consumer Research

“The real voyage of discovery consists not in seeking new landscapes, but in having new eyes.”
Marcel Proust

JMP, A Business Unit of SAS
SAS Campus Drive
Cary, NC 27513

The correct bibliographic citation for this manual is as follows: SAS Institute Inc. 2015. JMP® 12 Consumer Research. Cary, NC: SAS Institute Inc.

JMP® 12 Consumer Research
Copyright © 2015, SAS Institute Inc., Cary, NC, USA
ISBN 978-1-62959-438-5 (Hardcopy)
ISBN 978-1-62959-440-8 (EPUB)
ISBN 978-1-62959-441-5 (MOBI)
ISBN 978-1-62959-439-2 (PDF)
All rights reserved. Produced in the United States of America.

For a hard-copy book: No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, electronic, mechanical, photocopying, or otherwise, without the prior written permission of the publisher, SAS Institute Inc.

For a web download or e-book: Your use of this publication shall be governed by the terms established by the vendor at the time you acquire this publication.

The scanning, uploading, and distribution of this book via the Internet or any other means without the permission of the publisher is illegal and punishable by law. Please purchase only authorized electronic editions and do not participate in or encourage electronic piracy of copyrighted materials. Your support of others’ rights is appreciated.

U.S. Government License Rights; Restricted Rights: The Software and its documentation is commercial computer software developed at private expense and is provided with RESTRICTED RIGHTS to the United States Government. Use, duplication or disclosure of the Software by the United States Government is subject to the license terms of this Agreement pursuant to, as applicable, FAR 12.212, DFAR 227.7202-1(a), DFAR 227.7202-3(a) and DFAR 227.7202-4 and, to the extent required under U.S. federal law, the minimum restricted rights as set out in FAR 52.227-19 (DEC 2007). If FAR 52.227-19 is applicable, this provision serves as notice under clause (c) thereof and no other notice is required to be affixed to the Software or documentation. The Government’s rights in Software and documentation shall be only those set forth in this Agreement.

SAS Institute Inc., SAS Campus Drive, Cary, North Carolina 27513-2414.

March 2015

SAS® and all other SAS Institute Inc. product or service names are registered trademarks or trademarks of SAS Institute Inc. in the USA and other countries. ® indicates USA registration.

Other brand and product names are trademarks of their respective companies.

Technology License Notices

• Scintilla - Copyright © 1998-2014 by Neil Hodgson <[email protected]>. All Rights Reserved.

Permission to use, copy, modify, and distribute this software and its documentation for any purpose and without fee is hereby granted, provided that the above copyright notice appear in all copies and that both that copyright notice and this permission notice appear in supporting documentation.

NEIL HODGSON DISCLAIMS ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS, IN NO EVENT SHALL NEIL HODGSON BE LIABLE FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.

• Telerik RadControls: Copyright © 2002-2012, Telerik. Usage of the included Telerik RadControls outside of JMP is not permitted.

• ZLIB Compression Library - Copyright © 1995-2005, Jean-Loup Gailly and Mark Adler.

• Made with Natural Earth. Free vector and raster map data @ naturalearthdata.com.

• Packages - Copyright © 2009-2010, Stéphane Sudre (s.sudre.free.fr). All rights reserved.
Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:
‒ Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.
‒ Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.
‒ Neither the name of the WhiteBox nor the names of its contributors may be used to endorse or promote products derived from this software without specific prior written permission.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS “AS IS” AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

• iODBC software - Copyright © 1995-2006, OpenLink Software Inc and Ke Jin (www.iodbc.org). All rights reserved.

Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:
‒ Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.
‒ Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.
‒ Neither the name of OpenLink Software Inc. nor the names of its contributors may be used to endorse or promote products derived from this software without specific prior written permission.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS “AS IS” AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL OPENLINK OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

• bzip2, the associated library “libbzip2”, and all documentation, are Copyright © 1996-2010, Julian R Seward. All rights reserved.

Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:
‒ Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.
‒ The origin of this software must not be misrepresented; you must not claim that you wrote the original software. If you use this software in a product, an acknowledgment in the product documentation would be appreciated but is not required.
‒ Altered source versions must be plainly marked as such, and must not be misrepresented as being the original software.
‒ The name of the author may not be used to endorse or promote products derived from this software without specific prior written permission.
THIS SOFTWARE IS PROVIDED BY THE AUTHOR “AS IS” AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

• R software is Copyright © 1999-2012, R Foundation for Statistical Computing.

• MATLAB software is Copyright © 1984-2012, The MathWorks, Inc. Protected by U.S. and international patents. See www.mathworks.com/patents. MATLAB and Simulink are registered trademarks of The MathWorks, Inc. See www.mathworks.com/trademarks for a list of additional trademarks. Other product or brand names may be trademarks or registered trademarks of their respective holders.

• libopc is Copyright © 2011, Florian Reuter. All rights reserved.

Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:
‒ Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.
‒ Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.
‒ Neither the name of Florian Reuter nor the names of its contributors may be used to endorse or promote products derived from this software without specific prior written permission.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS “AS IS” AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

• libxml2 - Except where otherwise noted in the source code (e.g. the files hash.c, list.c and the trio files, which are covered by a similar licence but with different Copyright notices) all the files are: Copyright © 1998 - 2003 Daniel Veillard. All Rights Reserved.

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the “Software”), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED “AS IS”, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE DANIEL VEILLARD BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

Except as contained in this notice, the name of Daniel Veillard shall not be used in advertising or otherwise to promote the sale, use or other dealings in this Software without prior written authorization from him.

Get the Most from JMP®

Whether you are a first-time or a long-time user, there is always something to learn about JMP.

Visit JMP.com to find the following:
• live and recorded webcasts about how to get started with JMP
• video demos and webcasts of new features and advanced techniques
• details on registering for JMP training
• schedules for seminars being held in your area
• success stories showing how others use JMP
• a blog with tips, tricks, and stories from JMP staff
• a forum to discuss JMP with other users

http://www.jmp.com/getstarted/

Contents

1 Learn about JMP
Documentation and Additional Resources
  Formatting Conventions
  JMP Documentation
  JMP Documentation Library
  JMP Help
  Additional Resources for Learning JMP
  Tutorials
  Sample Data Tables
  Learn about Statistical and JSL Terms
  Learn JMP Tips and Tricks
  Tooltips
  JMP User Community
  JMPer Cable
  JMP Books by Users
  The JMP Starter Window

2 Introduction to Consumer Research
Overview of Customer and Behavioral Research Methods

3 Categorical Response Analysis
Analyzing Survey and Other Counting Data
  Categorical Platform Overview
  Example of the Categorical Platform
  Launch the Categorical Platform
  Response Roles
  Cast Selected Columns into Roles
  Other Launch Window Options
  The Categorical Report
  Share Chart
  Frequency Chart
  Categorical Platform Options
  Report Options
  Statistical Options
  Free Text Report Options
  Structured Report Options
  Additional Examples of the Categorical Platform
  Multiple Response
  Response Frequencies
  Indicator Group
  Multiple Delimited
  Multiple Response by ID
  Mean Score Example

4 Factor Analysis
Identify Factors within Variables
  Factor Analysis Platform Overview
  Example of the Factor Analysis Platform
  Launch the Factor Analysis Platform
  The Factor Analysis Report
  Model Launch
  Rotation Methods
  Factor Analysis Platform Options
  Factor Analysis Model Fit Options

5 Choice Models
Fit Models for Choice Experiments
  Choice Modeling Platform Overview
  Example of the Choice Platform
  Launch the Choice Platform
  Choice Model Output
  Choice Platform Options
  Valuing Trade-offs
  Example of Valuing Trade-offs
  One-Table Analysis
  Example of One-Table Analysis
  Segmentation
  Example of Segmentation
  Special Data Rules
  Default Choice Set
  Subject Data with Response Data
  Logistic Regression
  Transforming Data
  Example of Transforming Data to Two Analysis Tables
  Example of Transforming Data to One Analysis Table
  Additional Examples of Logistic Regression Using the Choice Platform
  Example of Logistic Regression Using the Choice Platform
  Example of Logistic Regression for Matched Case-Control Studies
  Statistical Details

6 Uplift Models
Model the Incremental Impact of Actions on Consumer Behavior
  Uplift Platform Overview
  Example of the Uplift Platform
  Launch the Uplift Platform
  The Uplift Model Report
  Uplift Model Graph
  Uplift Report Options

7 Item Analysis
Analyze Test Results by Item and Subject
  Item Analysis Platform Overview
  Launch the Item Analysis Platform
  The Item Analysis Report
  Characteristic Curves
  Information Curves
  Dual Plots
  Item Analysis Platform Options
  Technical Details

8 Multiple Correspondence Analysis
  Example of Multiple Correspondence Analysis
  Launch the Multiple Correspondence Analysis Platform
  The Multiple Correspondence Analysis Report
  Multiple Correspondence Analysis Platform Options
  Correspondence Analysis Options
  Show Plot
  Show Detail
  Show Adjusted Inertia
  Show Coordinates
  Show Summary Statistics
  Show Partial Contributions to Inertia
  Show Squared Cosines
  Cross Table
  Cross Table of Supplementary Rows
  Cross Table of Supplementary Columns
  Additional Examples of the Multiple Correspondence Analysis Platform
  Example Using a Supplementary Variable
  Example Using a Supplementary ID
  Statistical Details for the Multiple Correspondence Analysis Platform
  Statistical Details for the Details Report
  Statistical Details for Adjusted Inertia
  Statistical Details for Summary Statistics
  Statistical Details for Partial Contributions to Inertia

A References

Index
Consumer Research

Chapter 1
Learn about JMP
Documentation and Additional Resources

This chapter includes the following information:
• book conventions
• JMP documentation
• JMP Help
• additional resources, such as the following:
‒ other JMP documentation
‒ tutorials
‒ indexes
‒ Web resources

Figure 1.1 The JMP Help Home Window on Windows

Contents
  Formatting Conventions
  JMP Documentation
  JMP Documentation Library
  JMP Help
  Additional Resources for Learning JMP
  Tutorials
  Sample Data Tables
  Learn about Statistical and JSL Terms
  Learn JMP Tips and Tricks
  Tooltips
  JMP User Community
  JMPer Cable
  JMP Books by Users
  The JMP Starter Window

Formatting Conventions

The following conventions help you relate written material to information that you see on your screen.
• Sample data table names, column names, pathnames, filenames, file extensions, and folders appear in Helvetica font.
• Code appears in Lucida Sans Typewriter font.
• Code output appears in Lucida Sans Typewriter italic font and is indented farther than the preceding code.
• Helvetica bold formatting indicates items that you select to complete a task: ‒ buttons ‒ check boxes ‒ commands ‒ list names that are selectable ‒ menus ‒ options ‒ tab names ‒ text boxes • The following items appear in italics: ‒ words or phrases that are important or have definitions specific to JMP ‒ book titles ‒ variables ‒ script output • Features that are for JMP Pro only are noted with the JMP Pro icon of JMP Pro features, visit http://www.jmp.com/software/pro/. . For an overview Note: Special information and limitations appear within a Note. Tip: Helpful information appears within a Tip. JMP Documentation JMP offers documentation in various formats, from print books and Portable Document Format (PDF) to electronic books (e-books). 16 Learn about JMP JMP Documentation Chapter 1 Consumer Research • Open the PDF versions from the Help > Books menu or from the JMP online Help footers. • All books are also combined into one PDF file, called JMP Documentation Library, for convenient searching. Open the JMP Documentation Library PDF file from the Help > Books menu. • e-books are available at online retailers. Visit http://www.jmp.com/support/downloads/ documentation.shtml for details. • You can also purchase printed documentation on the SAS website: http://support.sas.com/documentation/onlinedoc/jmp/index.html JMP Documentation Library The following table describes the purpose and content of each book in the JMP library. Document Title Document Purpose Document Content Discovering JMP If you are not familiar with JMP, start here. Introduces you to JMP and gets you started creating and analyzing data. Using JMP Learn about JMP data tables and how to perform basic operations. Covers general JMP concepts and features that span across all of JMP, including importing data, modifying columns properties, sorting data, and connecting to SAS. Basic Analysis Perform basic analysis using this document. 
Content: Describes these Analyze menu platforms:
• Distribution
• Fit Y by X
• Matched Pairs
• Tabulate
Approximating sampling distributions using bootstrapping and the modeling utilities are also covered.

Essential Graphing
Purpose: Find the ideal graph for your data.
Content: Describes these Graph menu platforms:
• Graph Builder
• Overlay Plot
• Scatterplot 3D
• Contour Plot
• Bubble Plot
• Parallel Plot
• Cell Plot
• Treemap
• Scatterplot Matrix
• Ternary Plot
• Chart
The book also covers how to create background and custom maps.

Profilers
Purpose: Learn how to use interactive profiling tools, which enable you to view cross-sections of any response surface.
Content: Covers all profilers listed in the Graph menu. Analyzing noise factors is included along with running simulations using random inputs.

Design of Experiments Guide
Purpose: Learn how to design experiments and determine appropriate sample sizes.
Content: Covers all topics in the DOE menu and the Screening menu item in the Analyze > Modeling menu.

Fitting Linear Models
Purpose: Learn about the Fit Model platform and many of its personalities.
Content: Describes these personalities, all available within the Analyze menu Fit Model platform:
• Standard Least Squares
• Stepwise
• Generalized Regression
• Mixed Model
• MANOVA
• Loglinear Variance
• Nominal Logistic
• Ordinal Logistic
• Generalized Linear Model

Specialized Models
Purpose: Learn about additional modeling techniques.
Content: Describes these Analyze > Modeling menu platforms:
• Partition
• Neural
• Model Comparison
• Nonlinear
• Gaussian Process
• Time Series
• Response Screening
The Screening platform in the Analyze > Modeling menu is described in Design of Experiments Guide.

Multivariate Methods
Purpose: Read about techniques for analyzing several variables simultaneously.
Content: Describes these Analyze > Multivariate Methods menu platforms:
• Multivariate
• Cluster
• Principal Components
• Discriminant
• Partial Least Squares

Quality and Process Methods
Purpose: Read about tools for evaluating and improving processes.
Content: Describes these Analyze > Quality and Process menu platforms:
• Control Chart Builder and individual control charts
• Measurement Systems Analysis
• Variability / Attribute Gauge Charts
• Process Capability
• Pareto Plot
• Diagram

Reliability and Survival Methods
Purpose: Learn to evaluate and improve reliability in a product or system and analyze survival data for people and products.
Content: Describes these Analyze > Reliability and Survival menu platforms:
• Life Distribution
• Fit Life by X
• Recurrence Analysis
• Degradation and Destructive Degradation
• Reliability Forecast
• Reliability Growth
• Reliability Block Diagram
• Survival
• Fit Parametric Survival
• Fit Proportional Hazards

Consumer Research
Purpose: Learn about methods for studying consumer preferences and using that insight to create better products and services.
Content: Describes these Analyze > Consumer Research menu platforms:
• Categorical
• Multiple Correspondence Analysis
• Factor Analysis
• Choice
• Uplift
• Item Analysis

Scripting Guide
Purpose: Learn about taking advantage of the powerful JMP Scripting Language (JSL).
Content: Covers a variety of topics, such as writing and debugging scripts, manipulating data tables, constructing display boxes, and creating JMP applications.

JSL Syntax Reference
Purpose: Read about JSL functions and their arguments, and about messages that you send to objects and display boxes.
Content: Includes syntax, examples, and notes for JSL commands.
Note: The Books menu also contains two reference cards that can be printed: The Menu Card describes JMP menus, and the Quick Reference describes JMP keyboard shortcuts.

JMP Help

JMP Help is an abbreviated version of the documentation library that provides targeted information. You can open JMP Help in several ways:

• On Windows, press the F1 key to open the Help system window.
• Get help on a specific part of a data table or report window. Select the Help tool from the Tools menu and then click anywhere in a data table or report window to see the Help for that area.
• Within a JMP window, click the Help button.
• Search and view JMP Help on Windows using the Help > Help Contents, Search Help, and Help Index options. On Mac, select Help > JMP Help.
• Search the Help at http://jmp.com/support/help/ (English only).

Additional Resources for Learning JMP

In addition to JMP documentation and JMP Help, you can also learn about JMP using the following resources:

• Tutorials (see “Tutorials” on page 21)
• Sample data (see “Sample Data Tables” on page 21)
• Indexes (see “Learn about Statistical and JSL Terms” on page 21)
• Tip of the Day (see “Learn JMP Tips and Tricks” on page 22)
• Web resources (see “JMP User Community” on page 22)
• JMPer Cable technical publication (see “JMPer Cable” on page 22)
• Books about JMP (see “JMP Books by Users” on page 23)
• JMP Starter (see “The JMP Starter Window” on page 23)

Tutorials

You can access JMP tutorials by selecting Help > Tutorials. The first item on the Tutorials menu is Tutorials Directory. This opens a new window with all the tutorials grouped by category.

If you are not familiar with JMP, then start with the Beginners Tutorial. It steps you through the JMP interface and explains the basics of using JMP.

The rest of the tutorials help you with specific aspects of JMP, such as creating a pie chart, using Graph Builder, and so on.
Sample Data Tables

All of the examples in the JMP documentation suite use sample data. Select Help > Sample Data Library to open the sample data directory.

To view an alphabetized list of sample data tables or view sample data within categories, select Help > Sample Data.

Sample data tables are installed in the following directory:

On Windows: C:\Program Files\SAS\JMP\<version_number>\Samples\Data
On Macintosh: /Library/Application Support/JMP/<version_number>/Samples/Data

In JMP Pro, sample data is installed in the JMPPRO (rather than JMP) directory. In JMP Shrinkwrap, sample data is installed in the JMPSW directory.

Learn about Statistical and JSL Terms

The Help menu contains the following indexes:

Statistics Index
Provides definitions of statistical terms.

Scripting Index
Lets you search for information about JSL functions, objects, and display boxes. You can also edit and run sample scripts from the Scripting Index.

Learn JMP Tips and Tricks

When you first start JMP, you see the Tip of the Day window. This window provides tips for using JMP.

To turn off the Tip of the Day, clear the Show tips at startup check box. To view it again, select Help > Tip of the Day. Or, you can turn it off using the Preferences window. See the Using JMP book for details.

Tooltips

JMP provides descriptive tooltips when you place your cursor over items, such as the following:

• Menu or toolbar options
• Labels in graphs
• Text results in the report window (move your cursor in a circle to reveal)
• Files or windows in the Home Window
• Code in the Script Editor

Tip: You can hide tooltips in the JMP Preferences. Select File > Preferences > General (or JMP > Preferences > General on Macintosh) and then deselect Show menu tips.

JMP User Community

The JMP User Community provides a range of options to help you learn more about JMP and connect with other JMP users.
The learning library of one-page guides, tutorials, and demos is a good place to start. And you can continue your education by registering for a variety of JMP training courses.

Other resources include a discussion forum, sample data and script file exchange, webcasts, and social networking groups.

To access JMP resources on the website, select Help > JMP User Community.

JMPer Cable

The JMPer Cable is a yearly technical publication targeted to users of JMP. The JMPer Cable is available on the JMP website:
http://www.jmp.com/about/newsletters/jmpercable/

JMP Books by Users

Additional books about using JMP that are written by JMP users are available on the JMP website:
http://www.jmp.com/support/books.shtml

The JMP Starter Window

The JMP Starter window is a good place to begin if you are not familiar with JMP or data analysis. Options are categorized and described, and you launch them by clicking a button. The JMP Starter window covers many of the options found in the Analyze, Graph, Tables, and File menus.

• To open the JMP Starter window, select View (Window on the Macintosh) > JMP Starter.
• To display the JMP Starter automatically when you open JMP on Windows, select File > Preferences > General, and then select JMP Starter from the Initial JMP Window list. On Macintosh, select JMP > Preferences > Initial JMP Starter Window.

Chapter 2 Introduction to Consumer Research
Overview of Customer and Behavioral Research Methods

You already collect information about how customers use a product or service or how satisfied they are with your offerings. The resulting insight lets you create better products and services, happier customers, and more revenue for your organization. JMP now includes a full suite of tools for performing customer and consumer research.
In the past, you might have had to use one product for consumer research work and JMP for design of experiments. Now you can do both types of analyses using a single product, for a more efficient use of your most precious resource: your time.

Tools for performing these statistical analyses are now located in one convenient place: the Consumer Research menu. Use the following platforms to analyze your data:

• The Categorical platform supports survey analysis with questions in multiple formats, allowing for both detailed and compact reporting. You can also analyze multiple response questions, where your survey includes questions for which respondents can choose more than one answer. You can output the results in crosstab report tables, use share and frequency charts, view mean scores across responses, and perform tests and comparisons. And when you are finished, you can easily output the completed analysis tables. For more information, see Chapter 3, “Categorical Response Analysis”.
• The Factor Analysis platform enables you to discover simple arrangements in the pattern of relationships among variables. It seeks to discover if the observed variables can be explained in terms of a much smaller number of variables or factors. By using factor analysis, you can determine the number of factors that influence a set of measured, observed variables, and the strength of the relationship between each factor and each variable. For more information, see Chapter 4, “Factor Analysis”.
• The Choice platform is designed for use in market research experiments, where the ultimate goal is to discover the preference structure of consumers. Then, this information is used to design products or services that have the attributes most desired by consumers. For more information, see Chapter 5, “Choice Models”.
• The Uplift platform enables you to maximize the impact of your marketing budget by sending offers only to individuals who are likely to respond favorably, even when you have large data sets and many possible behavioral or demographic predictors. You can use uplift models to make such predictions. This method has been developed to help optimize marketing decisions, define personalized medicine protocols, or, more generally, to identify characteristics of individuals who are likely to respond to some action. For more information, see Chapter 6, “Uplift Models”.
• The Item Analysis platform provides a method of scoring tests. Based on Item Response Theory (IRT), the platform supports the design, analysis, and scoring of tests, questionnaires, and other tools that measure abilities, attitudes, and other variables. Although classical test theory methods have been widely used for a century, IRT provides a better and more scientifically based scoring procedure. For more information, see Chapter 7, “Item Analysis”.
• The Multiple Correspondence Analysis (MCA) platform takes multiple categorical variables and seeks to identify associations between levels of those variables. MCA is frequently used in the social sciences, particularly in France and Japan. It can be used in survey analysis to identify question agreement. For more information, see Chapter 8, “Multiple Correspondence Analysis”.

Chapter 3 Categorical Response Analysis
Analyzing Survey and Other Counting Data

The Categorical platform tabulates and summarizes categorical response data, including multiple response data, and calculates test statistics. The strength of the Categorical platform is that it can handle responses in a wide variety of formats without needing to reshape the data. It is designed to handle survey and other categorical response data, such as defect records, side effects, and so on.
Figure 3.1 Categorical Analysis Example

Contents

Categorical Platform Overview . . . 29
Example of the Categorical Platform . . . 29
Launch the Categorical Platform . . . 30
  Response Roles . . . 31
  Cast Selected Columns into Roles . . . 35
  Other Launch Window Options . . . 35
The Categorical Report . . . 37
  Share Chart . . . 39
  Frequency Chart . . . 39
Categorical Platform Options . . . 40
  Report Options . . . 40
  Statistical Options . . . 41
  Free Text Report Options . . . 56
  Structured Report Options . . . 57
Additional Examples of the Categorical Platform . . . 59
  Multiple Response . . . 59
  Response Frequencies . . . 60
  Indicator Group . . . 61
  Multiple Delimited . . . 62
  Multiple Response by ID . . . 63
  Mean Score Example . . . 64

Categorical Platform Overview

The Categorical platform can produce results from a rich variety of organizations of data, as reflected in the tabbed panels that enable you to specify the analyses that you want.

The Categorical platform has capabilities similar to other platforms. The choice of platform depends on your focus, the shape of your data, and the desired level of detail. The strength of the Categorical platform is that it can handle responses in a wide variety of formats without needing to reshape the data. Table 3.1 shows several of JMP’s analysis platforms and their strengths.

Table 3.1 Comparing JMP’s Categorical Analyses

Platform
Specialty

Distribution
Separate, ungrouped categorical responses.

Fit Y By X: Contingency
Two-way situations, including chi-square tests, correspondence analysis, agreement.
Pareto Plot
Graphical analysis of multiple-response data, especially multiple-response defect data, with more rate tests than Fit Y By X.

Variability Chart: Attribute
Attribute gauge studies, with more detail on rater agreement.

Fit Model
Logistic categorical responses and generalized linear models.

Partition, Neural Net
Specific categorical response models.

Example of the Categorical Platform

This example uses the Consumer Preferences.jmp sample data table, which contains survey data on people’s attitudes and opinions, and some questions concerning oral hygiene (source: Rob Reul, Isometric Solutions).

1. Select Help > Sample Data Library and open Consumer Preferences.jmp.
2. Select Analyze > Consumer Research > Categorical.
3. Select I am working on my career and click Responses on the Simple tab.
4. Select Age Group and click X, Grouping Category.
5. Click OK.
6. Select Crosstab Transposed from the Categorical red triangle menu.
7. Select Test Response Homogeneity from the Categorical red triangle menu.

Figure 3.2 shows, by age group, the responses that indicate whether a respondent is currently working on his or her career. Among the age groups, the highest share of respondents who responded positively was in the 25-29 group, at 84.1%. The highest share of those who responded negatively was in the > 54 group, at 53.5%.

Figure 3.2 Survey Results by Age Group

Launch the Categorical Platform

Launch the Categorical Platform by selecting Analyze > Consumer Research > Categorical.

Figure 3.3 Categorical Platform Launch Window

The launch window includes tabs for a variety of response roles (Simple, Related, and Multiple) and a Structured tab where you can create your own structured responses. The following sections describe the different response types and effects.
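The table in “Example of the Categorical Platform” crosses one response with one grouping category. As a rough illustration of the arithmetic behind such a table, the following Python sketch counts responses within each group and converts the counts to shares. It is not JMP code or JSL: the function name, the records, and the response labels are hypothetical and are not taken from Consumer Preferences.jmp.

```python
from collections import Counter, defaultdict

def crosstab_shares(records):
    """records: iterable of (group, response) pairs.

    Returns {group: {response: share_pct}}, where the shares
    within each group sum to 100.
    """
    counts = defaultdict(Counter)
    for group, response in records:
        counts[group][response] += 1

    shares = {}
    for group, by_response in counts.items():
        total = sum(by_response.values())
        shares[group] = {
            response: 100.0 * n / total
            for response, n in by_response.items()
        }
    return shares

# Hypothetical survey records: (age group, response).
records = [
    ("25-29", "Agree"), ("25-29", "Agree"),
    ("25-29", "Agree"), ("25-29", "Disagree"),
    ("> 54", "Agree"), ("> 54", "Disagree"),
]
shares = crosstab_shares(records)
```

Each group is normalized separately, which is why shares can be compared across age groups of different sizes.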
Response Roles

Use the response role buttons within the tabs to assign the selected columns as responses with specified roles. You can also drag column names to the response list. The response roles are summarized in Table 3.2.

Simple Tab

The default tab, Simple, contains a single button, Responses. This is appropriate for all basic analyses that do not have a special structure. You can drag column names from the Select Columns list to the Response list, or you can select columns and then click Responses. If a column has a Multiple Response column property, JMP automatically changes the handling of the column to recognize this property.

Related Tab

The Related tab is for a set of response columns that all have the same type of categories in them:

Aligned Responses
Performs the analysis like the default analysis, but shows the results more compactly by aligning the analyses side-by-side into one larger table.

Repeated Measures
Indicates that the columns reflect responses made by the same individual at different times, and you are interested in the changes between the times.

Rater Agreement
Is useful when each column is a rating for the same question, but by different individuals (raters) and you want to study how much the raters agree on their responses.

Multiple Tab

The Multiple tab is for multiple responses, where a question can involve checking off more than one choice. There are a variety of ways of storing multiple response data, so there are several buttons in the tab to accommodate the various means:

Multiple Response
Means that you have several columns acting like fill-in-the-blank columns to specify the multiple responses.

Multiple Response by ID
Indicates that you have several rows in a table corresponding to the multiple responses in one column, and the individuals are identified by an ID column.
Multiple Delimited
Signifies that you have one column that has several responses in it separated by a comma.

Indicator Group
Denotes that there is a column for each possible response, and each column is an indicator (that is, it has only two values, like 0 or 1).

Response Frequencies
Also has a column for each possible response, but has frequency counts instead of an indicator.

Free Text
Is used for comment fields where the analysis counts the frequency of each word used. Free Text gives word counts in both word order and frequency order, and the rate of non-empty text. For more information about Free Text, refer to “Free Text Report Options” on page 56.

Structured Tab

The Structured tab enables you to construct complex tables of descriptive statistics by dragging column names into green icon drop zones to create side-by-side and nested results. You can nest a variable within or beside another variable according to the structure that you want for the top and side of the table. Continue to drag columns, either beside or nested within another column, to specify the structure. For more information about structured reports, refer to “Structured Report Options” on page 57.

Figure 3.4 Structured Tab

To create a structured table:

1. Drag a column name to the green drop zone at the Top or Side of the table. The drop zone is highlighted in pink.
2. Add more variables by dragging a column name to the appropriate green drop zone. You can also drag a column name between two columns. To remove a selection, click the selection and then select Undo. To add a selection back that you just removed, select Redo. To clear all of the selections, select Clear.
3. When you are finished creating the table, click Add=> to add the variables to the response list. To make a revision once you have added your selection to the response list, click the selection and then select <=Edit.
4. Complete the remainder of the launch window as necessary and click OK. The Categorical report window appears.
5. Should you want to make a change to the table, select Relaunch Dialog from the Categorical red triangle menu. The launch window reappears where you can make edits to your selections.

A few guidelines for Structured effects:

• Structured always assumes that the innermost terms on the Side are responses, and that all other terms are sample level grouping factors.
• You can analyze multiple response terms in the form of delimited multiple response columns, but you must indicate that it is a multiple response by having a Multiple Response column property. Use the Col Info window to add this property, as needed.

Table 3.2 Response Roles

Response Role
Description

Simple Tab

Responses
Separate responses are in each column, resulting in a separate analysis for each column.

Related Tab

Aligned Responses
Responses share common categories across columns, resulting in better-organized reports.

Repeated Measures
Aligned responses from an individual across different times or situations.

Rater Agreement
Aligned responses from different raters evaluating the same unit, to study agreement across raters.

Multiple Tab

Multiple Response
Aligned responses, where multiple responses are entered across several columns, but treated as one grouped response.

Multiple Response by ID
Multiple responses across rows that have the same ID values.

Multiple Delimited
Several responses in a single cell, separated by commas.

Indicator Group
Binary responses across columns, like selected or deselected, yes or no, but all in a related group.
Response Frequencies
Columns containing frequency counts for each response level, all in a related group.

Free Text
Counts the frequency of each word used in a comment field.

Structured Tab
Drag variables to green drop zones to create your own structured table.

Cast Selected Columns into Roles

The lower right panel of the Launch window has the following options:

X, Grouping Category
Defines sample level groups to break the counts into. By default, it tabulates each combination of X values. Use the Grouping Option, described below, for other combinations.

Sample Size
Defines the number of individual units in the group to which that frequency applies, for multiple response roles with summarized data. For example, a Freq column might indicate 50 defects, where the sample size variable would reflect the defects for a batch of 100 units.

Freq
Specifies the column containing frequency counts for each row for presummarized data.

ID
Only required and used when Multiple Response by ID is selected.

By
Identifies a variable to produce a separate analysis for each value that appears in the column.

Other Launch Window Options

Several launch options are presented in the lower left panel of the window that can be specified before the analysis. The options can also be selected later from the Categorical red triangle menu, and have the effect of rerunning the platform with the new setting. The default settings for some of the launch options can be changed in the Categorical red triangle menu. For more information, refer to “Set Preferences” on page 55.

Grouping Option
Specifies whether you want to use the X columns individually or in a combination.
Use this option only when you specify more than one X (Grouping) column, to indicate whether you want to treat the Xs one at a time, in a fully nested grouping, or both. For example, if the X columns are Region and Age Group, you can get separate tables for Response by Region and Response by Age Group (each individually), a nested table with each age group within each region (combinations), or both.

Combinations
Gives frequency results for combinations of the X variables.

Each Individually
Gives frequency results for each X variable individually.

Both
Gives frequency results for combinations of the X variables, and individually.

Unique Occurrences within ID
Allows duplicate response levels within a subject to be counted only once. An ID variable must be specified.

The following options can be specified on the launch window as well as from the Categorical report red triangle menu. They are also available as Preference settings. For more information, refer to “Statistical Options” on page 41 and “Set Preferences” on page 55.

Count Missing Responses
Changes the behavior to tabulate missing values as categories, while still excluding them from statistical comparisons. When you have missing values, this option specifies whether you want to see them tabulated beside the nonmissing data, or just excluded. Missing values can be either standard (numeric NaN or character empty) or a code declared as missing with the column property Missing Value Codes. Note that if a column contains only missing values, the missing values are counted regardless of the state of this option.

Order Response Levels High to Low
Changes the response order but keeps the X order from low to high. The default ordering is low to high. You can control the ordering with a column property (Value Ordering), but if you always want to see the high values first, then select this option. Often, ordered categories are ratings, and you want to see the positive ratings first. In the red triangle menu, this option is under Category Options.

Shorten Labels
Shortens labels by removing common prefixes and suffixes. Sometimes surveys code lengthy labels that contain a common prefix or suffix. For example, “Occurred 5 to 10 times in the last year” might be a level, but the phrase “in the last year” is repeated for each value label, and you do not need to see it repeated in the report. This option trims the common prefixes and suffixes. It also changes multiple blanks into single blanks. The option only applies to value labels, not column names.

Include Responses Not in Data
Includes a count for values that were not in the data. Sometimes when you conduct a survey and give choices, one of the choices is not selected. If you still want to see the choices that are not in the data, use this option. It determines the missing categories from the value labels in the column.

Supercategories

When ratings are involved in a data set (for example, a five point scale), you might want to know the percent of the responses in the top two or other subset of ratings. Such a group of ratings can be defined in the data through the column property, Supercategories. The term Supercategories refers to the extra slots in a table to aggregate over groups of categories.

The Supercategories property supports four keywords: Group, Mean, Std Dev, and All. Mean and Std Dev calculate statistics for value scores, and All aggregates across all levels. For example, a “Top Two” Group supercategory could aggregate the two top categories in a response, as specified. Although we support Mean, Std Dev, and All, we do not recommend using them because they are available as built-in statistics as well as supercategories.

Create Supercategories by selecting a column and then selecting Column Info > Column Properties > Supercategories. On the column properties window, select a column, enter a Supercategory Name, and click Add.
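As a rough illustration of what a Group supercategory such as “Top Two” aggregates, the following Python sketch sums the counts of selected levels on a five-point scale and reports them as one extra slot. It is not JMP code or JSL. The function name, the ratings, and the choice of levels 4 and 5 as the top two are assumptions made for this example.

```python
from collections import Counter

def supercategory_share(ratings, group_levels):
    """Return (freq, share_pct) for responses whose level is in group_levels.

    ratings: list of rating values, one per response.
    group_levels: set of levels that make up the supercategory.
    """
    counts = Counter(ratings)
    total = sum(counts.values())
    freq = sum(counts[level] for level in group_levels)
    return freq, 100.0 * freq / total

# Hypothetical ratings on a five-point scale (5 = most favorable).
ratings = [5, 4, 4, 3, 2, 5, 1, 4, 3, 5]

# "Top Two" here means levels 4 and 5 (an assumption for this sketch).
top_two = supercategory_share(ratings, group_levels={4, 5})
```

In this sketch, six of the ten ratings are a 4 or a 5, so the Top Two slot holds a frequency of 6 and a share of 60%.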
The Supercategories Options red triangle menu item provides the following commands for a selected category:

Hide
Omits columns for each category from the report. Only a column for the supercategory is included.

Net
Prevents individuals from being counted twice when they appear in more than one supercategory. Net is available only for a multiple response column.

The following red triangle options are also provided:

Add Mean
Includes mean statistics in the report.

Add Std Dev
Includes standard deviation statistics in the report.

Add All
Includes total responses in the report. By default, the Total Responses column is always included.

Note: Supercategories are supported for all response effects except Repeated Measures and Rater Agreement. Some response effects do not support Mean and Std Dev slots, because the effects do not have a natural score.

The Categorical Report

The Categorical platform produces a report with several tables and bar charts depending on your selections. You might or might not see all of the following options depending on which response type and options you selected.

Frequencies, Share of Responses, and Rate Per Case appear in a single table by default. A Share Chart also appears by default (unless you used the Structured tab in the launch window). You can choose to view a Frequency Chart or Transposed Frequency Chart.

You can view or hide each option (Frequencies, Share of Responses, Rate Per Case, Share Chart, Frequency Chart, or Transposed Freq Chart) from the Categorical red triangle menu.

Data from Consumer Preferences.jmp is displayed in Figure 3.5.

Figure 3.5 The Categorical Report

The topmost item in the table is a Frequency count (Freq), showing the frequency counts for each category with the total frequency (Total Responses) and total units (Total Cases) at the bottom of the table.
In this example, the number of responses and cases is displayed for each age group by the 7 segments.

The Share of Responses (Share) is determined by dividing each count by the total number of responses. The number represents the percent of the response among all the responses in the sample (frequency divided by response total). This is either a column percentage or row percentage depending on whether your table has the responses on top or down the side (transposed). For example, examine the second row of the table for Floss After Waking Up. The 37 responses for flossing after waking up are 25.9% of all responses (37/143*100).

The Rate Per Case (Rate) divides each count in the frequency table by the total number of cases. If you have multiple responses per case (subject), there are two types of percentages: the rate per case is the frequency as a percent of total cases, whereas the share of responses is the frequency as a percent of the total responses. Rate is available only for multiple responses. For example, in the third row of the table (Floss After Waking Up), the 37 respondents are from 113 cases, making the rate per respondent 32.7% (37/113*100).

Share Chart

The Share Chart presents a divided bar chart. The bar length is proportional to the percentage of responses for each type. The column on the right shows the number of responses.

Figure 3.6 Share Chart

Frequency Chart

The Frequency Chart shows response frequencies. The bars reflect the frequency count on the same scale, and the number of responses is displayed to the right. To view the frequency chart, select Frequency Chart from the Categorical red triangle menu.

Figure 3.7 Frequency Chart

The Transposed Freq Chart option produces a transposed version of the Frequency Chart.
Marginal totals are given for each response, as opposed to each X variable. Categorical Platform Options The Categorical red triangle menu provides commands that customize the appearance of the report and provide the means to test and compare your results. The following options appear in the menu depending on response roles and options selected. You might or might not view all of the options depending on your selections. Report Options The default report format is the Crosstab format, which gathers all three statistics for each sample level and response together. The Crosstab format displays the responses on the top and the sample levels down the side, with multiple table elements together in each cell of the cross tabulation. Figure 3.8 Crosstab Format Chapter 3 Consumer Research Categorical Response Analysis Categorical Platform Options 41 The Crosstab format has a transposed version, Crosstab Transposed, which is useful when there are a lot of response categories but not a lot of sample levels. Crosstab Transposed displays the responses down the side and the sample levels across the top, with multiple table elements together in each cell. The Structured analysis always uses the Crosstab Transposed form, but in a more complex arrangement. The Free Text analysis has its own specialized reports. For more information about Free Text, refer to “Free Text Report Options” on page 56. Selecting the Legend red triangle menu shows or hides the legend for the response column on the Share Chart. Statistical Options The main question of interest in any table is whether shares or rates vary from each group of sample levels, and specifically which groups are significantly different. There are two families of tests and comparisons, which correspond to single category responses and multiple responses: • With single responses, a given response is just one response category, and the question is whether the share of responses is different across sample levels. 
Single responses are tested with a chi-square test of homogeneity. However, there are two types of this test: the Likelihood Ratio Chi-square and the Pearson Chi-square. It is a matter of personal preference and training which one you prefer. An option, Chi-square Test Choices, on the Categorical red triangle menu, enables you to show one or the other, or both. For more information, refer to “Test Options” on page 54. • With multiple responses, each individual can select several categories, and the question is whether the rate is different across sample levels. For multiple responses, each response is treated in a separate account, with a probability of the count for each subject as a Poisson distribution (allowing for multiples of the same category). Each response is tested to determine whether the parameters are the same across sample levels. The following options appear in the Categorical red triangle menu depending on context: 42 Categorical Response Analysis Categorical Platform Options Chapter 3 Consumer Research Table 3.3 Categorical Platform Commands Command Supported Response Contexts Question Details Test Response Homogeneity • Responses • Aligned Responses • Repeated Measures Are the probabilities across the response categories the same across sample levels? • Response Frequencies with Sample Size • Structured Marginal Homogeneity (Independence) Test, both Pearson and Chi-square likelihood ratio chi-square. For more information, refer to “Test Response Homogeneity” on page 45. Test Multiple Response • Multiple Response (requires multiple response data) • Multiple Response by ID (with Sample Size) For each response category, are the rates the same across sample levels? • Multiple Delimited Poisson regression or binomial test on sample for each defect frequency. For more information, refer to “Test Multiple Response” on page 46. • Response Frequencies with Sample Size • Structured How closely do raters agree, and is the lack of agreement symmetrical? 
Kappa for agreement, Bowker and McNemar for symmetry. For more information, refer to “Rater Agreement” on page 50. (requires multiple response data) Agreement Statistic Rater Agreement Chapter 3 Consumer Research Categorical Response Analysis Categorical Platform Options 43 Table 3.3 Categorical Platform Commands (Continued) Command Supported Response Contexts Question Details Transition Report Repeated Measures How have the categories changed across time? Transition counts and rates matrices. For more information, refer to “Repeated Measures” on page 51. Cell Chisq Responses How do I further analyze the results to obtain more information? For more information, refer to “Cell Chisq” on page 48. Compare Each Sample • Responses • Aligned Responses Do levels of the response category differ significantly? • Repeated Measures For more information, refer to “Compare Each Sample” on page 52. • Response Frequencies (if no Sample Size) • Structured • Single and Multiple Responses • Structured Do pairs of levels within the two response categories differ significantly? For more information, refer to “Compare Each Cell” on page 53. • ChiSquare Test Choices • Show Warnings • Order by Significance How do I further analyze the results to obtain more information? For more information, refer to “Test Options” on page 54. • Hide Nonsignificant Compare Each Cell Test Options There are a series of options that add more detail for each group of sample levels. The options that appear depend on your selections and the details of your analysis: 44 Categorical Response Analysis Categorical Platform Options Chapter 3 Consumer Research Total Responses Shows the sum of the frequency counts for each group of sample levels. Total Cases For multiple response columns, shows the number of cases (subjects), which are different from the number of responses. For multiple response columns used in Structured tables, counts each person who responded at least once. 
People who did not respond at all are not included. Total Cases Responding Mean Score Calculates the response means, using the numeric categories, or value scores. This is enabled for columns that use numeric codes, or for categories that have a Value Scores property. To make the Mean Score interpretable, you can assign specific value scores in the Column Info window with the Value Scores column property.For more information and an example, refer to “Mean Score Example” on page 64. Mean Score Comparison Compares the mean scores across groups of sample levels, showing which groups are significantly different. A pairwise multiple comparisons Student’s t test is used for the mean score comparison, based on the specified comparison groups. For more information about the letter codes, refer to “Comparisons with Letters” on page 53. For more information about specifying the comparison groups, refer to “Specify Comparison Groups” on page 58. Std Dev Score Calculates the standard deviation of the value scores. Orders the mean score calculations. The option only appears when there are no X columns in the analysis. Order by Mean Score Save Tables Saves the report to a new data table. For more information, refer to “Save Tables” on page 54. Filter Filters data to specific groups or ranges. Opens the Local Data Filter panel allowing you to identify varying subsets of data. The filtered rows do not appear in the reports. Sample levels with 0 values are always hidden. To show the filtered rows in reports, select Include Responses Not in Data in the launch window. You can also select the Set Preferences red triangle menu, and then select Include Responses Not in Data. For more information, refer to Using JMP. Contents Summary Collects all of the tests and mean scores into a summary at the top of the report with links to the associated item. Enables you to specify formats for Frequencies, Shares and Rates, and how zeros are displayed. 
By default, Frequencies are Fixed Dec with 7 Width and 0 Decimals and Shares and Rates are Percent with 6 Width and 1 Decimal. Format Elements Arrange in Rows Arranges the reports across the page as opposed to down. Enter the number of reports that you want to view across the window. Set Preferences Enables you to set preferences for future launches and sessions. For more information, refer to “Set Preferences” on page 55. Chapter 3 Consumer Research Categorical Response Analysis Categorical Platform Options 45 Category Options Contains options (Grouping Option, Count Missing Response, Order Response Levels High to Low, Shorten Labels, and Include Responses Not in Data) that are also presented on the launch window that could be specified before the analysis. The options can also be selected here and have the effect of rerunning the platform with the new option setting. For more information, refer to “Other Launch Window Options” on page 35. Force Crosstab Shading Forces shading on crosstab reports even if the preference is set to no shading. Enables you to return to the launch window and edit the specifications for a structured table. For more information, refer to “Structured Tab” on page 32. Relaunch Dialog Script Contains options that are available to all platforms. See Using JMP. Test Response Homogeneity Test Response Homogeneity is the standard chi-square test (for single responses) across all sample levels. There is typically one categorical response variable and one categorical sample variable. Multiple sample variables are treated as a single variable. The test is the chi-square test for marginal homogeneity of response patterns, testing that the response probabilities are the same across samples. This is equivalent to a test for independence when the sample level is like a response. There are two versions of this test, the Pearson form and the Likelihood Ratio form, both with chi-square statistics. 
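To make the two forms of the homogeneity test concrete, the following Python sketch computes both statistics from a table of counts using the textbook formulas. It is an illustration only, not JMP's implementation; the function name and the counts are hypothetical and do not come from a JMP sample data table.

```python
import math

def homogeneity_chi_squares(table):
    """Return (pearson, likelihood_ratio, df) for a samples-by-responses count table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    pearson = 0.0
    likelihood_ratio = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if response probabilities were the same in every sample level.
            expected = row_totals[i] * col_totals[j] / n
            pearson += (observed - expected) ** 2 / expected
            if observed > 0:  # a zero cell contributes nothing to the LR statistic
                likelihood_ratio += 2.0 * observed * math.log(observed / expected)
    df = (len(table) - 1) * (len(col_totals) - 1)
    return pearson, likelihood_ratio, df

# Hypothetical counts: two sample levels (rows) by three response categories (columns).
counts = [[30, 20, 10],
          [20, 25, 15]]
pearson, likelihood_ratio, df = homogeneity_chi_squares(counts)
print(round(pearson, 3), round(likelihood_ratio, 3), df)
```

The two statistics are usually close to each other and are referred to the same chi-square distribution, which is why the choice between them is largely a matter of preference.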
The Test Options menu (ChiSquare Test Choices) is used to show or hide the Likelihood Ratio or Pearson tests. If Show Warnings is turned on, the report displays a warning if the frequencies are too low to make good tests.

As an example:
1. Select Help > Sample Data Library and open Car Poll.jmp.
2. Select Analyze > Consumer Research > Categorical.
3. Select country and click Responses on the Simple tab.
4. Select marital status and click X, Grouping Category.
5. Click OK.
6. Select Test Response Homogeneity from the Categorical red triangle menu.

Figure 3.9 Test Response Homogeneity

The Share Chart indicates that the married group is more likely to buy American cars, and the single group is more likely to buy Japanese cars, but the statistical test only shows a significance of 0.08. Therefore, the difference in response probabilities across marital status is not statistically significant at an alpha level of 0.05.

Test Multiple Response

Test Multiple Response is the standard chi-square test for multiple responses, with one test statistic for each response category. When there are multiple responses, each response category can be modeled separately. The question is whether the response rates are the same across samples. The Test Multiple Response red triangle menu provides the following options:

Count Test, Poisson For each response category, the frequency count has a random Poisson distribution. The rate test is obtained using a Poisson regression (through generalized linear models) of the frequency per unit modeled by the sample categorical variable. The result is a likelihood ratio chi-square test of whether the rates are different across sample levels. This test can also be done by the Pareto platform, as well as in the Generalized Linear Model personality of the Fit Model platform.

Homogeneity Test, Binomial For each response category, the frequency count has a random binomial distribution. Select this test when the response can be true only once for each respondent (for example, in a “check all that apply” questionnaire).

To test multiple responses, follow these steps:
1. Select Help > Sample Data Library and open Consumer Preferences.jmp.
2. Select Analyze > Consumer Research > Categorical.
3. Select Brush Delimited and click Multiple Delimited on the Multiple tab.
4. Select brush and click X, Grouping Category to compare the sample levels across the brush treatment variable.
5. Click OK.
6. Select Test Multiple Response and then Count Test, Poisson from the Categorical red triangle menu.

Figure 3.10 Test Multiple Response, Poisson

The p-values show that After Meal and Before Sleep are the most significantly different. Wake is not significantly different with this amount of data.

7. Select Test Multiple Response and then Homogeneity Test, Binomial from the Categorical red triangle menu.

Figure 3.11 Test Multiple Response, Binomial

The Homogeneity Test, Binomial option always produces a larger test statistic (and therefore a smaller p-value) than the Count Test, Poisson option. The binomial distribution compares not only the rate at which the response occurred (the number of people who reported that they brush upon waking) but also the rate at which the response did not occur (the number of people who did not report that they brush upon waking).

Cell Chisq

For single responses, Cell Chisq displays the cell-by-cell composition of the overall Pearson chi-square, and also shows which cells have relatively more (red) or less (blue) than expected if they were the same across sample levels. The value shown is the p-value for the chi-square.
The color is bright when they are significant, and grayer when less significant, denoting visually where the significant differences are. As an example: 1. Select Help > Sample Data Library and open Consumer Preferences.jmp. 2. Select Analyze > Consumer Research > Categorical. 3. Select I am working on my career and click Responses on the Simple tab. 4. Select Age Group and click X, Grouping Category. 5. Click OK. 6. Select Crosstab Transposed from the red triangle menu. 7. Select Cell Chisq from the red triangle menu. Chapter 3 Consumer Research Categorical Response Analysis Categorical Platform Options 49 Figure 3.12 Cell Chisq Relative Risk The Relative Risk option is used to compute relative risks for different responses. The risk of responses is computed for each level of the X, Grouping variable. The risks are compared to get a relative risk. This option is available when the X, Grouping variable has two levels, and one of the following is true: • The response variable has two levels. • The response variable is a Multiple Response and the Unique occurrences within ID box is checked on the Categorical launch window. A common application of this analysis is when the responses represent adverse events (side effects), and the X variable represents a treatment (drug versus placebo). The risk for getting each side effect is computed for both the drug and placebo. The relative risk is the ratio of the two risks. Conditional Association The Conditional Association option is used to compute the conditional probability of one response given a different response. A table and color map of the conditional probabilities are given. This option is available only when the Unique occurrences within ID box is checked on the Categorical launch window. A common application of this analysis is when the responses represent adverse events (side effects) from a drug. The computations represent the conditional probability of one side effect given the presence of another side effect. 
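The two calculations described above, a relative risk and a conditional probability of one response given another, can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the platform's code: the function names, the event names, and all counts are hypothetical.

```python
def relative_risk(events_treated, n_treated, events_control, n_control):
    """Ratio of the risk in the treated group to the risk in the control group."""
    return (events_treated / n_treated) / (events_control / n_control)

def conditional_rate(subject_events, given, other):
    """Among subjects reporting `given`, the fraction who also report `other`."""
    with_given = [events for events in subject_events if given in events]
    if not with_given:
        return None  # undefined when nobody reported the conditioning response
    return sum(other in events for events in with_given) / len(with_given)

# Hypothetical counts: 12 of 100 treated subjects and 6 of 100 placebo subjects had the event.
rr = relative_risk(12, 100, 6, 100)

# Hypothetical unique side effects per subject (one set per ID).
subjects = [
    {"headache", "nausea"},
    {"headache"},
    {"nausea"},
    {"headache", "nausea", "rash"},
    set(),
]
nausea_given_headache = conditional_rate(subjects, "headache", "nausea")
print(round(rr, 2), round(nausea_given_headache, 3))
```

Each subject's side effects are stored as a set, which corresponds to counting a response at most once per ID, as the Unique occurrences within ID option requires.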
For AdverseR.jmp, given the response in each row, Figure 3.13 shows the rate of also having the response in a column. Figure 3.13 only displays a few variables in the table due to size constraints.

Figure 3.13 Conditional Association

Rater Agreement

The Rater Agreement analysis answers the questions of how closely raters agree with one another and whether the lack of agreement is symmetrical. For example, open Attribute Gauge.jmp. The Attribute Chart script runs the Variability Chart platform, which has a test for agreement among raters.

Figure 3.14 Agreement Comparisons

Launch the Categorical platform and designate the three raters (A, B, and C) as Rater Agreement responses on the Related tab on the launch window. In the resulting report, you have a similar test for agreement that is augmented by a test of whether the lack of agreement is symmetric.

Figure 3.15 Agreement Statistics

Repeated Measures

Repeated Measures declares that the columns reflect responses made by the same individual at different times, and you are interested in the changes between the times. Individual reports are displayed for each item, with a transition report at the end demonstrating the transition counts and rate matrices.

As an example:
1. Select Help > Sample Data Library and open Presidential Elections.jmp.
2. Select Analyze > Consumer Research > Categorical.
3. Select 1980 Winner through 2012 Winner and click Repeated Measures on the Related tab.
4. Select State and click X, Grouping Category.
5. Click OK.
6. Scroll to the bottom of the report window and open the Transition Report outline.

Scroll through the responses to see how each State has voted over the years. Note that New Mexico has varied between Democratic and Republican over the years.
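A transition report tabulates how responses move between categories from one time point to the next. The following Python sketch shows one way to compute transition counts and row-wise transition rates for two time points. The function names and data are hypothetical, and the sketch is not taken from JMP.

```python
from collections import Counter

def transition_counts(before, after):
    """Count (from, to) category pairs for the same individuals at two times."""
    return Counter(zip(before, after))

def transition_rates(counts):
    """Divide each transition count by the total for its starting category."""
    start_totals = Counter()
    for (start, _), n in counts.items():
        start_totals[start] += n
    return {pair: n / start_totals[pair[0]] for pair, n in counts.items()}

# Hypothetical winners for five units at two consecutive time points.
time_1 = ["D", "R", "R", "D", "R"]
time_2 = ["D", "R", "D", "R", "R"]

counts = transition_counts(time_1, time_2)
rates = transition_rates(counts)
print(counts[("R", "R")], round(rates[("R", "R")], 3))
```

With more than two time points, the same tabulation can be applied to each consecutive pair of columns.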
52 Categorical Response Analysis Categorical Platform Options Chapter 3 Consumer Research Figure 3.16 Repeated Measures Compare Each Sample For a given response, Compare Each Sample tests whether the response probability for each of its levels differs from the response probabilities for its other levels. In simple situations, the Compare Each Sample report consists of symmetric matrices of p-values, as shown in Figure 3.17. In addition, a new row or column, entitled Compare, appears in the Crosstabs table. The Compare row is placed at the bottom of the table, or the Compare column is placed at the far right. (Whether a row or column is appended depends on whether Crosstab or Crosstab Transposed is specified.) The Compare row or column contains letter codes showing which sample levels differ significantly. For more information about the letter codes, refer to “Comparisons with Letters” on page 53. Figure 3.17 Compare Each Sample Chapter 3 Consumer Research Categorical Response Analysis Categorical Platform Options 53 Compare Each Cell For a given response and a given X variable, Compare Each Cell tests, for each level of the X variable, whether the response probabilities differ across the levels of the response. In other words, Compare Each Cell tests response probabilities across the cells in a given row of the Crosstabs table. The Compare Each Cell report gives p-values in a tabular format. The letters across the top indicate the response levels tested for the given level of the X variable. An example is shown in Figure 3.18. In addition, when a cell differs significantly from other cells, a letter code is inserted into the appropriate cell in the Crosstabs table. For details on the letter codes and on their placement in cells, refer to “Comparisons with Letters” on page 53. 
Figure 3.18 Compare Each Cell (cut off after column AE)

Comparisons with Letters

The Compare Each Cell, Compare Each Sample, and Mean Score Comparisons commands use a system of letters to identify sample levels. The first sample level is “A”, the second “B”. For more than 26 sample levels, numbers are appended after the letters. The letters are shown in the sample level headings when a comparison command is turned on.

If two sample levels are significantly different, the letter of the sample level with the lesser share is placed into the comparison cell of the other category that is significantly different. To find out whether a given group of sample levels, for example “B”, is significantly different, you have to look both in the comparison cell for column “B” for other letters, and also in all the other cells across the groups of sample levels for a “B”.

Lowercase letters are also used for comparisons that are slightly less significant, according to Table 3.4. These comparisons suffer when the count for that sample level group (the Base Count) is small, and asterisks start to appear in the comparison cells to warn you.

The comparison features are controlled by four options set in Preferences or through a script. For more information, refer to “Set Preferences” on page 55.

Table 3.4 Letter Comparisons
Uppercase alpha level    0.05        The significance level for which uppercase letters show differences.
Lowercase alpha level    0.10        The significance level for which lowercase letters show differences.
Base Count minimum       ≤ 29        The count for a sample level that leads to a ** warning.
Base Count warning       30 to 99    The count for a sample level that leads to a * warning.
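The thresholds in Table 3.4 can be read as a small decision rule. The Python sketch below encodes that rule for illustration. The function names are hypothetical, and whether JMP treats a p-value exactly equal to a threshold as significant is an assumption here (strict inequality is used).

```python
def comparison_code(letter, p_value, upper_alpha=0.05, lower_alpha=0.10):
    """Return the letter code for a comparison, or '' if it is not significant."""
    if p_value < upper_alpha:   # assumption: strict inequality at the threshold
        return letter.upper()
    if p_value < lower_alpha:
        return letter.lower()
    return ""

def base_count_warning(base_count):
    """Return the warning marker for a sample level's Base Count, per Table 3.4."""
    if base_count <= 29:
        return "**"
    if base_count <= 99:
        return "*"
    return ""

print(comparison_code("B", 0.03), comparison_code("B", 0.08), base_count_warning(45))
```

The default arguments correspond to the values listed in Table 3.4. Passing different values mimics changing the corresponding preference.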
Test Options

The Test Options menu on the Categorical red triangle menu has the following options depending on your selections:

ChiSquare Test Choices Single responses are tested with a chi-square test of homogeneity; either the Likelihood Ratio Chi-square or the Pearson Chi-square, or both. Options are: Both LR and Pearson, LR Only, or Pearson Only. You can set an option in Preferences.

Show Warnings Shows warnings for chi-square tests related to small sample sizes.

Order by Significance Reorders the reports so that the most significant reports are at the top. This option only applies to reports with one homogeneity test.

Hide Nonsignificant Suppresses reports that are deemed non-significant. This option only applies to reports with one homogeneity test.

Save Tables

The Save Tables menu on the Categorical red triangle menu has the following options depending on your selections:

Save Frequencies Saves the Frequency report to a new data table, without the marginal totals or supercategories.

Save Share of Responses Saves the Share of Responses report to a new data table, without the marginal totals.

Save Rate Per Case Saves the Rate Per Case report to a new data table, without the marginal totals.

Save Transposed Frequencies Saves the Transposed Freq Chart report to a new data table, without the marginal totals.

Save Transposed Share of Responses Saves a transposed version of the Share of Responses report to a new data table.

Save Transposed Rate Per Case Saves a transposed version of the Rate Per Case report to a new data table.

Save Test Rates Saves the results of the Test Multiple Response option to a new data table.

Save Test Homogeneity Saves the results of the Test Response Homogeneity option to a new data table.

Save Mean Scores Saves the mean scores for each sample group in a new data table.
Save tTests and pValues Saves t-tests and p-values from the Mean Score Comparisons report in a new data table.

Save Excel File Creates a Microsoft Excel spreadsheet with the structure of the crosstab-format report. The option maps all of the tables to one sheet, with the response categories as rows, the sample levels as columns, sharing the headings for sample levels across multiple tables. When there are multiple elements in each table cell, you have the option to make them multiple or single cells in Microsoft Excel.

Set Preferences

You can specify settings and set preferences within the Categorical platform. Several options are available on the launch window and can be specified before the analysis. Some of the options can also be selected from the Categorical red triangle menu, and have the effect of rerunning the analysis with the new setting. The options are initialized to the current state. Select the appropriate options and select either Submit Platform Preferences or Create Platform Preference Script to submit the options to your preferences as the new default. When the Categorical platform is launched, the preferences associated with the current preference set are enacted.

Preferences can be administered and shared through a script. The best way to share a preference set widely is to create an add-in, so that if the preference settings are reset to the initial state, the add-in could restore the preferred set.

Figure 3.19 Set Preferences Window

Free Text Report Options

Free Text is used for comment fields where the analysis counts the frequency of each word used. Free Text gives word counts in both word order and frequency order, and the rate of non-empty text. The following example uses the Consumer Preferences.jmp sample data table, which contains survey data relating to oral hygiene preferences.
A comment field was included in the survey asking for reasons why the participant did not floss. 1. Select Help > Sample Data Library and open Consumer Preferences.jmp. 2. Select Analyze > Consumer Research > Categorical. 3. Select Reasons Not to Floss and click Free Text on the Multiple tab. 4. Click OK. 5. Select Score Words by Column from the first Free Text Word Counts for Reasons Not to Floss red triangle menu and then select Floss. Click OK. Chapter 3 Consumer Research Categorical Response Analysis Categorical Platform Options 57 Figure 3.20 Free Text Report Example Figure 3.20 details the free text word counts the respondents included as reasons why they do not floss. From the analysis, you can determine the number of words, cases, non-empty cases, and portions of non-empty cases. You can also view the word counts alphabetically, in terms of frequency, or by the mean scores. There are more commands to further customize the analysis on the Free Text red triangle menu on the report: Calculates for each word the average score for another column for the rows that the word appears. If you save a word table later, it will have these scores in the saved data table, plus a Treemap script to colorize by these scores. Score Words by Column Prompts you for the number of words to make indicators for, and then creates the indicator columns for the most frequent words in the data table indicating if that word appeared. Save Indicators for Most Frequent Words Creates a new table of all the words, their frequency, and the scores with respect to any of the columns scored. Save Word Table Remove Removes the table from the report. Structured Report Options The Structured tab enables you to construct complex tables of descriptive statistics by dragging column names into green icon drop zones to create side-by-side and nested results. The following example uses the Consumer Preferences.jmp sample data table. 
From this data, suppose that you wanted to compare job satisfaction and salary against gender by age group and position tenure. 1. Select Help > Sample Data Library and open Consumer Preferences.jmp. 2. Select Analyze > Consumer Research > Categorical. 3. Select the Structured tab. 4. Drag Gender to the green drop zone at the Top of the table on the Structured tab. 5. Drag Age Group to the green drop zone just below Gender. 6. Drag Position Tenure to the green drop zone at the Top of the table next to Gender. 7. Drag Job Satisfaction to the green drop zone at the Side of the table. 58 Categorical Response Analysis Categorical Platform Options Chapter 3 Consumer Research 8. Drag Salary Group to the green drop zone at the Side of the table under Job Satisfaction. Figure 3.21 Structured Tab Report Setup 9. Click Add=>. 10. Click OK. Figure 3.22 Structured Tab Report Example Figure 3.22 shows that the majority of both the male and female respondents were somewhat satisfied with their jobs, with the highest percentage of males being in the 25-29 age group, while the females were in the 30-34 age group. Most of those who were somewhat satisfied had been in their current position for less than 5 years. The following options are available from the structured report’s red triangle menu: Forces the table to display the column letter IDs, which usually come out automatically when you do a compare command. Show Letters Specify Comparison Groups Enables you to specify groups when the group of sample levels that you want to test and compare are not the same as the innermost term’s structure. To use this option, you must look at the letter IDs, and then enter sets of letter IDs, separated Chapter 3 Consumer Research Categorical Response Analysis Additional Examples of the Categorical Platform 59 by a slash, representing each group, separating multiple groups from each other by commas. 
For example, the default grouping might be “A/B/C, E/D/F”, but you want to test A with E, B with D and C with F, so you specify the groups as “A/E, B/D, C/F”. This determines which letters appear in the comparison fields. In addition, a summary report shows the overall tests for each column group. Remove Removes the table from the report. Additional Examples of the Categorical Platform The following examples come from testing a fabrication line on three different occasions under two different conditions. Each set of operating conditions yielded 50 data points. Inspectors recorded the following types of defects: • contamination • corrosion • doping • metallization • miscellaneous • oxide defect • silicon defect Each unit could have several defects or even several defects of the same kind. We illustrate the data in a variety of different examples all within the Categorical platform. Multiple Response Suppose that the defects for each unit are entered via a web page, but because each unit rarely has more than three defect types, the form has three fields to enter any of the defect types for a unit, as in Failure3MultipleField.jmp. 1. Select Help > Sample Data Library and open Quality Control/Failure3MultipleField.jmp. 2. Select Analyze > Consumer Research > Categorical. 3. Select Failure1, Failure2, and Failure3 and click Multiple Response on the Multiple tab. These columns contain defect types and are the variables that you want to inspect. 4. Select clean and date and click X, Grouping Category. 5. Click OK. Figure 3.23 lists failure types and counts for each failure type from the Multiple Response analysis. 60 Categorical Response Analysis Additional Examples of the Categorical Platform Chapter 3 Consumer Research Figure 3.23 Multiple Response Failures from Failure3MultipleField.jmp Response Frequencies Suppose the data have columns containing frequency counts for each batch and a column showing the total number of units of the batch, as in Failure3Freq.jmp. 
Figure 3.24 Failure3Freq.jmp Data Table
1. Select Help > Sample Data Library and open Quality Control/Failure3Freq.jmp.
2. Select Analyze > Consumer Research > Categorical.
3. Select the frequency variables (contamination, corrosion, doping, metallization, miscellaneous, oxide defect, silicon defect) and click Response Frequencies on the Multiple tab.
4. Select clean and date and click X, Grouping Category.
5. Select Sample Size and click Sample Size.
6. Click OK.
The resulting output in Figure 3.25 shows a frequency count table, with a separate column for each of the seven batches. The last two columns show the total number of defects (Total Responses) and cases (Total Cases).
Figure 3.25 Defect Rate Output
Each Frequency Group contains the following information:
• The total number of defects for each defect type. For example, after cleaning on Oct 1st, there were 12 contamination defects.
• The share of responses. For example, after cleaning on Oct 1st, the 12 contamination defects accounted for 12/23 = 52.2% of all defects.
• The rate per case. For example, after cleaning on Oct 1st, the 12 contamination defects came from 50 units, so the rate per unit is 12/50 = 24%.

Indicator Group
In some cases, the data is not yet summarized, so there are individual records for each unit. This situation is illustrated with the data table Failure3Indicators.jmp.
Figure 3.26 Failure3Indicators.jmp Data Table
1. Select Help > Sample Data Library and open Quality Control/Failure3Indicators.jmp.
2. Select Analyze > Consumer Research > Categorical.
3. Select the defect columns (contamination, corrosion, doping, metallization, miscellaneous, oxide defect, silicon defect) and click Indicator Group on the Multiple tab.
4. Select clean and date and click X, Grouping Category.
5. Click OK.
When you click OK, you get the same output as in the Response Frequencies example (Figure 3.25).

Multiple Delimited
Suppose that an inspector entered the observed defects for each unit. The defects are listed in a single column, delimited by a comma, as in Failure3Delimited.jmp. Note in the partial data table, shown below, that some units did not have any observed defects, so the failureS column is empty for those units.
Figure 3.27 Failure3Delimited.jmp Data Table
1. Select Help > Sample Data Library and open Quality Control/Failure3Delimited.jmp.
2. Select Analyze > Consumer Research > Categorical.
3. Select failureS and click Multiple Delimited on the Multiple tab.
4. Select clean and date and click X, Grouping Category.
5. Select ID and click ID.
6. Click OK.
When you click OK, you get the same output as in Figure 3.25.
Note: If more than one delimited column is specified, separate analyses are produced for each column.

Multiple Response by ID
Suppose each failure type is a separate record, with an ID column that can be used to link together different defect types for each unit, as in Failure3ID.jmp.
Figure 3.28 Failure3ID.jmp Data Table
1. Select Help > Sample Data Library and open Quality Control/Failure3ID.jmp.
2. Select Analyze > Consumer Research > Categorical.
3. Select failure and click Multiple Response by ID on the Multiple tab.
4. Select clean and date and click X, Grouping Category.
5. Select SampleSize and click Sample Size.
6. Select N and click Freq.
7. Select ID and click ID.
8. Click OK.
When you click OK, you get the same output as in Figure 3.25.

Mean Score Example
You can calculate response means in your data using Value Scores.
To make the Mean Score interpretable, you can assign specific value scores in the Column Info window with the Value Scores column property. For more information about column properties, refer to Using JMP.
In this example, you assign Value Scores to calculate the Net Promoter Score (Reichheld, HBR 2003), which summarizes an 11-level rating with a favorability score between -100 and 100. Anything with a value of 6 or below is regarded as a detractor.
1. Run the following script:
    New Table("Rating Example",
        Add Rows(300),
        New Script("Categorical", Categorical(Responses(:Rating), Mean Score(1))),
        New Column("Rating", Numeric, Ordinal,
            Set Property("Value Scores",
                {0=-100, 1=-100, 2=-100, 3=-100, 4=-100, 5=-100, 6=-100, 7=0, 8=0, 9=100, 10=100}),
            Formula(Random Category(
                0.05,0, 0.05,1, 0.05,2, 0.05,3, 0.05,4,
                0.05,5, 0.05,6, 0.05,7, 0.05,8, 0.3,9, 0.25,10)),
            Set Selected
        )
    );
2. A data table with 300 rows of random rating data is created. Value scores are also defined for the Rating column. To view the scores, right-click the Rating column and select Column Properties > Value Scores.
Figure 3.29 Column Properties - Value Scores
3. Select Analyze > Consumer Research > Categorical.
4. Select Rating and click Responses on the Simple tab.
5. Click OK.
6. Select Mean Score from the Categorical red triangle menu.
Figure 3.30 Rating Example Report
Based on the defined value scores, a mean score of 18 was determined. Your results might be different because the Rating column values are random.

Chapter 4
Factor Analysis
Identify Factors within Variables
Factor analysis (also known as common factor analysis and exploratory factor analysis) seeks to describe a collection of observed variables in terms of a smaller collection of (unobservable) latent variables, or factors.
These factors, which are defined as linear combinations of the observed variables, are constructed to explain variation that is common to the observed variables. A primary goal of factor analysis is to achieve a meaningful interpretation of the observed variables through the factors. Another goal is to reduce the number of variables.
Factor analysis is used in many areas, and is of particular value in psychology, sociology, and education. In these areas, factor analysis is used to understand how manifest behavior can be interpreted in terms of underlying patterns and structures. For example, measures of participation in outdoor activities, hobbies, exercise, and travel may all relate to a factor that can be described as “active versus inactive personality type”.
Factor analysis attempts to explain correlations among the observed variables in terms of the factors. In particular, it allows you to determine how much of the variance in each observable variable is accounted for by the factors you have identified. It also tells you how much of the variance in all the variables is accounted for by each factor.
Use factor analysis when you need to explore or interpret underlying patterns and structure in your data. Also consider using it to summarize the information in your variables using a smaller number of latent variables.
Figure 4.1 Rotated Factor Loading

Contents
Factor Analysis Platform Overview
Example of the Factor Analysis Platform
Launch the Factor Analysis Platform
The Factor Analysis Report
Model Launch
Rotation Methods
Factor Analysis Platform Options
Factor Analysis Model Fit Options

Factor Analysis Platform Overview
Factor analysis models a set of observable variables in terms of a smaller number of unobservable factors. These factors account for the correlation or covariance between the observed variables. Once the factors are extracted, you perform factor rotation in order to obtain a meaningful interpretation of the factors.
Consider a situation where you have ten observed variables, X1, X2, …, X10. Suppose that you want to model these ten variables in terms of two latent factors, F1 and F2. For convenience, it is assumed that the factors are uncorrelated and that each has mean zero and variance one. The model that you want to derive is of the form:
$$X_i = \beta_{i0} + \beta_{i1} F_1 + \beta_{i2} F_2 + \varepsilon_i$$
It follows that $\mathrm{Var}(X_i) = \beta_{i1}^2 + \beta_{i2}^2 + \mathrm{Var}(\varepsilon_i)$. The portion of the variance of $X_i$ that is attributable to the factors, the common variance or communality, is $\beta_{i1}^2 + \beta_{i2}^2$. The remaining variance, $\mathrm{Var}(\varepsilon_i)$, is the specific variance, and is considered to be unique to $X_i$.
The Factor Analysis platform provides a Scree Plot for the eigenvalues of the correlation or covariance matrix. You can use this as a guide in determining the number of factors to extract. Alternatively, you can accept the platform's suggestion of setting the number of factors equal to the number of eigenvalues that exceed one.
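The variance decomposition and the eigenvalue rule described above can be sketched numerically. The following is a minimal illustration only, not the platform's computation: the loadings and eigenvalues are hypothetical values chosen for the example, and the variable is assumed to be standardized so that its total variance is 1.

```python
# Hypothetical loadings for one standardized variable X_i on two
# uncorrelated, unit-variance factors (values invented for illustration).
beta_i1 = 0.8
beta_i2 = 0.5

# Communality: the part of Var(X_i) attributable to the factors.
communality = beta_i1**2 + beta_i2**2

# Assumption: X_i is standardized, so Var(X_i) = 1 and the specific
# variance Var(eps_i) is whatever remains after the communality.
total_variance = 1.0
specific_variance = total_variance - communality

# Eigenvalue rule: suggest as many factors as there are eigenvalues
# that exceed one. These eigenvalues are hypothetical (they sum to 5,
# as they would for a correlation matrix of five variables).
eigenvalues = [2.9, 1.3, 0.5, 0.2, 0.1]
suggested_factors = sum(1 for ev in eigenvalues if ev > 1.0)

print(f"communality       = {communality:.2f}")
print(f"specific variance = {specific_variance:.2f}")
print(f"suggested factors = {suggested_factors}")
```

With real data, the eigenvalues would come from the Eigenvalues table in the report rather than from a hand-typed list.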
The platform provides two factoring methods for estimating the parameters of this model: Principal Components and Maximum Likelihood.
JMP provides two options for estimating the proportion of variance contributed by common factors for each variable. These Prior Communality options impose assumptions on the diagonal of the correlation (or covariance) matrix. The Principal Components option treats the correlation matrix, which has ones on its diagonal (or the covariance matrix with variances on its diagonal), as the structure to be analyzed. The Common Factor Analysis option sets the diagonal entries to values that reflect the proportion of the variation that is shared with other variables.
To support interpretability of the extracted factors, you rotate the factor structure. The Factor Analysis platform provides a variety of rotation methods that encompass both orthogonal and oblique rotations.
In contrast with factor analysis, which looks at common variance, principal component analysis accounts for the total variance of the observed variables. See the Principal Components chapter in the Multivariate Methods book.

Example of the Factor Analysis Platform
To view an example Factor Analysis report for a data table for two factors:
1. Select Help > Sample Data Library and open Solubility.jmp.
2. Select Analyze > Consumer Research > Factor Analysis. The Factor Analysis launch window appears.
3. Select all of the continuous columns and click Y, Columns.
4. Keep the default Estimation Method and Variance Scaling.
5. Click OK. The initial Factor Analysis report appears.
Figure 4.2 Initial Factor Analysis Report
6. For the Model Launch, select the following options:
‒ Factoring Method as Maximum Likelihood
‒ Prior Communality as Common Factor Analysis
‒ Number of factors = 2
‒ Rotation Method as Varimax
7. After all selections are made, click Go. The Factor Analysis report appears.
Figure 4.3 Example Factor Analysis Report
The report lists the communality estimates, variance, significance tests, rotated factor loadings, and a factor loading plot. Note that in the Factor Loading Plot, Factor 1 relates to the Carbon Tetrachloride-Chloroform-Benzene-Hexane cluster of variables, and Factor 2 relates to the Ether–1-Octanol cluster of variables.
See “Factor Analysis Model Fit Options” on page 78 for details of the information shown in the report.

Launch the Factor Analysis Platform
Launch the Factor Analysis platform by selecting Analyze > Consumer Research > Factor Analysis. This example uses the Solubility.jmp sample data table.
Figure 4.4 Factor Analysis Launch Window
Y, Columns Lists the continuous columns to be analyzed.
Weight Enables you to weight the analysis to account for pre-summarized data.
Freq Identifies a column whose numeric values assign a frequency to each row in the analysis.
By Creates a Factor Analysis report for each value specified by the By column so that you can perform separate analyses for each group.
Estimation Method Lists different methods for fitting the model. For details about the methods, see the Multivariate chapter in the Multivariate Methods book.
Variance Scaling Lists the scaling methods for performing the factor analysis based on Correlations (the same as Principal Components), Covariances, or Unscaled.

The Factor Analysis Report
The initial Factor Analysis report shows Eigenvalues and the Scree Plot. The Eigenvalues are obtained from a principal components analysis. The Scree Plot graphs these eigenvalues. The number of factors that JMP suggests in the Model Launch equals the number of eigenvalues that exceed 1.0. Alternatively, you can use the scree plot to guide your initial choice for number of factors.
The number of eigenvalues that appear before the scree plot levels out can provide an upper bound on the number of factors.
Figure 4.5 Factor Analysis Report
In the example shown in Figure 4.5, the Scree Plot begins to level out after the second eigenvalue. The Eigenvalues table indicates that the first eigenvalue accounts for 79.75% of the variation and the second eigenvalue accounts for 15.75%, for a total of 95.50% of the total variation. The third eigenvalue explains only 2.33% of the variation, and the contributions from the remaining eigenvalues are negligible. Although the Number of factors box is initially set to 1, this analysis suggests that extracting 2 factors is appropriate.

Model Launch
To configure the Factor Analysis model, use the Model Launch section at the bottom of the Factor Analysis Report (Figure 4.6).
Figure 4.6 Model Launch
The Model Launch section enables you to configure the following options:
1. Factoring method - the method for extracting factors.
‒ The Principal Components method is a computationally efficient method, but it does not allow for hypothesis testing.
‒ The Maximum Likelihood method has desirable properties and allows you to test hypotheses about the number of common factors.
Note: The Maximum Likelihood method requires a positive definite correlation matrix. If your correlation matrix is not positive definite, select the Principal Components method.
2. Prior Communality - the method for estimating the proportion of variance contributed by common factors for each variable.
‒ Principal Components (diagonals = 1) sets all communalities equal to 1, indicating that 100% of each variable’s variance is shared with the other variables. Using this option with Factoring Method set to Principal Components results in principal component analysis.
‒ Common Factor Analysis (diagonals = SMC) sets the communalities equal to squared multiple correlation (SMC) coefficients. For a given variable, the SMC is the RSquare for a regression of that variable on all other variables.
3. Number of factors - the number of factors (or principal components) to extract. Choose it from the eigenvalues greater than or equal to 1.0, or from the point on the scree plot where the graph begins to level out.
Note: Alternatively, the Kaiser criterion retains those factors with eigenvalues greater than 1.0. In our example, only factor 1 would be retained for analysis.
4. Rotation method - the method used to align the factor directions with the original variables for ease of interpretation. The default value is Varimax. See “Rotation Methods” on page 75 for a description of the available selections.
5. Click Go to generate the Factor Analysis report.
Depending on the selected Variance Scaling, the appropriate factor analysis results appear. See “Factor Analysis Model Fit Options” on page 78 for details about the contents of the report. The Factor Analysis on Correlations and Factor Analysis on Unscaled reports show the same information.

Rotation Methods
Rotations align the directions of the factors with the original variables so that the factors are more interpretable. Ideally, clusters of highly correlated variables define the rotated factors.
After the initial extraction, the factors are uncorrelated with each other. If the factors are rotated by an orthogonal transformation, the rotated factors are also uncorrelated. If the factors are rotated by an oblique transformation, the rotated factors become correlated. Oblique rotations often produce more useful patterns than do orthogonal rotations. However, a consequence of correlated factors is that there is no single unambiguous measure of the importance of a factor in explaining a variable.
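To see what an orthogonal rotation does and does not change, the following minimal sketch rotates a hypothetical two-factor loading matrix by a fixed angle. It is an illustration under stated assumptions, not the platform's algorithm: the loadings are invented, and the 45-degree angle is picked by hand rather than chosen by a criterion such as Varimax. The point is that the individual loadings change (here toward a simpler pattern), while each variable's sum of squared loadings stays the same.

```python
import math

def rotate(loadings, angle_degrees):
    """Apply a 2x2 orthogonal rotation to each (factor1, factor2) row."""
    t = math.radians(angle_degrees)
    c, s = math.cos(t), math.sin(t)
    return [(a * c + b * s, -a * s + b * c) for a, b in loadings]

def communalities(loadings):
    """Sum of squared loadings for each variable (row)."""
    return [a * a + b * b for a, b in loadings]

# Hypothetical unrotated loadings: every variable loads on both factors.
unrotated = [(0.7, 0.6), (0.8, 0.5), (0.6, -0.6), (0.5, -0.7)]

# Hand-picked angle for illustration; a real rotation method would
# choose the transformation by optimizing a simplicity criterion.
rotated = rotate(unrotated, 45)

for before, after in zip(unrotated, rotated):
    print(f"{before} -> ({after[0]:.3f}, {after[1]:.3f})")

# The per-variable sums of squared loadings are unchanged by the rotation.
print([round(h, 6) for h in communalities(unrotated)])
print([round(h, 6) for h in communalities(rotated)])
```

After this rotation, the first two variables load mainly on the first factor and the last two mainly on the second, which is the kind of clustering that makes rotated factors easier to interpret.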
For each rotation method, we used the example described in “Example of the Factor Analysis Platform” on page 70 to view the Rotated Factor Loading and Factor Loading Plot.

Orthogonal Rotation Methods
Table 4.1 lists the available orthogonal (that is, uncorrelated) rotation methods.
Table 4.1 Orthogonal Rotation Methods
Method | SAS PROC FACTOR Equivalent
Varimax | ROTATE=ORTHOMAX with GAMMA = 1. Note: This is the default selection.
Biquartimax | ROTATE=ORTHOMAX with GAMMA = 0.5
Equamax | ROTATE=ORTHOMAX with GAMMA = number of factors/2
Factorparsimax | ROTATE=ORTHOMAX with GAMMA = number of variables
Orthomax | ROTATE=ORTHOMAX or ROTATE=ORTHOMAX(p), where p is the orthomax weight or the GAMMA = value. Note: The default p value is 1 unless specified otherwise in the GAMMA = option. For additional information about orthomax weight, see the SAS documentation, “Simplicity Functions for Rotations.”
Parsimax | ROTATE=ORTHOMAX with GAMMA = nvar(nfact – 1) / (nvar + nfact – 2), where nvar is the number of variables, and nfact is the number of factors.
Quartimax | ROTATE=ORTHOMAX with GAMMA=0

Oblique Rotation Methods
Table 4.2 lists the available oblique (that is, correlated) rotation methods.
Table 4.2 Oblique Rotation Methods
Method | SAS PROC FACTOR Equivalent
Biquartimin | ROTATE=OBLIMIN(.5) or ROTATE=OBLIMIN with TAU=.5
Covarimin | ROTATE=OBLIMIN(1) or ROTATE=OBLIMIN with TAU=1
Obbiquartimax | ROTATE=OBBIQUARTIMAX
Obequamax | ROTATE=OBEQUAMAX
Obfactorparsimax | ROTATE=OBFACTORPARSIMAX
Oblimin | ROTATE=OBLIMIN, where the default p value is zero, unless specified otherwise in the TAU= option. ROTATE=OBLIMIN(p) specifies p as the oblimin weight or the TAU= value. Note: For additional information about oblimin weight, see the SAS documentation, “Simplicity Functions for Rotations.”
Obparsimax | ROTATE=OBPARSIMAX
Obquartimax | ROTATE=OBQUARTIMAX
Obvarimax | ROTATE=OBVARIMAX
Quartimin | ROTATE=OBLIMIN(0) or ROTATE=OBLIMIN with TAU=0
Promax | ROTATE=PROMAX

Factor Analysis Platform Options
The Factor Analysis platform red triangle menu enables you to view or hide the following report elements:
Eigenvalues A table that indicates the total number of factors extracted based on the eigenvalues (that is, the amount of variance contributed by each factor). The table includes the percent of the total variance contributed by that factor, a bar chart illustrating the percent contribution, and the cumulative percent contributed by each successive factor. The number of eigenvalues greater than or equal to 1.0 can be taken as the number of sufficient factors for analysis.
Scree Plot A plot of the eigenvalues versus the number of components (or factors). The plot can be used to determine the number of factors that contribute to the maximum amount of variance. The point at which the plotted line levels out can be taken as the number of sufficient factors for analysis.
Script Lists the Script menu options for the platform. See the JMP Platforms chapter in the Using JMP book for details.
See Figure 4.2 on page 70 for an example.

Factor Analysis Model Fit Options
After submitting the Model Launch, the model results appear. The following options are available from the Factor Analysis report’s red triangle menu.
Prior Communality An initial estimate of the communality for each variable. For a given variable, this estimate is the squared multiple correlation coefficient (SMC), or RSquare, for a regression of that variable on all other variables.
Note: The Prior Communality Estimates table only appears if the Common Factor Analysis (diagonals = SMC) option is selected.
Figure 4.7 Prior Communality Estimates
Eigenvalues Shows the eigenvalues of the reduced correlation matrix and the percent of the common variance for which they account. The reduced correlation matrix is the correlation matrix with its diagonal entries replaced by the communality estimates. The eigenvalues indicate the common variance explained by the factors. The Cum Percent can exceed 100% because the reduced correlation matrix is not necessarily positive definite and can have negative eigenvalues. Note that the table indicates the number of factors retained for analysis.
The Eigenvalues option is only available when the Prior Communality option is set to Common Factor Analysis (diagonals = SMC). The communality estimates are the SMC (squared multiple correlation) values.
Figure 4.8 indicates that the first two factors account for 100.731% of the common variance. This pattern suggests that you may not need more than two factors to model your data.
Figure 4.8 Eigenvalues of the Reduced Correlation Matrix
Unrotated Factor Loading Shows the factor loading matrix before rotation. Factor loadings measure the influence of a common factor on a variable. Because the unrotated factors are orthogonal, the factor loading matrix is the matrix of correlations between the variables and the factors. The closer the absolute value of a loading is to 1, the stronger the effect of the factor on the variable.
Use the slider and value to Suppress Absolute Loading Values Less Than the specified value in the table. Suppressed values appear dimmed according to the setting specified by Dim Text.
Use the Dim Text slider and value to control the table’s font transparency gradient for factor values less in absolute value than the specified Suppress Absolute Loading Values Less Than value.
Note: The Suppress Absolute Loading Values Less Than value and Dim Text value are the same values used in the Rotated Factor Loading table. Changes to one loading table’s settings change the settings in the other loading table.
Figure 4.9 Unrotated Factor Loading
Note: The Unrotated Factor Loading matrix is re-ordered so that variables associated with the same factor appear next to each other.
Rotation Matrix Shows the calculations used for rotating the factor loading plot and the factor loading matrix.
Figure 4.10 Rotation Matrix
Target Matrix Shows the matrix to which the varimax factor pattern is rotated. This option is available only for the Promax rotation.
Figure 4.11 Target Matrix
Factor Structure Shows the matrix of correlations between variables and common factors. If the common factors are uncorrelated, the factor structure is the same as the factor loading matrix. This option is available only for oblique rotations.
Figure 4.12 Factor Structure
Final Communality Estimates Estimates of the communalities after the factor model has been fit. When the factors are orthogonal, the final communality estimate for a variable equals the sum of the squared loadings for that variable.
Figure 4.13 Final Communality Estimates
Standard Score Coefficients Lists the multipliers used to convert factor values when saving rotated components as factors to the source data table.
Figure 4.14 Standard Score Coefficients
Variance Explained by Each Factor Gives the variance, percent, and cumulative percent of common variance explained by each rotated factor.
Figure 4.15 Variance Explained by Each Factor
Significance Test If you select Maximum Likelihood as the factoring method, the results of two Chi-square tests are provided. The first test is for H0: No common factors.
This null hypothesis states that no common factors are needed to explain the intercorrelations among the variables. This test is Bartlett’s Test for Sphericity, whose null hypothesis is that the correlation matrix of the variables is an identity matrix (Bartlett, 1954).
The second test is for H0: N factors are sufficient, where N is the specified number of factors. Rejection of this null hypothesis indicates that more factors may be required to explain the intercorrelations among the variables (Bartlett, 1954).
The tests in Figure 4.16 indicate that the common factors already included in the model explain some of the intercorrelations, but that more factors are needed.
Note: The Significance Test table only appears if the Maximum Likelihood factoring method option is selected.
Figure 4.16 Significance Test
Rotated Factor Loading Shows the factor loading matrix after rotation. If the rotation is orthogonal, these values are the correlations between the variables and the rotated factors.
Use the slider and value to Suppress Absolute Loading Values Less Than the specified value in the table. Suppressed values appear dimmed according to the setting specified by Dim Text.
Use the Dim Text slider and value to control the table’s font transparency gradient for factor values less in absolute value than the specified Suppress Absolute Loading Values Less Than value.
Note: The Suppress Absolute Loading Values Less Than value and Dim Text value are the same values used in the Unrotated Factor Loading table. Changes to one loading table’s settings change the settings in the other loading table.
Figure 4.17 Rotated Factor Loading
Note: The Rotated Factor Loading matrix is re-ordered so that variables associated with the same factor appear next to each other.
Factor Loading Plot Shows a plot of the rotated factor loadings.
Figure 4.18 Factor Loading Plot
Note that in the Factor Loading Plot, Factor 1 relates to the Carbon Tetrachloride-Chloroform-Benzene-Hexane cluster of variables, and Factor 2 relates to the Ether–1-Octanol cluster of variables. See the matrix of “Rotated Factor Loading” on page 81 for details.
Score Plot Graphs each factor’s calculated values in relation to the other factor’s values, adjusting each value for the mean and standard deviation.
Figure 4.19 Score Plot
Score Plot with Imputation Imputes any missing values and creates a score plot. This option is available only if there are missing values.
Save Rotated Components Saves the rotated components to the data table, with a formula for computing the components. The formula cannot evaluate rows with missing values.
Save Rotated Components with Imputation Imputes missing values, and saves the rotated components to the data table. The column contains a formula for doing the imputation and computing the rotated components. This option appears after the Factor Analysis option is used, and if there are missing values.
Remove Fit Removes the fit model results from the Factor Analysis Fit Model report. This option enables you to change the Model Launch configuration for a new report.

Chapter 5
Choice Models
Fit Models for Choice Experiments
The Choice platform is designed for use in market research experiments, where the ultimate goal is to discover the preference structure of consumers. This information is then used to design products or services that have the attributes most desired by consumers.
Features provided in the Choice platform include:
• Ability to use information about consumer traits as well as product attributes.
• Integration of data from one, two, or three sources.
• Ability to use the integrated profiler to understand, visualize, and optimize the response (utility) surface.
• Subject-level scores for segmenting or clustering your data.
• A special default bias-corrected maximum likelihood estimator described by Firth (1993). This method has been shown to produce better estimates and tests than MLEs without bias correction. In addition, bias-corrected MLEs ameliorate separation problems that tend to occur in logistic-type models. Refer to Heinze and Schemper (2002) for a discussion of the separation problem in logistic regression.
The Choice platform is not appropriate to use for fitting models that involve:
• Ranking or scoring.
• Nested hierarchical choices. (PROC MDC in SAS/ETS can be used for such analysis.)
Figure 5.1 Choice Platform Example

Contents
Choice Modeling Platform Overview
Example of the Choice Platform
Launch the Choice Platform
Choice Model Output
Choice Platform Options
Valuing Trade-offs
Example of Valuing Trade-offs
One-Table Analysis
Example of One-Table Analysis
Segmentation
Example of Segmentation
Special Data Rules
Default Choice Set
Subject Data with Response Data
Logistic Regression
Transforming Data
Example of Transforming Data to Two Analysis Tables
Example of Transforming Data to One Analysis Table
Additional Examples of Logistic Regression Using the Choice Platform
Example of Logistic Regression Using the Choice Platform
Example of Logistic Regression for Matched Case-Control Studies
Statistical Details

Choice Modeling Platform Overview
Choice modeling, pioneered by McFadden (1974), is a powerful analytic method used to estimate the probability of individuals making a particular choice from presented alternatives. Choice modeling is also called conjoint modeling, discrete choice analysis, and conditional logistic regression.
The Choice Modeling platform uses a form of conditional logistic regression. Unlike simple logistic regression, choice modeling uses a linear model to model choices based on response attributes and not solely upon subject characteristics. For example, in logistic regression, the response might be whether you buy brand A or brand B as a function of ten factors. Those factors could be characteristics that describe you, such as your age, gender, income, education, and so on. However, in choice modeling, you might be choosing between two cars that are a compound of ten attributes such as price, passenger load, number of cup holders, color, GPS device, gas mileage, anti-theft system, removable seats, number of safety features, and insurance cost.
When engineers design a product, they routinely make hundreds or thousands of small design decisions. Most of these decisions are not tested by prospective customers. Consequently, these products are not optimally designed. However, if customer testing is not too costly and test subjects (prospective customers) are readily available, it is worthwhile to test more of these decisions via consumer choice experiments.
Modeling costs have recently decreased with improved product and process development techniques and methodologies. Prototyping, including pure digital prototyping, is becoming less expensive, so it is possible to evaluate the attributes and consequences of more alternatives. Another important advancement is the use of the Internet to deliver choice experiments to a wide audience.
You can now inform your customers that they can have input into the design of the next product edition by completing a web survey.

Choice modeling can be added to Six Sigma programs to improve consumer products. Six Sigma aims at making products better by improving the manufacturing process and ensuring greater performance and durability. But Six Sigma programs have not addressed one very important aspect of product improvement: making the products that people actually want. Six Sigma programs often consider the Voice of the Customer and can use customer satisfaction surveys. These surveys can disclose what is wrong with the product, but they fail to identify consumer preferences with regard to specific product attributes. Choice experiments give companies a tool for gaining insight into actual customer preferences, and choice modeling analysis can reveal those preferences.

Market research experiments have a long history of success, but performing these experiments has been expensive, and research has previously focused on price elasticity and competitive situations. Choice modeling can have the most impact when these same techniques are applied to product design engineering.

Example of the Choice Platform

Suppose that you are supplying pizza for an airline. You want to find pizza attributes that are optimal for the flying population. So, you have a group of frequent flyers complete a choice survey. To weigh the importance of each attribute and potential interactions between attributes, you give them a series of choices that require them to state their preference between each pair of choices. One pair of choices might be between two types of pizza that they like, or between two types of pizza that they do not like. Hence, the choice might not always be easy.
This example examines pizza choices where three attributes, with two levels each, are presented to the subjects:
• crust (thick or thin)
• cheese (mozzarella or Monterey Jack)
• topping (pepperoni or none)

Suppose a subject likes thin crust with mozzarella cheese and no topping. The choices given to the subject are either a thick crust with mozzarella cheese and pepperoni topping, or a thin crust with Monterey Jack cheese and no topping. Because neither of these pizzas is ideal, the subject has to weigh which of the attributes are more important.

This example uses three data tables: Pizza Profiles.jmp, Pizza Responses.jmp, and Pizza Subjects.jmp.

1. Select Help > Sample Data Library and open Pizza Profiles.jmp, Pizza Responses.jmp, and Pizza Subjects.jmp.

The profile data table, Pizza Profiles.jmp, lists all the pizza choice combinations that you want to present to the subjects. Each choice combination is given an ID.

For the actual survey or experiment, each subject is given four trials, where each trial consists of stating his or her preference between two choice profiles (Choice1 and Choice2). The choice profiles given for each trial are referred to as a choice set. One subject’s choice trials can be different from another subject’s trials. Refer to the Discrete Choice Designs chapter in the Design of Experiments Guide. Notice that each choice value refers to an ID value in the Profile data table that has the attribute information.

Data about the subjects are stored in a separate Pizza Subjects.jmp file. This table includes a Subject ID column and characteristics of the subject. In the pizza example, the only characteristic or attribute about the Subject is Gender. Notice that the response choices and choice sets in the response data table use the ID names given in the Pizza Profiles.jmp data set. Similarly, the subject identifications in the response data table match those in the subject data table.

2. Select Analyze > Consumer Research > Choice to open the launch window. There are three separate sections, one for each of the data sources.
3. Click Select Data Table under Profile Data. A new window appears, which prompts you to specify the data table for the profile data.
4. Select Pizza Profiles.jmp. The columns from this table now populate the field under Select Columns in the Choice Dialog box.
5. Select ID for Profile ID under Pick Role Variables and Add Crust, Cheese, and Topping under Construct Model Effects.

Figure 5.2 Profile Data Dialog

6. Open the Response Data section of the window. Click Select Data Table. When the Response Data Table window appears, select Pizza Responses.jmp.
7. Select Choice for the Profile ID Chosen, and Choice1 and Choice2 for the Profile ID Choices.

Figure 5.3 Response Data Dialog

Choice1 and Choice2 are the profile ID choices given to a subject on each of four trials. The Choice column contains the chosen preference between Choice1 and Choice2.

8. Open the Subject Data section of the window. Click Select Data Table. When the Subject Data Table window appears, select Pizza Subjects.jmp.
9. Select Subject for Subject ID to identify individual subjects.
10. Add Gender under Construct Model Effects.

Figure 5.4 Subject Data Dialog

11. Click Run Model.

Figure 5.5 Choice Model Results
(Figure callouts: Response-only effects. Subject-effect by Response-effect interaction. Cheese is significant as a main effect. Crust and Topping are significant only when interacted with Gender.)

Figure 5.5 shows the parameter estimates and the likelihood ratio tests for the Choice Model with subject effects included. Strong interactions are seen between Gender and Crust and between Gender and Topping.
When the Crust and Topping factors are assessed for the entire population, the effects are not significant. However, the effects of Crust and Topping are strong when they are evaluated between Gender groups.

Explore the Model

If you are concerned with selecting the best combination of factor levels, the Utility Profiler is a useful tool for exploring the fitted model and finding the settings that maximize utility.

1. From the Choice Model red triangle menu, select Utility Profiler.

Figure 5.6 Utility Profiler with Female Level Factor Setting

You can see that females do not particularly enjoy the default settings of thick crust with Jack cheese and pepperoni toppings.

2. Move the red line in the Gender plot to male (M).

Figure 5.7 Utility Profiler with Male Level Factor Setting

You can see that males prefer the default pizza more than females do, but not by much. To identify the optimal factor level settings, you must find their maximum desirability settings.

3. Click the red triangle menu of the Utility Profiler and select Desirability Functions. A new row is added to the Utility Profiler, displaying overall desirability traces and measures. Utility and Desirability Functions are shown together in Figure 5.8.

Figure 5.8 Utility and Desirability Functions

4. From the red triangle menu of the Utility Profiler, select Maximize for each Grid Point. A Remembered Settings table, containing the grid settings with the maximum utility and desirability functions, and a table of differences between grids are displayed. See Figure 5.9. As illustrated, this feature can be a very quick and useful tool for selecting the most desirable attribute combinations for a factor.
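The idea behind Maximize for each Grid Point can be pictured with a short Python sketch. It assumes that the grid points are the levels of Gender and that utility is a linear sum of a Cheese main effect and two Gender interactions. The numeric contributions below are hypothetical placeholders invented for illustration; they are not the estimates from the JMP report, and this is not how JMP implements the feature.

```python
from itertools import product

# Hypothetical utility contributions, invented for this sketch only.
# They are NOT the parameter estimates from the JMP report.
CHEESE = {"Mozzarella": 0.5, "Jack": -0.5}
GENDER_BY_CRUST = {("F", "Thin"): 0.4, ("F", "Thick"): -0.4,
                   ("M", "Thin"): -0.4, ("M", "Thick"): 0.4}
GENDER_BY_TOPPING = {("F", "None"): 0.3, ("F", "Pepperoni"): -0.3,
                     ("M", "None"): -0.3, ("M", "Pepperoni"): 0.3}


def utility(gender, crust, cheese, topping):
    """Linear utility: Cheese main effect plus two Gender interactions."""
    return (CHEESE[cheese]
            + GENDER_BY_CRUST[(gender, crust)]
            + GENDER_BY_TOPPING[(gender, topping)])


def best_profile_per_gender():
    """For each Gender level, return the (crust, cheese, topping) with the highest utility."""
    profiles = list(product(("Thick", "Thin"), CHEESE, ("Pepperoni", "None")))
    return {g: max(profiles, key=lambda p: utility(g, *p)) for g in ("F", "M")}


best = best_profile_per_gender()
for gender, profile in best.items():
    print(gender, profile, round(utility(gender, *profile), 2))
```

With only eight profiles, exhaustive enumeration is enough; the profiler performs the equivalent search on the fitted model for each grid point.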
Figure 5.9 Utility and Desirability Settings

The grid setting for females shows that the greatest utility and desirability values are obtained when the pizza attributes are thin crust, mozzarella cheese, and no topping. For males, the grid setting shows that the highest utility and desirability values are obtained when the pizza attributes are thick crust, mozzarella cheese, and pepperoni topping. If you could offer two pizza choices, these optimal settings for females and males would be the best to offer.

Launch the Choice Platform

Launch the Choice platform by selecting Analyze > Consumer Research > Choice.

The Choice platform is unique because it is designed to use data from one, two, or three different data tables. Typically, data are in three separate tables: Profile Data, Response Data, and Subject Data. Each data table contains information that is joined together by the platform to extract the necessary data for Choice analysis. There are three sections of the Choice Launch Window. Each section corresponds to a different data table.

If all your data are contained in one table, you can use the Choice platform, but additional effort is necessary. See the section “One-Table Analysis” on page 112.

Because the Choice platform can use several data tables, it makes no initial assumption about using the current data table, unlike other JMP platforms. You must select the data table for each of the three choice data sources. You are prompted to select the profile data set and the response data set. If you want to model subject attributes, then a subject data set must also be selected. You can expand or collapse each section of the Choice window, as needed.

Profile Data

Profile data describe the attributes associated with each choice. Each choice can comprise many different attributes, and each attribute is listed as a column in the data table.
There is a row for each possible choice, and each possible choice contains a unique ID. Figure 5.10 illustrates an example Profile Data table.

Figure 5.10 Example of Profile Data Table and Dialog Window

Select Data Table
    Link to profile data table. You can select from any of the data sets already open in the current JMP session, or you can select Other. Selecting Other enables you to open a file that is not currently open.
Profile ID
    Identifier for each row of choice combinations. If the Profile ID column does not uniquely identify each row in the profile data table, you need to add Grouping columns. Add Grouping columns until the combination of Grouping and Profile ID columns uniquely identifies the row, or profile.
Grouping
    Together with the Profile ID, uniquely identifies each row, or profile. For example, if Profile ID = 1 for Survey = A, and a different Profile ID = 1 for Survey = B, then Survey would be used as a Grouping column.

For information about the Construct Model Effects window, see the Construct Model Effects section in the Introduction to Fit Model chapter of the Fitting Linear Models book.

Response Data

Response data contain the experimental results and have the choice set IDs for each trial as well as the actual choice selected by the subject. Each subject usually has several trials, or choice sets, to cover several choice possibilities. There can be more than one row of data for each subject. For example, an experiment might have 100 subjects with each subject making 12 choice decisions, resulting in 1200 rows in this data table. The Response data are linked to the Profile data through the choice set columns and the actual choice response column. Choice set refers to the set of alternatives from which the subject makes a choice. Grouping variables are sometimes used to align choice indices when more than one group is contained within the data.
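The link between the Response data and the Profile data can be sketched in Python as a simple lookup. The column names (Subject, Choice1, Choice2, Choice, Crust, Cheese, Topping) follow the pizza example, but the profile IDs and all rows below are made up for illustration, and the expansion only approximates the join that the platform performs internally.

```python
# Hypothetical profile table keyed by Profile ID (IDs are invented).
profiles = {
    "P1": {"Crust": "Thick", "Cheese": "Mozzarella", "Topping": "Pepperoni"},
    "P2": {"Crust": "Thin", "Cheese": "Jack", "Topping": "None"},
    "P3": {"Crust": "Thin", "Cheese": "Mozzarella", "Topping": "None"},
}

# Hypothetical response table: one row per trial (choice set).
responses = [
    {"Subject": 1, "Choice1": "P1", "Choice2": "P2", "Choice": "P2"},
    {"Subject": 1, "Choice1": "P3", "Choice2": "P1", "Choice": "P3"},
    {"Subject": 2, "Choice1": "P2", "Choice2": "P3", "Choice": "P3"},
]


def expand_responses(responses, profiles, choice_cols=("Choice1", "Choice2")):
    """Return one row per alternative, with its attributes and a 0/1 Chosen flag."""
    rows = []
    for trial, resp in enumerate(responses, start=1):
        for col in choice_cols:
            profile_id = resp[col]
            rows.append({
                "Subject": resp["Subject"],
                "Trial": trial,  # position of the choice set in the response table
                "Profile ID": profile_id,
                **profiles[profile_id],  # attributes looked up by Profile ID
                "Chosen": int(profile_id == resp["Choice"]),
            })
    return rows


long_rows = expand_responses(responses, profiles)
for row in long_rows:
    print(row)
```

Each trial contributes one row per alternative, so three trials with two alternatives each give six rows, exactly one of which is flagged as chosen in each trial.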
Figure 5.11 Example of Response Data Table and Launch Window

Select Data Table
    Link to response data table. You can select from any of the data sets already open in the current JMP session, or you can select Other. Selecting Other enables you to open a file that is not currently open.
Profile ID Chosen
    The Profile ID from the Profile data table of the choice the subject selected.
Grouping
    Together with the Profile ID Choices, uniquely identifies each choice profile.
Profile ID Choices
    The Profile IDs of the set of possible choices.
Subject ID
    The Subject ID from the Subject data table.
Freq
    The frequency of the observations. If n is the value of the Freq variable for a given row, then that row is used in computations n times. If the value is less than 1 or missing, JMP does not use that row in any analysis.
Weight
    The weights for each observation in the data table. The variable does not have to be an integer, but an observation is included in analyses only when its weight is greater than zero.
By
    A column whose levels define separate analyses.

Subject Data

Subject data are optional, depending on whether subject effects are to be modeled. This source contains one or more attributes or characteristics of each subject and a subject identifier. The Subject data table contains the same number of rows as subjects and has an identifier column that matches a similar column in the Response data table. You can also put Subject data in the Response data table, but it is still specified as a subject table.

Figure 5.12 Example of Subject Data Table and Launch Window

Select Data Table
    Link to subject data table. You can select from any of the data sets already open in the current JMP session, or you can select Other. Selecting Other enables you to open a file that is not currently open.
Subject ID
    Unique identifier of the subject.
For information about the Construct Model Effects window, see the Construct Model Effects section in the Introduction to Fit Model chapter of the Fitting Linear Models book.

Scripting the Choice Platform

If you are scripting the Choice platform, you can also set the acceptable criterion for convergence when estimating the parameters by adding this command to the Choice() specification:

    Choice( ..., Convergence Criterion( fraction ), ... )

See the JMP Scripting Index in the Help menu for an example.

Choice Model Output

An example of the Choice Model Output results is shown in Figure 5.13.

• The resulting parameter estimates are sometimes referred to as part-worths. Each part-worth is the coefficient of utility associated with that attribute. By default, these estimates are based on the Firth bias-corrected maximum likelihood estimators and therefore are considered to be more accurate than MLEs without bias correction.
• Comparison criteria are used to help determine the better-fitting model(s) when more than one model is investigated for your data. The model with the lower or lowest criterion value is believed to be the better or best model. Criteria are shown in the Choice Model output and include AICc (corrected Akaike’s Information Criterion), BIC (Bayesian Information Criterion), −2 LogLikelihood, and −2 Firth Loglikelihood.
The AICc formula is:
$$\mathrm{AICc} = -2\,\mathrm{LogLikelihood} + 2k + \frac{2k(k+1)}{n-k-1}$$
where k is the number of estimated parameters in the model and n is the number of observations in the data set.
The BIC formula is:
$$\mathrm{BIC} = -2\,\mathrm{LogLikelihood} + k \ln(n)$$
where k parameters are fitted to data with n observations and LogLikelihood is the maximized log-likelihood.
Note that the −2 Firth Loglikelihood result is included in the report only when the Firth Bias-adjusted Estimates check box is checked in the launch window. (See Figure 5.10.) This option is checked by default.
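The two criteria can be checked numerically with a small Python sketch that follows the AICc and BIC definitions given in this section. The −2 LogLikelihood values, parameter counts, and sample size below are hypothetical inputs chosen for illustration, not output from any JMP report.

```python
import math


def aicc(neg2_loglik, k, n):
    """Corrected AIC: -2 LogLikelihood + 2k + 2k(k + 1) / (n - k - 1)."""
    if n - k - 1 <= 0:
        raise ValueError("AICc requires n > k + 1")
    return neg2_loglik + 2 * k + 2 * k * (k + 1) / (n - k - 1)


def bic(neg2_loglik, k, n):
    """BIC: -2 LogLikelihood + k * ln(n)."""
    return neg2_loglik + k * math.log(n)


# Hypothetical comparison of two candidate models fitted to the same n rows.
n = 128
small_model = {"neg2_loglik": 100.0, "k": 3}
large_model = {"neg2_loglik": 98.5, "k": 7}

for name, m in (("small", small_model), ("large", large_model)):
    print(name,
          round(aicc(m["neg2_loglik"], m["k"], n), 3),
          round(bic(m["neg2_loglik"], m["k"], n), 3))
```

In this made-up comparison, the larger model lowers −2 LogLikelihood only slightly, so both criteria favor the smaller model once the penalty for the extra parameters is added.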
The decision to use or not use the Firth Bias-adjusted Estimates does not affect the AICc score or the −2 LogLikelihood results.
• Likelihood ratio tests appear for each effect in the model. These results are obtained by default if the model is fit quickly (less than five seconds). Otherwise, you can select the Choice Model red triangle menu and select Likelihood Ratio Tests.

Figure 5.13 Choice Model Results with No Subject Data for Pizza Example
(Figure callout: Part-worths for crust, cheese, and topping.)

Choice Platform Options

The Choice Modeling platform has many available options. To access these options, select the Choice Model red triangle menu.

Likelihood Ratio Tests
    Tests the significance of each effect in the model. These tests are done by default if the estimate of CPU time is less than five seconds.
Joint Factor Tests
    Tests each factor in the model by constructing a likelihood ratio test for all the effects involving that factor. For more information about Joint Factor Tests, see the Standard Least Squares Report and Options chapter of the Fitting Linear Models book.
Confidence Intervals
    Produces a 95% confidence interval for each parameter (by default), using the profile-likelihood method. Shift-click the platform red triangle menu and select Confidence Intervals to input alpha values other than 0.05.
Correlation of Estimates
    Shows the correlations of the parameter estimates.
Effect Marginals
    Shows the fitted utility values for different levels in the effects, with neutral values used for unrelated factors. Effect Marginals also includes marginal probabilities. The marginal probability is the probability of an individual selecting one level of an attribute over another level, with all other attributes at their mean or default levels. For example, the marginal probability of any subject choosing a pizza with mozzarella cheese, thick crust, and pepperoni over that same pizza with Monterey Jack cheese instead of mozzarella is 0.9470.

Figure 5.14 Example of Marginal Effects

Utility Profiler
    Shows the predicted utility for different factor settings. The utility is the value predicted by the linear model. See “Explore the Model” on page 92 for an example of the Utility Profiler. For details about the Utility Profiler options, see the Prediction Profiler Options section in the Profiler chapter of the Profilers book.
Probability Profiler
    Shows the predicted probability for choosing the given factor settings compared with baseline factor settings. This predicted probability is defined as
    $$\frac{\exp(U)}{\exp(U) + \exp(U_b)}$$
    where U is the utility for the current settings and U_b is the utility for the baseline settings. See “Compare to Baseline” on page 108 for an example of using the Probability Profiler. For details about the Probability Profiler options, see the Prediction Profiler Options section in the Profiler chapter of the Profilers book.
Multiple Choice Profiler
    Enables you to set up a number of choice sets and see the probabilities of choosing each set relative to the other choices. See “Multiple Choice Comparison” on page 110 for an example of using the Multiple Choice Profiler. For details about the Multiple Choice Profiler options, see the Prediction Profiler Options section in the Profiler chapter of the Profilers book.
Comparisons
    Performs comparisons between specific alternative choice profiles. Enables you to select factor values and the values that you want to compare. From here you can compare specific configurations, including comparing all settings on the left or right by selecting the Any check boxes. Using Any does not compare all combinations across features, but rather all combinations of comparisons, one feature at a time, using the left settings as the settings for the other factors.

Figure 5.15 Comparisons Example

Willingness to Pay
    Calculates how much a price must change, allowing for the new feature settings, to produce the same predicted outcome. The result is calculated from the Baseline settings (for each background setting) and the outcome after the factors are altered according to their roles. The roles include:
    - Feature Factor: a feature in the experiment that you want to price.
    - Price Factor: a continuous price factor in the experiment.
    - Background Constant: something that you want to hold constant at a baseline value.
    - Background Variable: something that you want to iterate across values.

Figure 5.16 Willingness to Pay Example

    The Include baseline settings in report table option adds the baseline settings with a price change of zero, which is useful if you make an output table of these prices displaying all the baseline settings as well as the featured settings.
Save Utility Formula
    Makes a new column with a formula for the utility, or linear model, that is estimated. The column is added to the profile data table unless there are subject effects. In that case, a new data table is made for the formula. This formula can be used with various profilers in subsequent analyses. For more details about the Utility Formula, see “Statistical Details” on page 134.
Save Gradients by Subject
    Constructs a new table that has a row for each subject containing the average (Hessian-scaled-gradient) steps on each parameter. This corresponds to using a Lagrangian multiplier test for separating that subject from the remaining subjects. These values can later be clustered, using the built-in script, to indicate unique market segments represented in the data. For more details, see “Segmentation” on page 115.
Model Dialog
    Shows the Choice launch window, which can be used to modify and re-fit the model.
    You can specify new data sets, new IDs, and new model effects.

Valuing Trade-offs

The Choice Modeling platform is also useful for determining the relative importance of product attributes. Even if the attributes of a particular product that are important to the consumer are known, information about preference trade-offs with regard to these attributes might be unknown. By gaining such information, a market researcher or product designer is able to incorporate product features that represent the optimal trade-off from the perspective of the consumer. The advantages of this approach to product design can be found in the following example.

Example of Valuing Trade-offs

It is already known that four attributes are important for laptop design: hard-disk size, processor speed, battery life, and selling price. The data gathered for this study are used to determine which of the four laptop attributes (Hard Disk, Speed, Battery Life, and Price) are most important. The study also assesses whether there are Gender or Job differences seen with these attributes.

1. Select Help > Sample Data Library and open Laptop Profile.jmp.
2. Select Analyze > Consumer Research > Choice to open the launch window.
3. Click Select Data Table under Profile Data and select Laptop Profile.jmp.
A partial listing of the Profile Data table is shown in Figure 5.17. The complete data set consists of 24 rows, 12 for Survey 1 and 12 for Survey 2. Survey and Choice Set define the grouping columns, and Choice ID identifies each profile, defined by the four laptop attributes: Hard Disk, Speed, Battery Life, and Price.

Figure 5.17 Profile Data Set for the Laptop Example

4. Select Choice ID for Profile ID, and Add Hard Disk, Speed, Battery Life, and Price for the model effects.
5. Select Survey and Choice Set as the Grouping columns. The Profile Data window is shown in Figure 5.18.

Figure 5.18 Profile Data Dialog Box for Laptop Study

6. Click Response Data > Select Data Table > Other > OK and select Laptop Runs.jmp from the sample data library.
7. Select Response as the Profile ID Chosen, Choice1 and Choice2 as the Profile ID Choices, Survey and Choice Set as the Grouping columns, and Person as Subject ID. The Response Data window is shown in Figure 5.19.

Figure 5.19 Response Data Dialog Box for Laptop Study

8. To run the model without subject effects, click Run Model.
9. Choose Utility Profiler from the red triangle menu.

Figure 5.20 Laptop Results without Subject Effects

Results of this study show that while all the factors are important, the most important factor in the laptop study is Hard Disk. The respondents prefer the larger size. Note that respondents did not think a price increase from $1000 to $1200 was important, but an increase from $1200 to $1500 was considered important. This effect is easily visualized by examining the factors interactively with the Utility Profiler. Such a finding can have implications for pricing policies, depending on external market forces.

Subject Effects

To include subject effects in the laptop study, add the subject data in the Choice launch window.

1. Select Model Dialog from the Choice Model red triangle menu.
2. Under Subject Data, Select Data Table > Other > OK > Laptop Subjects.jmp.
3. Select Person as Subject ID and Add Gender and Job as the model effects. The Subject Data window is shown in Figure 5.21.

Figure 5.21 Subject Dialog Box for Laptop Study

4. Click Run Model.
Results are shown in Figure 5.22, Figure 5.23, and Figure 5.26.

Figure 5.22 Laptop Parameter Estimate Results with Subject Data

Figure 5.23 Laptop Likelihood Ratio Test Results with Subject Data

5. Select Joint Factor Tests from the Choice Model red triangle menu.
The results of the Joint Factor Tests are displayed in Figure 5.24.

Figure 5.24 Joint Factor Test for Laptop

6. Select Effect Marginals from the Choice Model red triangle menu.
The marginal effects of each level for each factor are displayed in Figure 5.25. Notice that the marginal effects for each factor across all levels sum to zero.

Figure 5.25 Marginal Effects for Laptop

Figure 5.26 Laptop Profiler Results for Females with Subject Data

Figure 5.27 Laptop Profiler Results for Males with Subject Data

The interaction effect between Gender and Hard Disk is marginally significant, with a p-value of 0.0744 (see Figure 5.23 on page 107). In the Utility Profiler, check the slope for Hard Disk for both levels of Gender. You see that the slope is steeper for females than for males.

Compare to Baseline

If you want to know the probability of a customer selecting one choice over some baseline, the Probability Profiler is useful for comparing choices. Suppose you are developing a new product and want to see the likelihood of a customer selecting the new product over the old product or a competitor’s product. The Probability Profiler is a useful tool for this type of baseline analysis.

In this example, your company is currently producing laptops with 40 GB hard drives, 1.5 GHz processors, and 6-hour battery life at a price of $1,200. You are looking for a way to make your product more desirable by changing as few factors as possible.

1. From the Choice Model red triangle menu, select Probability Profiler.
2. Set the Baseline to 40 GB, 1.5 GHz, 6 hours, and $1,200.
3. In the Probability Profiler, change the settings of Battery Life to 6 hours and Price to $1,200.
Figure 5.28 Laptop Probability Profiler Results with Baseline Effects

You can see in Figure 5.28 that, for females with development jobs, increasing Hard Disk space, increasing Speed, or decreasing the price would increase the probability that a subject chooses the new product over the old.

4. Change the Gender effect in the Baseline to M.
You can see that by decreasing price, males with development jobs are actually less likely to choose the new product over the old. If you change the Job effect in the Baseline to Marketing, you see the same effect. Looking at Figure 5.28, you can see that the steepest slope is for the Hard Disk factor.
5. Change the Gender Effect in the Baseline to F and change the Hard Disk to 80 GB in the Probability Profiler.

Figure 5.29 Laptop Probability Profiler Results with More Space

By increasing the space in the new laptop, a female with a development job chooses the new laptop over the old laptop almost 92% of the time. You can change the subject effects to vary gender and job. The probability results are displayed in Table 5.1. You can see that although males in development jobs are less likely than females in the same job to choose the new laptop, they are still more likely to choose the new laptop over the old. Developing a new product with increased hard disk space is more desirable to your customers than the old product.

Table 5.1 Laptop Probability Profiler Probabilities
Gender    Job            Probability of Selecting New Laptop over Old
Female    Development    0.915746
Female    Marketing      0.938172
Male      Development    0.696043
Male      Marketing      0.761733

Multiple Choice Comparison

If, for example, you wanted to develop a product that had an advantage over two other competitors, you could use the Multiple Choice Profiler.
The Multiple Choice Profiler enables you to determine the probability of a subject choosing one alternative over various others. Company A produces a product with a fast processor speed and high battery life at a reasonable price. Company B makes the biggest hard drives with the fastest speed, but at a high price and low battery life. You currently make a low-end laptop with a small hard drive, slow processor, and low battery life. You want to gain market share by increasing one area of performance and price. You want to market this product to the masses, so you want to remove the subject effects.

1. From the Choice Model red triangle menu, select Model Dialog.
2. Open the Subject Data section. Click Select Data Table. The Subject Data Table window appears, prompting you to select a data table to use.
3. Ensure that no data table is selected and click OK. By default, no data table is selected.
4. Click Run Model.
5. From the Choice Model red triangle menu, select Multiple Choice Profiler. A window appears, asking for the number of alternative choices to profile. The default number 3 is already entered in the window.
6. Click OK.
Three Alternative profilers appear. Each factor in each profiler is set to its default value. Alternative 1 indicates the product that you want to develop. Alternative 2 indicates Company A’s product. Alternative 3 indicates Company B’s product.
7. For Alternative 2, set Hard Disk to 40 GB, Speed to 2.0 GHz, Battery Life to 6 hours, and Price to $1,200.
8. For Alternative 3, set Hard Disk to 80 GB, Speed to 2.0 GHz, Battery Life to 4 hours, and Price to $1,500.

Figure 5.30 Multiple Choice Profiler with Laptop Data

You can see that with your company’s settings at the low, default levels, Company B has the greatest Share, 0.51364. This Share means that the probability of any subject choosing Alternative 3 is 51.4%.
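Share values of this kind can be pictured with a short Python sketch. It assumes that shares follow the standard multinomial logit form, in which each alternative's share is exp(U) divided by the sum of exp(U) over all alternatives; that is the natural extension of the two-alternative Probability Profiler formula, but it is an assumption about how the Multiple Choice Profiler computes Share, and the utilities below are hypothetical.

```python
import math


def choice_shares(utilities):
    """Multinomial logit shares: exp(U_i) / sum_j exp(U_j)."""
    top = max(utilities)  # subtract the maximum for numerical stability
    weights = [math.exp(u - top) for u in utilities]
    total = sum(weights)
    return [w / total for w in weights]


def probability_vs_baseline(u, u_b):
    """Probability Profiler formula: exp(U) / (exp(U) + exp(U_b))."""
    return math.exp(u) / (math.exp(u) + math.exp(u_b))


# Hypothetical utilities for Alternatives 1, 2, and 3 (not taken from the report).
utilities = [-1.0, 0.6, 0.9]
shares = choice_shares(utilities)
for label, share in zip(("Alternative 1", "Alternative 2", "Alternative 3"), shares):
    print(label, round(share, 4))
```

The shares always sum to 1, and with only two alternatives the calculation reduces to the baseline comparison used by the Probability Profiler.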
With your company’s settings as they are, very few people would buy your product.

You want to increase your market share by upgrading your company’s laptop in one of the performance areas and increasing price. The slope of the line in Alternative 1’s Hard Disk profile suggests increasing hard disk space will increase market share the most.

9. For Alternative 1, set Hard Disk to 80 GB and Price to $1,200.

Figure 5.31 Multiple Choice Profiler with Improved Laptop

By increasing hard disk space, you can increase the price of your laptop and expect a market share of about 36%. This share is almost the same as Company B’s high-performance laptop and is much better than the market share with the initial low-end settings seen in Figure 5.30. You can see in Figure 5.31 that you could increase your market share even more by increasing Speed or Battery Life.

If you wanted to determine what laptop configuration would give you the maximum market share, you could maximize desirability using the Choice Model red triangle menu options. In this case, however, it is fairly obvious that a high-performance laptop at a low price would be the most desirable.

One-Table Analysis

The Choice Modeling platform can also be used if all of your data are in one table. For this one-table scenario, you use only the Profile Data section of the Choice Dialog box. Subject-specific terms can be used in the model, but not as main effects. Two advantages, both offering more model-effect flexibility than the three-table specification, are realized by using a one-table analysis:
• Interactions can be selectively chosen instead of automatically getting all possible interactions between subject and profile effects as seen when using three tables.
• Unusual combinations of choice sets are allowed.
This means, for example, that the first trial can have a choice set of two, the second trial can consist of a choice set of three, the third trial can have a choice set of five, and so on. With multiple tables, in contrast, it is assumed that the number of choices for each trial is fixed.

A choice response consists of a set of rows, uniquely identified by the Grouping columns. An indicator column is specified for Profile ID in the Choice Dialog box. This indicator variable uses the value of 1 for the chosen profile row and 0 elsewhere. There must be exactly one “1” for each Grouping combination.

Example of One-Table Analysis

This example illustrates how the pizza data are organized for the one-table situation. Figure 5.32 shows a subset of the combined pizza data.

1. Select Help > Sample Data Library and open Pizza Combined.jmp to see the complete table.
Each subject completes four choice sets, with each choice set or trial consisting of two choices. For this example, each subject has eight rows in the data set. The indicator variable specifies the chosen profile for each choice set. The columns Subject and Trial together identify the choice set, so they are the Grouping columns.

Figure 5.32 Partial Listing of Combined Pizza Data for One-Table Analysis

To analyze the data in this format, open the Profile Data section in the Choice Dialog box, shown in Figure 5.33.
2. Select Analyze > Consumer Research > Choice.
3. Click Select Data Table.
4. Specify Pizza Combined.jmp as the data set. Click OK.
5. In the Profile Data section of the window, specify Indicator as the Profile ID, Subject and Trial as the Grouping variables, and add Crust, Cheese, and Topping as the main effects. See Figure 5.33.

Figure 5.33 Choice Dialog Box for Pizza Data One-Table Analysis

6. Click Run Model.
A new window appears, asking if this is a one-table analysis with all of the data in the Profile Table.
7. Click Yes to fit the model, as shown in Figure 5.34.

Figure 5.34 Choice Model for Pizza Data One-Table Analysis

8. Select Utility Profiler from the drop-down menu to obtain the results shown in Figure 5.35.
Notice that the parameter estimates and the likelihood ratio test results are identical to the results obtained for the Choice Model with only two tables, shown in Figure 5.13.

Figure 5.35 Utility Profiler for Pizza Data One-Table Analysis

Segmentation

Market researchers sometimes want to analyze the preference structure for each subject separately in order to see whether there are groups of subjects that behave differently. However, there are usually not enough data to do this with ordinary estimates. If there are sufficient data, you can specify “By groups” in the Response Data or you could introduce a Subject identifier as a subject-side model term. This approach, however, is costly if the number of subjects is large. Other segmentation techniques discussed in the literature include Bayesian and mixture methods.

You can also use JMP to segment by clustering subjects using response data. For example, after running the model using the Pizza Profiles.jmp, Pizza Responses.jmp, and the optional Pizza Subjects.jmp data sets, select the red triangle menu for the Choice Model platform and select Save Gradients by Subject. A new data table is created containing the average Hessian-scaled gradient on each parameter, and there is one row for each subject.

Note: This feature is regarded as an experimental method, because, in practice, little research has been conducted on its effectiveness.

These gradient values are the subject-aggregated Newton-Raphson steps from the optimization used to produce the estimates.
At the estimates, the total gradient is zero, and $\Delta = H^{-1} g = 0$, where $g$ is the total gradient of the log-likelihood evaluated at the MLE, and $H^{-1}$ is the inverse Hessian function or the inverse of the negative of the second partial derivative of the log-likelihood. But the disaggregation of $\Delta$ results in

\[ \Delta = \sum_{ij} \Delta_{ij} = \sum_{ij} H^{-1} g_{ij} = 0, \]

where $i$ is the subject index, $j$ is the choice response index for each subject, $\Delta_{ij}$ are the partial Newton-Raphson steps for each run, and $g_{ij}$ is the gradient of the log-likelihood by run. The mean gradient step for each subject is then calculated as:

\[ \Delta_i = \sum_j \frac{\Delta_{ij}}{n_i}, \]

where $n_i$ is the number of runs per subject. These $\Delta_i$ are related to the force that subject $i$ is applying to the parameters. If groups of subjects have truly different preference structures, these forces are strong, and they can be used to cluster the subjects. The $\Delta_i$ are the gradient forces that are saved. You can then cluster these values using the Clustering platform.

Example of Segmentation

Set Up the Model

1. Select Help > Sample Data Library and open Pizza Profiles.jmp, Pizza Responses.jmp, and Pizza Subjects.jmp.
2. Select Analyze > Consumer Research > Choice to open the launch window. There are three separate sections for each of the data sources.
3. Click Select Data Table under Profile Data.
You are prompted to select the data table for the profile data.
4. Select Pizza Profiles.jmp.
The columns from this table now populate the field under Select Columns in the Choice Dialog box.
5. Select ID and click Profile ID.
6. Select Crust, Cheese, and Topping and click Add.
7. Open the Response Data section of the window and click Select Data Table.
8. Select Pizza Responses.jmp.
9. Select Choice and click Profile ID Chosen.
10. Select Choice1 and Choice2 and click Profile ID Choices.
11. Select Subject and click Subject ID.
12. Open the Subject Data section of the window and click Select Data Table.
13.
Select Pizza Subjects.jmp.
14. Select Subject and click Subject ID to identify individual subjects.
15. Select Gender and click Add.
16. Click Run Model.
The Choice Model report appears.

Cluster Subjects

1. In the Choice Model red triangle menu, select Save Gradients by Subject.
A new data table is generated, with gradient forces saved for each main effect and subject interaction. A partial data table with these subject gradient forces is shown in Figure 5.36.

Figure 5.36 Gradients by Subject for Pizza Data

2. In the new data table, click the red triangle menu of the HierarchicalCluster script. Click Run Script.
The resulting dendrogram of the clusters is shown in Figure 5.37.

Figure 5.37 Dendrogram of Subject Clusters for Pizza Data

3. Save the cluster IDs by clicking on the red triangle menu of Hierarchical Clustering and selecting Save Clusters.
A new column called Cluster is created in the data table containing the gradients. Each subject has been assigned a Cluster value that is associated with other subjects having similar gradient forces. Refer to the Cluster platform chapter in the Multivariate Methods book for a discussion of other Hierarchical Clustering options. The gradient columns can be deleted because they were used only to obtain the clusters.
4. Select all columns except Subject and Cluster. Right-click on the selected columns and select Delete Columns.
The data table shown in Figure 5.38 contains only Subject and Cluster variables.

Figure 5.38 Merge Results Back into Original Table

5. Click Run Script under the Merge Data Back red triangle menu, as shown in the partial gradient-by-subject table in Figure 5.36.
The cluster information is merged into the Subject data table. The columns in the Subject data table are now Subject, Gender, and Cluster, as shown in Figure 5.39.
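The gradient forces that were clustered in these steps are per-subject averages of the partial Newton-Raphson steps, that is, the sum over runs of each subject's steps divided by that subject's number of runs. The following Python sketch illustrates only this averaging. The step values and the function name `mean_steps_by_subject` are hypothetical, and the sketch does not reproduce how JMP obtains the Hessian-scaled steps.

```python
from collections import defaultdict


def mean_steps_by_subject(run_steps):
    """Average the partial Newton-Raphson steps over the runs of each subject.

    run_steps: list of (subject_id, step_vector) pairs, one pair per run.
    Returns {subject_id: mean step vector}, one value per parameter.
    """
    grouped = defaultdict(list)
    for subject, step in run_steps:
        grouped[subject].append(step)

    means = {}
    for subject, steps in grouped.items():
        n_i = len(steps)          # number of runs for this subject
        n_params = len(steps[0])  # number of model parameters
        means[subject] = [sum(s[p] for s in steps) / n_i for p in range(n_params)]
    return means


# Hypothetical steps for two parameters and two subjects (illustrative only).
# The steps sum to zero over all runs, as they do at the estimates.
run_steps = [
    (1, [0.50, -0.25]),
    (1, [0.25, -0.75]),
    (2, [-0.50, 0.50]),
    (2, [-0.25, 0.50]),
]
gradient_forces = mean_steps_by_subject(run_steps)
print(gradient_forces)
```

In this made-up data the two subjects push the parameters in opposite directions, which is the kind of pattern a clustering step would separate into different groups.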
Figure 5.39 Subject Data with Cluster Column

This table can then be used for further analysis.

Fit Y by X

1. Select Analyze > Fit Y by X.
2. Specify Gender as the Y, Response, and Cluster as X, Factor.
This analysis is depicted in Figure 5.40.

Figure 5.40 Contingency Analysis of Gender by Cluster for Pizza Example

Figure 5.40 shows that Cluster 1 is half male and half female, Cluster 2 is all female, and Cluster 3 is all male. If desired, you could now refit and analyze the model with the addition of the Cluster variable.

Special Data Rules

Default Choice Set

If, in every trial, you can choose any of the response profiles, you can omit the Profile ID Choices selection under Pick Role Variables in the Response Data section of the Choice Dialog Box. The Choice Model platform then assumes that all choice profiles are available on each run.

Subject Data with Response Data

If you have subject data in the Response data table, select this table as the Select Data Table under the Subject Data. In this case, a Subject ID column does not need to be specified. In fact, it is not used. It is generally assumed that the subject data repeats consistently in multiple runs for each subject.

Logistic Regression

Ordinary logistic regression can be done with the Choice Modeling platform.

Note: The Fit Y by X and Fit Model platforms are more convenient to use than the Choice Modeling platform for logistic regression modeling. This section is used only to demonstrate that the Choice Modeling platform can be used for logistic regression, if desired.

If your data are already in the choice-model format, you might want to use the steps given below for logistic regression analysis. However, three steps are needed:
• Create a trivial Profile data table with a row for each response level.
• Put the explanatory variables into the Response data.
• Specify the Response data table, again, for the Subject data table.

For examples of conducting Logistic Regression using the Choice Platform, see “Example of Logistic Regression Using the Choice Platform” and “Example of Logistic Regression for Matched Case-Control Studies”.

Transforming Data

Although data are often in the Response/Profile/Subject form, the data are sometimes specified in another format that must be manipulated into the normalized form needed for choice analysis.

Example of Transforming Data to Two Analysis Tables

Consider the data from Daganzo, found in Daganzo Trip.jmp. This data set contains the travel time for three transportation alternatives and the preferred transportation alternative for each subject.

Add Choice Mode and Subjects

1. Select Help > Sample Data Library and open the Daganzo Trip.jmp data table.
A partial listing of the data set is shown in Figure 5.41.

Figure 5.41 Partial Daganzo Travel Time Table for Three Alternatives

Each Choice number listed must first be converted to one of the travel mode names. This transformation is easily done by using the Choose function in the formula editor, as follows.
2. Select Cols > New Column.
3. Specify the Column Name as Choice Mode and the modeling type as Nominal.
4. Click the Column Properties and select Formula.
5. Click Conditional under the Functions (grouped) command, select Choose, and press the comma key twice to obtain additional arguments for the function.
6. Click Choice for the Choose expression (expr), and double-click each clause entry box to enter “Subway”, “Bus”, and “Car” (with the quotation marks) as shown in Figure 5.42.

Figure 5.42 Choose Function for Choice Mode Column of Daganzo Data

7. Click OK in the Formula Editor window.
8. Click OK in the New Column window.
The new Choice Mode column appears in the data table.
Because each row contains a choice made by each subject, another column containing a sequence of numbers should be created to identify the subjects.
9. Select Cols > New Column.
10. Specify the Column Name as Subject.
11. Click Missing/Empty next to Initialize Data and select Sequence Data.
12. Click OK.
A partial listing of the modified table is shown in Figure 5.43.

Figure 5.43 Daganzo Data with New Choice Mode and Subject Columns

Stack the Data

In order to construct the Profile data, each alternative needs to be expressed in a separate row.
1. Select Tables > Stack.
2. Select the columns Subway, Bus, and Car and click Stack Columns.
3. For the Output table name, type Stacked Daganzo. Type Travel Time for the Stacked Data Column and Mode for the Source Label Column.
The resulting Stack window is shown in Figure 5.44.

Figure 5.44 Stack Operation for Daganzo Data

4. Click OK.
A partial view of the resulting table is shown in Figure 5.45.

Figure 5.45 Partial Stacked Daganzo Table

Make the Profile Data Table

For the Profile Data Table, you need the Subject, Mode, and Travel Time columns.
1. Select the Subject, Mode, and Travel Time columns and select Tables > Subset.
2. Select Selected Columns and click OK.
A partial data table is shown in Figure 5.46. Note the default table name is Subset of Stacked Daganzo.

Figure 5.46 Partial Subset Table of Stacked Daganzo Data

Make the Response Data Table

For the Response Data Table, you need the Subject and Choice Mode columns, but you also need a column for each possible choice.
3. With the original data table open, select the Subject and Choice Mode columns.
4. Select Tables > Subset.
5. Select Selected Columns and click OK.
Note the default table name is Subset of Daganzo Trip.
6. Select Cols > Add Multiple Columns.
7. For the Column prefix, type Choice. Type 3 next to How many columns to add.
Click Numeric and select Character.
8. Click OK.
The columns Choice 1, Choice 2, and Choice 3 have been added.
9. Type “Bus” (without quotation marks) in the first row of Choice 1. Right-click the cell and select Fill > Fill to end of table.
10. Type “Subway” (without quotation marks) in the first row of Choice 2. Right-click the cell and select Fill > Fill to end of table.
11. Type “Car” (without quotation marks) in the first row of Choice 3. Right-click the cell and select Fill > Fill to end of table.
The resulting table is shown in Figure 5.47.

Figure 5.47 Partial Subset Table of Daganzo Data with Choice Set

Fit the Model

Now that you have separated the original Daganzo Trip.jmp table into two separate tables, you can run the Choice Platform.
1. Select Analyze > Consumer Research > Choice.
2. Specify the model, as shown in Figure 5.48.

Figure 5.48 Choice Dialog Box for Subset of Daganzo Data

3. Click Run Model.
The resulting parameter estimate now expresses the utility coefficient for Travel Time and is shown in Figure 5.49.

Figure 5.49 Parameter Estimate for Travel Time of Daganzo Data

The negative coefficient implies that increased travel time has a negative effect on consumer utility or satisfaction. The likelihood ratio test result indicates that the Choice model with the effect of Travel Time is significant.

Example of Transforming Data to One Analysis Table

Rather than creating two or three tables, it can be more practical to transform the data so that only one table is used. For the one-table format, the subject effect is added as in the previous example. A response indicator column is added instead of using three different columns for the choice sets (Choice 1, Choice 2, Choice 3). The transformation for the one-table scenario includes the following steps.
1.
Create or open Stacked Daganzo.jmp from the “Stack the Data” steps shown in “Example of Transforming Data to Two Analysis Tables”.
2. Select Cols > New Column.
3. Type Response as the Column Name.
4. Click Column Properties and select Formula.
5. Select Conditional under the Functions (grouped) command and select If from the formula editor.
6. Select the column Choice Mode for the expression (expr).
7. Enter “=” and select Mode.
8. Type 1 for the Then Clause and 0 for the Else Clause.
9. Click OK in the Formula Editor window. Click OK in the New Column window.
The completed formula should look like Figure 5.50.

Figure 5.50 Formula for Response Indicator for Stacked Daganzo Data

10. Select the Subject, Travel Time, and Response columns and then select Tables > Subset.
11. Select Selected Columns and click OK.
A partial listing of the new data table is shown in Figure 5.51.

Figure 5.51 Partial Table of Stacked Daganzo Data Subset

12. Select Analyze > Consumer Research > Choice to open the launch window and specify the model as shown in Figure 5.52.

Figure 5.52 Choice Dialog Box for Subset of Stacked Daganzo Data for One-Table Analysis

13. Click Run Model.
A pop-up dialog window asks whether this is a one-table analysis with all the data in the Profile Table.
14. Select Yes to obtain the parameter estimate expressing the utility Travel Time coefficient, shown in Figure 5.53.

Figure 5.53 Parameter Estimate for Travel Time of Daganzo Data from One-Table Analysis

Notice that the result is identical to that obtained for the two-table model, shown earlier in Figure 5.49.

This chapter illustrates the use of the Choice Modeling platform with simple examples.
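The one-table transformation above (Choose, Stack, then the If formula for Response) can also be sketched outside JMP. The following Python sketch assumes a wide layout with Subway, Bus, and Car travel-time columns and a numeric Choice column coded 1 to 3. The travel times are made up and are not values from Daganzo Trip.jmp, and the function names `to_one_table` and `ones_per_subject` are hypothetical.

```python
MODES = ["Subway", "Bus", "Car"]


def to_one_table(wide_rows):
    """Stack wide rows into one row per (subject, mode) with a 0/1 Response."""
    stacked = []
    for subject, row in enumerate(wide_rows, start=1):  # Subject = sequence number
        # Same mapping as the Choose formula: 1 -> Subway, 2 -> Bus, 3 -> Car.
        choice_mode = MODES[row["Choice"] - 1]
        for mode in MODES:
            stacked.append({
                "Subject": subject,
                "Mode": mode,
                "Travel Time": row[mode],
                # Same logic as the If formula: 1 when Choice Mode equals Mode.
                "Response": 1 if mode == choice_mode else 0,
            })
    return stacked


def ones_per_subject(rows):
    """Count Response == 1 rows per subject; each count should be exactly 1."""
    counts = {}
    for row in rows:
        counts[row["Subject"]] = counts.get(row["Subject"], 0) + row["Response"]
    return counts


# Made-up travel times (illustrative only).
wide = [
    {"Subway": 16.5, "Bus": 16.2, "Car": 23.9, "Choice": 2},
    {"Subway": 25.4, "Bus": 29.9, "Car": 15.1, "Choice": 3},
]
long_rows = to_one_table(wide)
for row in long_rows:
    print(row)
print(ones_per_subject(long_rows))
```

The final check mirrors the one-table requirement that there be exactly one “1” for each Grouping combination, here one per Subject.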
This platform can also be used for more complex models, such as those involving more complicated transformations and interaction terms.

Additional Examples of Logistic Regression Using the Choice Platform

Example of Logistic Regression Using the Choice Platform

Use the Choice Platform

1. Select Help > Sample Data Library and open Lung Cancer Responses.jmp.
Notice this data table has only one column (Lung Cancer) with two rows (Cancer and NoCancer).
2. Select Analyze > Consumer Research > Choice > Select Data Table > Lung Cancer Responses.jmp > OK.
3. Select Lung Cancer as the Profile ID and Add Lung Cancer as the model effect.
The Profile Data window is shown in Figure 5.54.

Figure 5.54 Profile Data for Lung Cancer Example

4. Click the disclosure icon for Response Data > Select Data Table > Other > OK.
5. Open the sample data set Lung Cancer Choice.jmp.
6. Select Lung Cancer for Profile ID Chosen, Choice1 and Choice2 for Profile ID Choices, and Count for Freq.
The Response Data launch window is shown in Figure 5.55.

Figure 5.55 Response Data for Lung Cancer Example

7. Click the disclosure icon for Subject Data > Select Data Table > Lung Cancer Choice.jmp > OK.
8. Add Smoker as the model effect.
The Subject Data launch window is shown in Figure 5.56.

Figure 5.56 Subject Data for Lung Cancer Example

9. Uncheck Firth Bias-adjusted Estimates and Run Model.
Choice Modeling results are shown in Figure 5.57.

Figure 5.57 Choice Modeling Logistic Regression Results for the Cancer Data

Use the Fit Model Platform

1. Select Help > Sample Data Library and open Lung Cancer.jmp.
2. Select Analyze > Fit Model.
Automatic specification of the columns is: Lung Cancer for Y, Count for Freq, and Smoker for Add under Construct Model Effects.
The Nominal Logistic personality is automatically selected.
3. Click Run.
The nominal logistic fit for the data is shown in Figure 5.58.

Figure 5.58 Fit Model Nominal Logistic Regression Results for the Cancer Data

Notice that the likelihood ratio chi-square test for Smoker*Lung Cancer in the Choice model matches the likelihood ratio chi-square test for Smoker in the Logistic model. The reports shown in Figure 5.57 and Figure 5.58 support the conclusion that smoking has a strong effect on developing lung cancer. See the Logistic Regression chapter in the Fitting Linear Models book for more details.

Example of Logistic Regression for Matched Case-Control Studies

This section provides an example using the Choice platform to perform logistic regression on the results of a study of endometrial cancer with 63 matched pairs. The data are from the Los Angeles Study of the Endometrial Cancer Data in Breslow and Day (1980) and the SAS/STAT® 9.2 User's Guide, Second Edition (2006). The goal of the case-control analysis was to determine the relative risk for gallbladder disease, controlling for the effect of hypertension. The Outcome of 1 indicates the presence of endometrial cancer, and 0 indicates the control. Gallbladder and Hypertension data indicators are also 0 or 1.

1. Select Help > Sample Data Library and open Endometrial Cancer.jmp.
2. Select Analyze > Consumer Research > Choice.
3. Click the Select Data Table button.
4. Select Endometrial Cancer as the profile data table. Click OK.
5. Assign Outcome to the Profile ID role.
6. Assign Pair to the Grouping role.
7. Add the following columns as model effects: Gallbladder, Hypertension.
8. Deselect the Firth Bias-Adjusted Estimates check box.
9. Select Run Model.
A pop-up dialog window asks whether this is a one-table analysis with all the data in the Profile Table.
10. Click Yes.
11. On the Choice Model red triangle menu, select Utility Profiler.
The report is shown in Figure 5.59.

Figure 5.59 Logistic Regression on Endometrial Cancer Data

Likelihood Ratio tests are given for each factor. Note that Gallbladder is nearly significant at the 0.05 level (p-value = 0.0532). Use the Utility Profiler to visualize the impact of the factors on the response.

Statistical Details

Parameter estimates from the choice model identify consumer utility, or marginal utilities in the case of a linear utility function. Utility is the level of satisfaction consumers receive from products with specific attributes and is determined from the parameter estimates in the model.

The choice statistical model is expressed as follows:
Let $X[k]$ represent a subject attribute design row, with intercept.
Let $Z[j]$ represent a choice attribute design row, without intercept.
Then, the probability of a given choice for the k'th subject to the j'th choice of m choices is:

\[ P_i[jk] = \frac{\exp\left(\beta'\,(X[k] \otimes Z[j])\right)}{\sum_{l=1}^{m} \exp\left(\beta'\,(X[k] \otimes Z[l])\right)} \]

where:
‒ $\otimes$ is the Kronecker rowwise product
‒ the numerator calculates for the j'th alternative actually chosen
‒ the denominator sums over the m choices presented to the subject for that trial

Chapter 6 Uplift Models

Model the Incremental Impact of Actions on Consumer Behavior

Use uplift modeling to optimize marketing decisions, to define personalized medicine protocols, or, more generally, to identify characteristics of individuals who are likely to respond to an intervention. Also known as incremental modeling, true lift modeling, or net modeling, uplift modeling differs from traditional modeling techniques in that it finds the interactions between a treatment and other variables.
It directs focus to individuals who are likely to react positively to an action or treatment.

Figure 6.1 Example of Uplift for a Hair Product Marketing Campaign

Contents
Uplift Platform Overview
Example of the Uplift Platform
Launch the Uplift Platform
The Uplift Model Report
    Uplift Model Graph
    Uplift Report Options

Uplift Platform Overview

Use the Uplift platform to model the incremental impact of an action, or treatment, on individuals. An uplift model helps identify groups of individuals who are most likely to respond to the action. Identification of these groups leads to efficient and targeted decisions that optimize resource allocation and impact on the individual. (See Radcliffe and Surry, 2011.)

The Uplift platform fits partition models. While traditional partition models find splits to optimize a prediction, uplift models find splits to maximize a treatment difference.

The uplift partition model accounts for the fact that some individuals receive the treatment, while others do not. It does this by fitting a linear model to each possible (binary) split. A continuous response is modeled as a linear function of the split, the treatment, and the interaction of the split and treatment.
A categorical response is expressed as a logistic function of the split, the treatment, and the interaction of the split and treatment. In both cases, the interaction term measures the difference in uplift between the groups of individuals in the two splits.

The criterion used by the Uplift platform in defining splits is the significance of the test for interaction over all possible splits. However, predictor selection based solely on p-values introduces bias favoring predictors with many levels. For this reason, JMP adjusts p-values to account for the number of levels. (See the paper “Monte Carlo Calibration of Distributions of Partition Statistics” on the JMP website.) The splits in the Uplift platform are determined by minimizing the adjusted p-values (equivalently, maximizing the logworth) for t tests of the interaction effects. The logworth for each adjusted p-value, namely -log10(adj p-value), is reported.

Example of the Uplift Platform

The Hair Care Product.jmp sample data table results from a marketing campaign designed to increase purchases of a hair coloring product targeting both genders. For purposes of designing the study and tracking purchases, 126,184 “club card” members of a major beauty supply chain were identified. Approximately half of these members were randomly selected and sent a promotional offer for the product. Purchases of the product over a subsequent three-month period by all club card members were tracked.

The data table shows a Promotion column, indicating whether the member received promotional material. The column Purchase indicates whether the member purchased the product over the test period. For each member, the following information was assembled: Gender, Age, Hair Color (natural), U.S. Region, and Residence (whether the member is located in an urban area). Also shown is a Validation column consisting of about 33% of the subjects.
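In terms of these columns, uplift within a group of members is the purchase rate among those who received the promotion minus the purchase rate among those who did not, and a candidate split is interesting when that difference changes between its two sides. The following Python sketch illustrates the idea on the probability scale with made-up records, not rows from Hair Care Product.jmp. The function names are hypothetical, and the sketch does not compute the logistic model, adjusted p-values, or logworth that the platform uses.

```python
def purchase_rate(outcomes):
    """Proportion of purchases in a list of 0/1 outcomes."""
    return sum(outcomes) / len(outcomes)


def uplift_by_side(records):
    """Compare uplift on the two sides of one candidate binary split.

    records: (in_left_node, promoted, purchased) tuples, purchased coded 0/1.
    Returns (uplift_left, uplift_right, interaction), where each uplift is the
    promoted-minus-not-promoted purchase rate on that side of the split.
    """
    cells = {(side, promo): [] for side in (True, False) for promo in (True, False)}
    for in_left, promoted, purchased in records:
        cells[(in_left, promoted)].append(purchased)

    uplift_left = purchase_rate(cells[(True, True)]) - purchase_rate(cells[(True, False)])
    uplift_right = purchase_rate(cells[(False, True)]) - purchase_rate(cells[(False, False)])
    # Difference in uplift between the two sides: the quantity that the
    # split-by-treatment interaction term measures in a 0/1-coded linear model.
    return uplift_left, uplift_right, uplift_left - uplift_right


# Made-up records (illustrative only).
records = (
    [(True, True, y) for y in (1, 1, 0, 1)]      # left node, promoted
    + [(True, False, y) for y in (0, 1, 0, 0)]   # left node, not promoted
    + [(False, True, y) for y in (0, 0, 1, 0)]   # right node, promoted
    + [(False, False, y) for y in (1, 0, 1, 0)]  # right node, not promoted
)
left, right, interaction = uplift_by_side(records)
print(left, right, interaction)
```

In this made-up data the promotion helps on the left side of the split and hurts on the right, so the difference in uplift is large, which is the pattern an uplift split is meant to find.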
For a categorical response, the Uplift platform interprets the first level in its value ordering as the response of interest. This is why the column Purchase has the Value Ordering column property. This property ensures that “Yes” responses are first in the ordering.

1. Select Help > Sample Data Library and open Hair Care Product.jmp.
2. Select Analyze > Consumer Research > Uplift.
3. From the Select Columns list:
‒ Select Promotion and click Treatment.
‒ Select Purchase and click Y, Response.
‒ Select Gender, Age, Hair Color, U.S. Region, and Residence, and click X, Factor.
‒ Select Validation and click Validation.
4. Click OK.
5. Below the Graph in the report that appears, click Go.
Based on the validation set, the optimal Number of Splits is determined to be three. The Graph is shown in Figure 6.2. Note that the vertical scale has been modified in order to show the detail.

Figure 6.2 Graph after Three Splits

The graph indicates that uplift in purchases occurs for females with black, red, or brown hair and for younger females (Age < 42) with blond hair. For older blond-haired women (Age ≥ 42) and males, the promotion has a negative effect.

Launch the Uplift Platform

To launch the Uplift platform, select Analyze > Consumer Research > Uplift. Figure 6.3 shows a launch window for the Hair Care Product.jmp sample data table. The columns that you enter for Y, Response, and X, Factor can be continuous or categorical.

In typical usage, the Treatment column is categorical, and often has only two levels. If your Treatment column contains more than two levels, the first level is treated as Treatment1 and the remaining levels are combined in Treatment2.

Figure 6.3 Launch Window for Uplift

You can specify your own Validation column, or designate a random portion of your data to be selected as a Validation Portion.
If you click the Validation button with no columns selected in the Select Columns list, you can add a validation column to your data table. For more information about the Make Validation Column utility, see Basic Analysis.

Note that the only Method currently supported by Uplift is Decision Tree.

The Uplift Model Report

The report opens by showing the Graph and the initial node of the Tree, as well as controls for splitting.

Uplift Model Graph

The graph represents the response on the vertical axis. The horizontal axis corresponds to observations, arranged by nodes. For each node, a black horizontal line shows the mean response. Within each split, there is a subsplit for treatment shown by a red or blue line. These lines indicate the mean responses for each of the two treatment groups within the split. The value ordering of the treatment column determines the placement order of these lines.

As nodes are split, the graph updates to show the splits beneath the horizontal axis. Vertical lines divide the splits.

Beneath the graph are the control buttons: Split, Prune, and Go. The Go button only appears if there is a validation set. Also shown is the name of the Treatment column and its two levels, called Treatment1 and Treatment2. If more than two levels are specified for the Treatment column, all but the first level are treated as a single level and combined into Treatment2.

To the right of the Treatment column information is a report showing summary values relating to prediction. (Keep in mind that prediction is not the objective in uplift modeling.) The report updates as splitting occurs. If a validation set is used, values are shown for both the training and the validation sets.

RSquare  The RSquare for the regression model associated with the tree. Note that the regression model includes interactions with the treatment column.
N  The number of observations.
Number of Splits The number of times splitting has occurred.
AICc The Corrected Akaike Information Criterion (AICc), computed using the associated regression model. AICc is only given for continuous responses. For more details, see Fitting Linear Models.

Uplift Decision Tree

The decision tree shows the splits used to model uplift. See Figure 6.4 for an example using the Hair Care Product.jmp sample data table. Each node contains the following information:

Treatment The name of the treatment column is shown, with its two levels.
Rate Only appears for two-level categorical responses. For each treatment level, the proportion of subjects in this node who responded.
Mean Only appears for continuous responses. For each treatment level, the mean response for subjects in this node.
Count The number of subjects in this node in the specified treatment level.
t Ratio The t ratio for the test for a difference in response across the levels of Treatment for subjects in this node. If the response is categorical, it is treated as continuous (values 0 and 1) for this test.
Trt Diff The difference in response means across the levels of Treatment. This is the uplift, assuming that:
‒ The first level in the treatment column’s value ordering represents the treatment.
‒ The response is defined so that larger values reflect greater impact.
LogWorth The value of the logworth for the subsequent split based on the given node.

Figure 6.4 Nodes for First Split

Candidates Report

Each node also contains a Candidates report. This report gives:

Term The model term.
LogWorth The maximum logworth over all possible splits for the given term. The logworth corresponding to a split is -log10 of the adjusted p-value.
F Ratio When the response is continuous, this is the F Ratio associated with the interaction term in a linear regression model.
The regression model specifies the response as a linear function of the treatment, the binary split, and their interaction. When the response is categorical, this is the ChiSquare value for the interaction term in a nominal logistic model.
Gamma When the response is continuous, this is the coefficient of the interaction term in the linear regression model used in computing the F ratio. When the response is categorical, this is an estimate of the interaction constructed from Firth-adjusted log-odds ratios.
Cut Point If the term is continuous, this is the point that defines the split. If the term is categorical, this describes the first (left) node.

Uplift Report Options

With the exception of the options described below, all of the red triangle options for the Uplift report are described in the documentation for the Partition platform. For details about these options, see the Partition Models chapter in the Specialized Models book.

Minimum Size Split This option presents a window where you enter a number or a fractional portion of the total sample size to define the minimum size split allowed. To specify a number, enter a value greater than or equal to 1. To specify a fraction of the sample size, enter a value less than 1. The default value for the Uplift platform is set to 25 or the floor of the number of rows divided by 2,000, whichever value is greater.
Column Uplift Contributions This table and plot address a column’s contribution to the uplift tree structure. A column’s contribution is computed as the sum of the F Ratio values associated with its splits. Recall that these values measure the significance of the treatment-by-split interaction term in the linear regression model.
Uplift Graph Consider the observations in the training set.
Define uplift for an observation as the difference between the predicted probabilities or means across the levels of Treatment for the observation’s terminal node. These uplift values are sorted in descending order. On its vertical axis, the Uplift Graph shows the uplift values. On its horizontal axis, the graph shows the proportion of observations with each uplift value. See Figure 6.5 for an example of an Uplift Graph for the Hair Care Product.jmp sample data table after three splits. Note that, for two groups of subjects (males and blond women in the Age ≥ 42 group), the promotion has a negative effect. The horizontal lines shown on the Uplift Graph delineate the graph for the validation set. Specifically, the decision tree is evaluated for the validation set and the Uplift Graph is constructed from the estimated uplifts.

Figure 6.5 Uplift Graph

Save Columns

Save Difference Saves the estimated difference in mean responses across levels of Treatment for the observation’s node. This is the estimated uplift.
Save Difference Formula Saves the formula for the Difference, or uplift.

Chapter 7
Item Analysis
Analyze Test Results by Item and Subject

Item Response Theory (IRT) is a method of scoring tests. Although classical test theory methods have been widely used for a century, IRT provides a better and more scientifically based scoring procedure. Its advantages include:
• Scoring tests at the item level, giving insight into the contributions of each item on the total test score.
• Producing scores of both the test takers and the test items on the same scale.
• Fitting nonlinear logistic curves, more representative of actual test performance than classical linear statistics.

Figure 7.1 Item Analysis Example

Contents
Item Analysis Platform Overview
Launch the Item Analysis Platform
The Item Analysis Report
    Characteristic Curves
    Information Curves
    Dual Plots
Item Analysis Platform Options
Technical Details

Item Analysis Platform Overview

Psychological measurement is the process of assigning quantitative values as representations of characteristics of individuals or objects, so-called psychological constructs. Measurement theories consist of the rules by which those quantitative values are assigned. Item Response Theory (IRT) is a measurement theory.

IRT uses a mathematical function to relate an individual’s probability of correctly responding to an item to a trait of that individual. Frequently, this trait is not directly measurable and is therefore called a latent trait.

To see how IRT relates traits to probabilities, first examine a test question that follows the Guttman “perfect scale” as shown in Figure 7.2. The horizontal axis represents the amount of the theoretical trait that the examinee has. The vertical axis represents the probability that the examinee will get the item correct. (A missing value for a test question is treated as an incorrect response.)
The curve in Figure 7.2 is called an item characteristic curve (ICC).

Figure 7.2 Item Characteristic Curve of a Perfect Scale Item (vertical axis: P(Correct Response); horizontal axis: Trait Level, with the step at b)

This figure shows that a person whose trait level is less than the value b has a 0% chance of getting the item correct. A person with trait level higher than b has a 100% chance of getting the item correct.

Of course, this is an unrealistic item, but it is illustrative in showing how a trait and a question probability relate to each other. More typical is a curve that allows probabilities that vary from zero to one. A typical curve found empirically is the S-shaped logistic function with a lower asymptote at zero and upper asymptote at one. It is markedly nonlinear. An example curve is shown in Figure 7.3.

Figure 7.3 Example Item Response Curve

The logistic model is the best choice to model this curve, because it has desirable asymptotic properties, yet is easier to deal with computationally than other proposed models (such as the cumulative normal distribution function). The model itself is

\[ P(\theta) = c + \frac{1 - c}{1 + e^{-a(\theta - b)}} \]

In this model, referred to as a Three-Parameter Logistic (3PL) model, the variable a represents the steepness of the curve at its inflection point. Curves with varying values of a are shown in Figure 7.4. This parameter can be interpreted as a measure of the discrimination of an item; that is, how much more difficult the item is for people with low levels of the trait than for those with high levels of the trait. Very large values of a make the model practically the step function shown in Figure 7.2. It is generally assumed that an examinee will have a higher probability of getting an item correct as their level of the trait increases. Therefore, a is assumed to be positive and the ICC is monotonically increasing.
Some use this positive-increasing property of the curve as a test of the appropriateness of the item. Items whose curves do not have this shape should be considered as candidates to be dropped from the test.

Figure 7.4 Logistic Model for Several Values of a

Changing the value of b merely shifts the curve from left to right, as shown in Figure 7.5. It corresponds to the value of θ at the inflection point of the curve, where P(θ) = (1 + c)/2; when c = 0, this is the point where P(θ) = 0.5. The parameter b can therefore be interpreted as item difficulty: graphically, the more difficult items have their inflection points farther to the right along the horizontal axis.

Figure 7.5 Logistic Curve for Several Values of b

Notice that

\[ \lim_{\theta \to -\infty} P(\theta) = c \]

and therefore c represents the lower asymptote, which can be nonzero. ICCs for several values of c are shown graphically in Figure 7.6. The c parameter is theoretically pleasing, because a person with none of the trait might still have a nonzero chance of getting an item right. Therefore, c is sometimes called the pseudo-guessing parameter.

Figure 7.6 Logistic Model for Several Values of c

By varying these three parameters, a wide variety of probability curves are available for modeling. A sample of three different ICCs is shown in Figure 7.7. Note that the lower asymptote varies, but the upper asymptote does not. This is because of the assumption that there might be a nonzero guessing probability at low trait levels, but as the trait level increases, the probability of correctly answering the item always approaches 100%.

Figure 7.7 Three Item Characteristic Curves

Note, however, that the 3PL model might be unnecessarily complex for many situations. If, for example, the c parameter is restricted to be zero (in practice, a reasonable restriction), there are fewer parameters to estimate.
This model, where only the a and b parameters are estimated, is called the 2PL model.

Another advantage of the 2PL model (aside from its greater stability than the 3PL) is that b can be interpreted as the point where an examinee has a 50% chance of getting an item correct. This interpretation is not true for 3PL models.

A further restriction can be imposed on the general model when a researcher can assume that test items have equal discriminating power. In these cases, the parameter a is set equal to 1, leaving a single parameter to be estimated, the b parameter. This 1PL model is frequently called the Rasch model, named after Danish mathematician Georg Rasch, the developer of the model. The Rasch model is quite elegant, and is the least expensive to use computationally.

Caution: You must have a lot of data to produce stable parameter estimates using a 3PL model. 2PL models are frequently sufficient even for tests that intuitively deserve a guessing parameter. Therefore, the 2PL model is the default and recommended model.

Launch the Item Analysis Platform

For example, open the sample data file MathScienceTest.jmp. These data are a subset of the data from the Third International Mathematics and Science Study (TIMSS) conducted in 1996.

To launch the Item Analysis platform, select Analyze > Consumer Research > Item Analysis. This shows the dialog in Figure 7.8.

Figure 7.8 Item Analysis Launch Window

Y, Test Items Are the questions from the test instrument.
Freq Specifies a variable that gives the number of times each response pattern appears.
By Performs a separate analysis for each level of the specified variable.

Specify the desired model (1PL, 2PL, or 3PL) by selecting it from the Model drop-down menu.

For this example, specify all fourteen continuous questions (Q1, Q2,..., Q14) as Y, Test Items and click OK. This accepts the default 2PL model.
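The relationship between the 3PL, 2PL, and 1PL models described in the overview can be sketched in a few lines of Python. This is only an illustration of the logistic formula, not JMP code; the function name icc_3pl and the parameter values are assumptions chosen for the example.

```python
import math


def icc_3pl(theta, a=1.0, b=0.0, c=0.0):
    """Probability of a correct response under the 3PL logistic model.

    a: discrimination, b: difficulty, c: lower asymptote (pseudo-guessing).
    Setting c=0 gives the 2PL model; setting c=0 and a=1 gives the 1PL model.
    (Hypothetical helper for illustration; not part of JMP.)
    """
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))


# 2PL (c = 0): at theta == b the probability is 0.5.
p_2pl_at_b = icc_3pl(0.7, a=1.5, b=0.7)

# 3PL (c > 0): at theta == b the probability is (1 + c) / 2, not 0.5.
p_3pl_at_b = icc_3pl(0.7, a=1.5, b=0.7, c=0.25)

# Far below b, the 3PL curve approaches its lower asymptote c.
p_3pl_low = icc_3pl(-50.0, a=1.5, b=0.7, c=0.25)

# 1PL (Rasch): a fixed at 1, so only the difficulty b varies between items.
p_1pl = icc_3pl(0.0, b=1.0)
```

Evaluating the curve at theta = b shows why the 50% interpretation of b holds for the 2PL model but not for the 3PL model.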
Special Note on 3PL Models

If you select the 3PL model, a dialog appears asking for a penalty for the c parameters (thresholds). This is not asking for the threshold itself. The penalty that it requests is similar to the type of penalty parameter that you would see in ridge regression, or in neural networks.

The penalty is on the sample variance of the estimated thresholds, so that large values of the penalty force the estimated thresholds’ values to be closer together. This has the effect of speeding up the computations, and reducing the variability of the threshold (at the expense of some bias).

When the items are questions on a multiple choice test with the same number of possible responses for each question, there is often reason to believe (a priori) that the threshold parameters would be similar across items. For example, if you are analyzing the results of a 20-question multiple choice test where each question had four possible responses, it is reasonable to believe that the guessing, or threshold, parameters would all be near 0.25. So, in some cases, applying a penalty like this has some “physical intuition” to support it, in addition to its computational advantages.

The Item Analysis Report

The following plots appear in Item Analysis reports.

Characteristic Curves

Item characteristic curves for each question appear in the top section of the output. Initially, all curves are shown stacked in a single column. They can be rearranged using the Number of Plots Across command, found in the drop-down menu of the report title bar. For Figure 7.9, four plots across are displayed.

Figure 7.9 Component Curves

A vertical red line is drawn at the inflection point of each curve. In addition, dots are drawn at the actual proportion correct for each ability level, providing a graphical method of judging goodness-of-fit.
Gray information curves show the amount of information each question contributes to the overall information of the test. The information curve is the slope of the ICC, which is maximized at the inflection point.

Figure 7.10 Elements of the ICC Display (labeled elements: ICC, information curve, ICC inflection point)

Information Curves

Questions provide varying levels of information for different ability levels. The gray information curves for each item show the amount of information that each question contributes to the total information of the test. The total information of the test for the entire range of abilities is shown in the Information Plot section of the report (Figure 7.11).

Figure 7.11 Information Plot

Dual Plots

The information gained from item difficulty parameters in IRT models can be used to construct an increasing scale of questions, from easiest to hardest, on the same scale as the examinees. This structure gives information about which items are associated with low levels of the trait, and which are associated with high levels of the trait.

JMP shows this correspondence with a dual plot. The dual plot for this example is shown in Figure 7.12.

Figure 7.12 Dual Plot

Questions are plotted to the left of the vertical dotted line, examinees on the right. In addition, a histogram of ability levels is appended to the right side of the plot.

This example shows a wide range of abilities. Q10 is rated as difficult, with an examinee needing to be around half a standard deviation above the mean in order to have a 50% chance of correctly answering the question. Other questions are distributed at lower ability levels, with Q11 and Q4 appearing as easier. There are some questions that are off the displayed scale (Q7 and Q14).

The parameter estimates appear below the Dual Plot, as shown in Figure 7.13.
Figure 7.13 Parameter Estimates

Item Identifies the test item.
Difficulty Is the b parameter from the model. A histogram of the difficulty parameters is shown beside the difficulty estimates.
Discrimination Is the a parameter from the model, shown only for 2PL and 3PL models. A histogram of the discrimination parameters is shown beside the discrimination estimates.
Threshold Is the c parameter from the model, shown only for 3PL models.

Item Analysis Platform Options

The following three commands are available from the drop-down menu on the title bar of the report.

Number of Plots Across Brings up a dialog to specify how many plots should be grouped together on a single line. Initially, plots are stacked one-across. Figure 7.9 on page 152 shows four plots across.
Save Ability Formula Creates a new column in the data table containing a formula for calculating ability levels. Because the ability levels are stored as a formula, you can add rows to the data table and have them scored using the stored ability estimates. In addition, you can run several models and store several estimates of ability in the same data table.
The ability is computed using the IRT Ability function. The function has the following form:

IRT Ability (Q1, Q2,...,Qn, [a1, a2,..., an, b1, b2,..., bn, c1, c2, ..., cn]);

where Q1, Q2,...,Qn are columns from the data table containing items, a1, a2,..., an are the corresponding discrimination parameters, b1, b2,..., bn are the corresponding difficulty parameters for the items, and c1, c2, ..., cn are the corresponding threshold parameters. Note that the parameters are entered as a matrix, enclosed in square brackets.
Script Contains options that are available to all platforms. See Using JMP.
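To make concrete what scoring a response pattern against fixed item parameters involves, the following Python sketch estimates an ability level by maximizing the likelihood over a grid of trait values. This is an assumption-laden illustration only: this guide does not state which estimator the IRT Ability function uses, and JMP might compute ability differently. The names icc, log_likelihood, and estimate_ability, the grid bounds, and the item parameters are all hypothetical.

```python
import math


def icc(theta, a, b, c=0.0):
    """3PL item characteristic curve (c=0 gives 2PL; a=1 and c=0 gives 1PL)."""
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))


def log_likelihood(theta, responses, a, b, c):
    """Log-likelihood of a 0/1 response pattern at trait level theta."""
    total = 0.0
    for u, ai, bi, ci in zip(responses, a, b, c):
        p = icc(theta, ai, bi, ci)
        total += math.log(p) if u == 1 else math.log(1.0 - p)
    return total


def estimate_ability(responses, a, b, c, lo=-4.0, hi=4.0, steps=801):
    """Grid-search maximum likelihood estimate of ability (illustrative only).

    Missing responses (None) are scored as incorrect, as described in the
    overview. The search is bounded because all-correct and all-incorrect
    patterns have no finite maximum.
    """
    scored = [1 if u == 1 else 0 for u in responses]
    grid = [lo + (hi - lo) * i / (steps - 1) for i in range(steps)]
    return max(grid, key=lambda t: log_likelihood(t, scored, a, b, c))


# Hypothetical parameters for three 1PL items of increasing difficulty.
a_params = [1.0, 1.0, 1.0]
b_params = [-1.0, 0.0, 1.0]
c_params = [0.0, 0.0, 0.0]

theta_one = estimate_ability([1, 0, 0], a_params, b_params, c_params)
theta_two = estimate_ability([1, 1, 0], a_params, b_params, c_params)
theta_all = estimate_ability([1, 1, 1], a_params, b_params, c_params)
theta_missing = estimate_ability([1, None, 0], a_params, b_params, c_params)
```

In this sketch, answering more items correctly yields a higher estimate, and the all-correct pattern is pinned to the upper bound of the grid, which is one reason practical implementations often add a prior or other constraint.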
Technical Details

Note that P(θ) does not necessarily represent the probability of a positive response from a particular individual. It is certainly feasible that an examinee might definitely select an incorrect answer, or that an examinee might know an answer for sure, based on the prior experiences and knowledge of the examinee, apart from the trait level. It is more correct to think of P(θ) as the probability of response for a set of individuals with ability level θ. Said another way, if a large group of individuals with equal trait levels answered the item, P(θ) predicts the proportion that would answer the item correctly. This implies that IRT models are item-invariant; theoretically, they would have the same parameters regardless of the group tested.

An assumption of these IRT models is that the underlying trait is unidimensional. That is to say, there is a single underlying trait that the questions measure that can be theoretically measured on a continuum. This continuum is the horizontal axis in the plots of the curves. If there are several traits being measured, each of which has complex interactions with the others, then these unidimensional models are not appropriate.

Chapter 8
Multiple Correspondence Analysis

Multiple Correspondence Analysis (MCA) takes multiple categorical variables and seeks to identify associations between levels of those variables. MCA extends correspondence analysis from two variables to many. It can be thought of as analogous to principal component analysis for quantitative variables. Similar to other multivariate methods, it is a dimension-reducing method; it represents the data as points in 2- or 3-dimensional space.

Multiple correspondence analysis is frequently used in the social sciences, particularly in France and Japan. It can be used in survey analysis to identify question agreement. It is also used in consumer research to identify potential markets for products.
Microarray studies in genetics also use MCA to identify potential relationships between genes.

Figure 8.1 Multiple Correspondence Analysis

Contents
Example of Multiple Correspondence Analysis
Launch the Multiple Correspondence Analysis Platform
The Multiple Correspondence Analysis Report
Multiple Correspondence Analysis Platform Options
    Correspondence Analysis Options
    Cross Table
    Cross Table of Supplementary Rows
    Cross Table of Supplementary Columns
Additional Examples of the Multiple Correspondence Analysis Platform
    Example Using a Supplementary Variable
    Example Using a Supplementary ID
Statistical Details for the Multiple Correspondence Analysis Platform

Example of Multiple Correspondence Analysis

This example uses the Car Poll.jmp sample data table, which contains data collected from car polls. The data include aspects about the individuals polled, such as sex, marital status, and age.
The data also include aspects about the car that they own, such as the country of origin, the size, and the type. You want to explore relationships between sex, marital status, country, and size of car to identify consumer preferences.

1. Select Help > Sample Data Library and open Car Poll.jmp.
2. Select Analyze > Consumer Research > Multiple Correspondence Analysis.
3. Select sex, marital status, country, and size and click Y, Response.
In MCA, usually all columns are considered responses rather than some being responses and others explanatory.
4. Click OK.

Figure 8.2 Completed Multiple Correspondence Analysis Launch Window

The Multiple Correspondence Analysis report is shown in Figure 8.3. Note that some of the outlines are closed because of space considerations. The Variable Summary report provides a concise view of the analysis completed. The Correspondence Analysis report shows the cloud of categories of the four variables as projected onto the two principal axes. From this cloud, you can see that Americans have a strong association with the large car size while Japanese are highly associated with the small car size. Also, males are strongly associated with the small car size and females are associated with the medium car size. This information could be used in market research to identify target audiences for advertisements.

Figure 8.3 Multiple Correspondence Analysis Report

Launch the Multiple Correspondence Analysis Platform

Launch the Multiple Correspondence Analysis platform by selecting Analyze > Consumer Research > Multiple Correspondence Analysis. This example uses the Car Poll.jmp sample data table.

Figure 8.4 Multiple Correspondence Analysis Launch Window

Y, Response Assigns the categorical columns to be analyzed. Usually, in MCA, you are interested in the associations between variables, but there are not explicit “explanatory” and “response” variables.
X, Factor Assigns the categorical columns to be used as factor, or explanatory, variables.
Z, Supplementary Variable Assigns the columns to be used as supplementary variables. Supplementary variables are those whose associations you want to identify but that are not included in the calculations.
Supplementary ID Assigns the column that identifies rows to be used as supplementary. A supplementary ID column usually has 1s and 0s. The rows associated with ID 0 are treated as supplementary rows. The Supplementary ID option functions only if exactly one Y column and one X column have been specified.
Freq Assigns a frequency variable to this role. This is useful if your data are summarized. In this instance, you have one column for the Y values and another column for the frequency of occurrence of the Y values.
By Produces a separate report for each level of the By variable. If more than one By variable is assigned, a separate report is produced for each possible combination of the levels of the By variables.

The Multiple Correspondence Analysis Report

The initial Multiple Correspondence Analysis report shows the variable summary, correspondence analysis plot, and details of the dimensions of the data in order of importance. From the plot of the cloud of categories or individuals, you can identify associations that exist within the data. The details provide information about whether the two dimensions shown in the plot are sufficient to understand the relationships within the table.

The Variable Summary shows the columns used in the analysis and the roles that you selected in the launch window. If you select the Show Controls check box, a list of the columns in the data table appears to the left.
You can change the columns in the analysis either by selecting a column and clicking Add Y, Add X, or Add Z, or by dragging the column to the header in the variable summary table. This enables you to modify the analysis without returning to the launch window.

Figure 8.5 Multiple Correspondence Analysis Report with Show Controls Selected

Multiple Correspondence Analysis Platform Options

The Multiple Correspondence Analysis red triangle menu options give you the ability to customize reports according to your needs. The reports available are determined by the type of analysis that you conduct.

Correspondence Analysis Provides correspondence analysis reports. These reports give the plots, details, estimates of coordinates, and summary statistics. See “Additional Examples of the Multiple Correspondence Analysis Platform” on page 167.
Cross Table Provides the Burt table or contingency table as appropriate for variable roles selected. See “Cross Table” on page 166.
Cross Table of Supplementary Rows Provides a contingency table of the supplementary variable(s) versus the response variable(s). This table appears by default only if a supplementary variable has been specified in the launch window.
Cross Table of Supplementary Columns Provides a contingency table of the X, Factor variable(s) versus the supplementary variable(s). This table appears by default only if a factor variable and a supplementary variable have been specified in the launch window.
Mosaic Plot Displays a mosaic bar chart for each nominal or ordinal response variable. A mosaic plot is a stacked bar chart where each segment is proportional to its group’s frequency count. This option is available only if exactly one Y variable and one X variable are selected.
Tests for Independence Provides tests of whether there is an association between the row and column variables.
There are two versions of this test, the Pearson form and the Likelihood Ratio form, both with chi-square statistics. This option is available only when there is one Y variable and one X variable.
Script Contains options that are available to all platforms. See the Using JMP book.

Correspondence Analysis Options

The reports available under Correspondence Analysis are determined by the type of analysis that you conduct. Several of these reports are shown by default.

Show Plot Shows the two-dimensional cloud of categories in the plane described by the first two principal axes. This plot appears by default.
Show Detail Provides the details of the analysis including the singular values, inertias, ChiSquare statistics, percent, and cumulative percent. This report appears by default. See “Show Detail” on page 164.
Show Adjusted Inertia Provides reports of the Benzécri and Greenacre adjusted inertia. See Benzécri (1979) and Greenacre (1984). This option is available only for MCA, that is, when all columns are Ys. See “Show Adjusted Inertia” on page 164.
Show Coordinates Provides a report of up to the first three principal coordinates for the categories in the analysis, as appropriate. See “Show Coordinates” on page 165.
Show Summary Statistics Provides a report of the summary statistics, Quality, Mass, and Inertia, for each category in the analysis. See “Show Summary Statistics” on page 165.
Show Partial Contributions to Inertia Provides a report of the contribution of each category to the inertia for each of up to the first three dimensions. See “Show Partial Contributions to Inertia” on page 166.
Show Squared Cosines Provides a report of the squared cosines of each category for each of up to the first three dimensions. See “Show Squared Cosines” on page 166.
3D Correspondence Analysis Shows the three-dimensional cloud of categories of the Y, X, and Z variables in the space described by the first three principal axes. This option is not available if there are fewer than three dimensions.

Show Plot

The plot displays a projection of the cloud of categories or individuals onto the plane described by the first two principal axes. The distance scale is the same in all directions.

You can toggle the dimensions shown in the plot using the Select Dimension controls below the plot. The first control defines the horizontal axis of the plot, and the second control defines the vertical axis of the plot. Click the arrow button to cycle through the dimensions shown in the plot.

Show Detail

Singular Value Shows the singular value decomposition of the contingency table or Burt table. For the formula, see “Statistical Details for the Details Report” on page 171.
Inertia Lists the square of the singular values, reflecting the relative variation accounted for in the canonical dimensions.
ChiSquare Lists the portion of the overall Chi-square for the Burt or contingency table represented by the dimension.
Percent Portion of inertia with respect to the total inertia.
Cumulative Percent Shows the cumulative portion of inertia. If the first two singular values capture the bulk of the inertia, then the 2-D correspondence analysis plot is sufficient to show the relationships in the table.

Show Adjusted Inertia

The principal inertias of a Burt table in MCA are the eigenvalues. The problem with these inertias is that they provide a pessimistic indication of fit. Benzécri proposed an inertia adjustment. Greenacre argued that the Benzécri adjustment overestimates the quality of fit and proposed an alternate adjustment. Both adjustments are calculated for your reference. See “Statistical Details for Adjusted Inertia” on page 171.
Inertia
  Lists the square of the singular values, reflecting the relative variation accounted for in the canonical dimensions.

Adjusted Inertia
  Lists the adjusted inertia according to either the Benzécri or Greenacre adjustment.

Percent
  Portion of adjusted inertia with respect to the total inertia.

Cumulative Percent
  Shows the cumulative portion of adjusted inertia. If the first two singular values capture the bulk of the inertia, then the 2-D correspondence analysis plot is sufficient to show the relationships in the table.

Show Coordinates

Y
  Lists the columns specified as Y, Response variables.

Category
  Lists the levels of the Y variables.

Dimension 1, Dimension 2, Dimension 3
  Lists the coordinate for the category along the respective principal axis. By default, the table shows the first three dimension columns. Additional dimension columns are hidden. To reveal these optional columns, right-click on the table and select the dimension columns from the Columns submenu.

If there are columns specified as X, Factor variables, the Coordinates report displays tables of both X and Y with the same report headings. If a Z, Supplementary Variable is specified, the coordinates are listed below the X and Y coordinates as applicable.

Show Summary Statistics

Y
  Lists the columns specified as Y, Response variables.

Category
  Lists the levels of the Y variables.

Quality
  Lists the quality of the representation of the level by the solution.

Mass
  Lists the row percentage from the Burt or contingency table.

Inertia
  Lists the row marginal percentage of the total inertia accounted for by the respective point.

If there are columns specified as X, Factor variables, the Summary Statistics report displays tables of both X and Y with the same report headings. See “Statistical Details for Summary Statistics.”
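The Mass, Quality, and Inertia columns described above can be sketched for the rows of a simple two-way table. The Python code below is a minimal illustration, assuming NumPy and a made-up 4 × 4 table of counts. It follows the standard correspondence analysis computation on the standardized residual matrix and is not a description of JMP's internal code. The function name and the n_dims argument are hypothetical.

```python
import numpy as np

def ca_row_summary(table, n_dims=2):
    """Return mass, quality, and relative inertia for each row of a count table."""
    N = np.asarray(table, dtype=float)
    P = N / N.sum()                          # correspondence matrix
    r, c = P.sum(axis=1), P.sum(axis=0)      # row and column masses
    S = (P - np.outer(r, c)) / np.sqrt(np.outer(r, c))
    U, sv, _ = np.linalg.svd(S, full_matrices=False)
    F = U * sv / np.sqrt(r)[:, None]         # principal coordinates of the rows
    dist2 = (F ** 2).sum(axis=1)             # squared distance of each row to the centroid
    quality = (F[:, :n_dims] ** 2).sum(axis=1) / dist2
    rel_inertia = r * dist2 / (sv ** 2).sum()
    return r, quality, rel_inertia

# Hypothetical counts, for illustration only.
table = np.array([[20, 10, 5, 2],
                  [8, 15, 9, 4],
                  [3, 9, 14, 10],
                  [1, 4, 8, 18]])
mass, quality, inertia = ca_row_summary(table, n_dims=2)
_, full_quality, _ = ca_row_summary(table, n_dims=3)
print(np.round(mass, 3), np.round(quality, 3), np.round(inertia, 3))
```

Mass and relative inertia each sum to 1 over the rows. Quality reaches 1 for every row once all nontrivial dimensions are included, which for a 4 × 4 table is at most three.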
Show Partial Contributions to Inertia

Y
  Lists the columns specified as Y, Response variables.

Category
  Lists the levels of the Y variables.

Dimension 1, Dimension 2, Dimension 3
  Lists the contribution of the level to the inertia of the respective dimension. By default, the table shows the first three dimension columns. Additional dimension columns are hidden. To reveal these optional columns, right-click on the table and select the dimension columns from the Columns submenu.

Each category contributes to the inertia of each dimension. The partial contributions within each dimension sum to 1.

If there are columns specified as X, Factor variables, the Partial Contributions to Inertia report displays tables of both X and Y with the same report headings. See “Statistical Details for Partial Contributions to Inertia.”

Show Squared Cosines

Y
  Lists the columns specified as Y, Response variables.

Category
  Lists the levels of the Y variables.

Dimension 1, Dimension 2, Dimension 3
  Lists the quality of the representation of the level by the respective dimension. By default, the table shows the first three dimension columns. Additional dimension columns are hidden. To reveal these optional columns, right-click on the table and select the dimension columns from the Columns submenu.

The values indicate the quality of each point for the dimension listed. The squared cosine can be interpreted as the correlation of the point with the dimension. The sum of the squared cosines of the first two dimensions is equal to the quality indicated in the summary statistics report. The term refers to the fact that the value is also the squared cosine value of the angle the point makes with the dimension.

If there are columns specified as X, Factor variables, the Squared Cosines report displays tables of both X and Y with the same report headings.
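The two reports above can be sketched together, because both are derived from the principal coordinates. The Python code below is a minimal illustration for the rows of a simple two-way table, assuming NumPy and made-up counts. It uses the contribution formula mass × coordinate² / dimension inertia and the usual definition of the squared cosine as a coordinate's squared share of the point's squared distance from the centroid. The function name is hypothetical and the code is not JMP's implementation.

```python
import numpy as np

def ca_row_diagnostics(table):
    """Return partial contributions to inertia and squared cosines for the rows."""
    N = np.asarray(table, dtype=float)
    P = N / N.sum()
    r, c = P.sum(axis=1), P.sum(axis=0)
    S = (P - np.outer(r, c)) / np.sqrt(np.outer(r, c))
    U, sv, _ = np.linalg.svd(S, full_matrices=False)
    keep = sv > 1e-12                        # drop the trivial zero dimension
    U, sv = U[:, keep], sv[keep]
    F = U * sv / np.sqrt(r)[:, None]         # principal coordinates of the rows
    contrib = r[:, None] * F ** 2 / sv ** 2  # mass x coordinate^2 / dimension inertia
    cos2 = F ** 2 / (F ** 2).sum(axis=1, keepdims=True)
    return contrib, cos2

# Hypothetical counts, for illustration only.
table = np.array([[20, 10, 5, 2],
                  [8, 15, 9, 4],
                  [3, 9, 14, 10],
                  [1, 4, 8, 18]])
contrib, cos2 = ca_row_diagnostics(table)
print(np.round(contrib, 3))
print(np.round(cos2, 3))
```

Each column of contrib sums to 1, matching the statement that partial contributions within a dimension sum to 1. Each row of cos2 sums to 1 across all retained dimensions, and the sum over the first two columns gives the quality for a two-dimensional solution.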
Cross Table

The Burt table is the basis of the multiple correspondence analysis. It is a partitioned symmetric table of all pairs of categorical variables. The diagonal partitions are diagonal matrices (a cross-table of a variable with itself). The off-diagonal partitions are ordinary contingency tables.

When you select multiple Y, Response columns with no X, Factor columns, the Burt table is created. If you select any X, Factor columns, a traditional contingency table is created instead of a Burt table. The red triangle menu for the Burt or contingency table contains options for the statistics to display in the table.

Count
  Cell frequency, margin total frequencies, and grand total (total sample size). This appears by default.

Total %
  Percent of cell counts and margin totals to the grand total. This appears by default.

Cell Chi Square
  Chi-square values computed for each cell as (O - E)^2 / E.

Col %
  Percent of each cell count to its column total.

Row %
  Percent of each cell count to its row total.

Expected
  Expected frequency (E) of each cell under the assumption of independence. Computed as the product of the corresponding row total and column total divided by the grand total.

Deviation
  Observed cell frequency (O) minus the expected cell frequency (E).

Col Cum
  Cumulative column total.

Col Cum %
  Cumulative column percentage.

Row Cum
  Cumulative row total.

Row Cum %
  Cumulative row percentage.

Make Into Data Table
  Creates one data table for each statistic shown in the table.

Cross Table of Supplementary Rows

When a Z, Supplementary column is selected, a contingency table with the supplementary column levels as the rows and the response column levels as the columns is created. The red triangle menu contains the same options as the Burt Table.
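The cell statistics listed for the cross table can be sketched directly from their definitions. The Python code below is a small worked example, assuming NumPy and a made-up 2 × 2 table of counts. The variable names are chosen for illustration and do not correspond to JMP column names.

```python
import numpy as np

# Hypothetical observed counts (O), for illustration only.
counts = np.array([[20.0, 10.0],
                   [5.0, 15.0]])

row_tot = counts.sum(axis=1, keepdims=True)   # row totals
col_tot = counts.sum(axis=0, keepdims=True)   # column totals
grand = counts.sum()                          # grand total (total sample size)

expected = row_tot * col_tot / grand          # E under independence
deviation = counts - expected                 # O - E
cell_chi2 = deviation ** 2 / expected         # (O - E)^2 / E
total_pct = 100.0 * counts / grand            # Total %
row_pct = 100.0 * counts / row_tot            # Row %
col_pct = 100.0 * counts / col_tot            # Col %

print(expected)
print(cell_chi2.sum())                        # Pearson chi-square for the table
```

Summing the Cell Chi Square values over all cells gives the Pearson chi-square statistic for the table, and the deviations always sum to zero.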
Cross Table of Supplementary Columns

When an X, Factor column and a Z, Supplementary column are selected, a contingency table with the X, Factor levels as rows and the Supplementary levels as columns is created. The red triangle menu contains the same options as the Burt Table.

Additional Examples of the Multiple Correspondence Analysis Platform

Example Using a Supplementary Variable

This example uses the Car Poll.jmp sample data table, which contains data collected from car polls. The data include aspects about the individuals polled, such as sex, marital status, and age. The data also include aspects about the car that they own, such as the country of origin, the size, and the type. You want to explore relationships between sex, country, and size of car to identify consumer preferences.

1. Select Help > Sample Data Library and open Car Poll.jmp.
2. Select Analyze > Consumer Research > Multiple Correspondence Analysis.
3. Select country and size and click Y, Response.
4. Select marital status and click Z, Supplementary Variable.
5. Click OK.

Unlike in the first example, this analysis does not use marital status in the calculations. Marital status is plotted after the calculations are complete. You see from the plot strong relationships between Japanese and Small cars as well as American and Large cars. The two marital statuses are plotted in a different color. Single people seem to prefer smaller cars a bit more than married people.

Figure 8.6 MCA with Supplementary Variable Report

Example Using a Supplementary ID

The United States census allows for examining population growth over the last century.
The US Regional Population.jmp sample data table contains populations of the 50 US states grouped into regions for each of the census years from 1920 to 2010. Alaska and Hawaii are treated as supplementary regions because they were not states during the entire time, and they are not part of the contiguous United States. You are interested in whether the population growth in these two states differs from the rest of the US.

1. Select Help > Sample Data Library and open US Regional Population.jmp.
2. Select Analyze > Consumer Research > Multiple Correspondence Analysis.
3. Select Year and click Y, Response.
4. Select Region and click X, Factor.
5. Select ID and click Supplementary ID.
6. Select Population and click Freq.
7. Click OK.

The Details report shows that the association between years and regions is almost entirely explained by the first dimension. The plot shows that years are in the correct order on the first dimension. This ordering occurs naturally through the correspondence analysis; there is no information about the order provided to the analysis.

Notice that the ordering of the regions reflects the population shift from the Midwest to the Northeast to the South and finally to the Mountain and West. Alaska and Hawaii were not used in the computation of the analysis but are plotted based on the results. Their growth pattern is most similar to that of the Pacific states. Alaska’s growth is even more extreme than that of the Pacific region.
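The idea that supplementary points are left out of the computation but plotted based on the results can be sketched with the standard correspondence analysis transition formula: a supplementary row's profile is multiplied by the standard coordinates of the columns. The Python code below is a minimal illustration, assuming NumPy and made-up counts for a few regions and years. The manual does not state how JMP positions supplementary points, so treat this as the textbook approach rather than JMP's method. The function names are hypothetical.

```python
import numpy as np

def fit_ca(table):
    """Return row principal coordinates and column standard coordinates."""
    N = np.asarray(table, dtype=float)
    P = N / N.sum()
    r, c = P.sum(axis=1), P.sum(axis=0)
    S = (P - np.outer(r, c)) / np.sqrt(np.outer(r, c))
    U, sv, Vt = np.linalg.svd(S, full_matrices=False)
    keep = sv > 1e-12                          # drop the trivial zero dimension
    U, sv, Vt = U[:, keep], sv[keep], Vt[keep]
    row_principal = U * sv / np.sqrt(r)[:, None]
    col_standard = Vt.T / np.sqrt(c)[:, None]
    return row_principal, col_standard

def project_supplementary_row(counts, col_standard):
    """Place a row that was not used in the fit, from its profile alone."""
    profile = np.asarray(counts, dtype=float)
    profile = profile / profile.sum()
    return profile @ col_standard

# Hypothetical populations (rows: active regions, columns: three census years).
active = np.array([[50, 55, 60],
                   [30, 45, 70],
                   [40, 42, 45],
                   [10, 25, 60]])
row_principal, col_standard = fit_ca(active)

# A hypothetical supplementary region with very fast growth.
supplementary = project_supplementary_row([1, 3, 9], col_standard)
print(np.round(row_principal, 3))
print(np.round(supplementary, 3))
```

Projecting one of the active rows with the same function reproduces its principal coordinates, which is a convenient check that the supplementary point is on the same scale as the active points.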
Figure 8.7 MCA with Supplementary ID Report

Statistical Details for the Multiple Correspondence Analysis Platform

Statistical Details for the Details Report

When a simple Correspondence Analysis is performed, the report lists the singular values of the following equation:

\[ D_r^{-0.5} \, (P - rc') \, D_c^{-0.5} = U D_u V' \]

where:
• P is the matrix of counts divided by the total frequency
• r and c are row and column sums of P
• the Ds are diagonal matrices of the values of r and c

When Multiple Correspondence Analysis is performed, the singular value decomposition extends to:

\[ D^{-0.5} \left( \frac{C}{Q^2 n} - D \, 1 1' \, D \right) D^{-0.5} = U D_u V' \]

where:

\[ D = \frac{1}{m} \operatorname{diag}(D_1, D_2, \ldots, D_Q) \]

• C is the Burt table
• Q is the number of categorical variables
• n is the number of observations
• 1 is a column vector of ones
• m is the number of categorical variables (the same quantity as Q), as in the adjusted inertia details that follow

Statistical Details for Adjusted Inertia

The usual principal inertias of a Burt table constructed from m categorical variables in MCA are the eigenvalues u_k from D_u^2. These inertias provide a pessimistic indication of fit. Benzécri (1979) proposed the following inertia adjustment; it is also described by Greenacre (1984, p. 145):

\[ \left( \frac{m}{m-1} \right)^2 \times \left( u_k - \frac{1}{m} \right)^2 \quad \text{for } u_k > \frac{1}{m} \]

This adjustment computes the percent of adjusted inertia relative to the sum of the adjusted inertias for all inertias greater than 1/m.

Greenacre (1994, p. 156) argues that the Benzécri adjustment overestimates the quality of fit.
Greenacre proposes instead to compute the percentage of adjusted inertia relative to:

\[ \frac{m}{m-1} \left( \operatorname{trace}(D_u^4) - \frac{n_c - m}{m^2} \right) \]

for all inertias greater than 1/m, where trace(D_u^4) is the sum of squared inertias and n_c is the total number of categories across the m variables.

Statistical Details for Summary Statistics

Quality is the ratio of the sum of the squared cosines in the chosen number of dimensions and the sum of the squared cosines in the maximum number of dimensions. Quality indicates how well the point is represented in the reduced dimension space.

Inertia is the total Pearson Chi-square for a two-way frequency table divided by the sum of all observations in the table. In the summary statistics table, the relative inertia is listed. Relative inertia is the proportion of the contribution of the point to the overall inertia.

Statistical Details for Partial Contributions to Inertia

The contribution of a row or column to the inertia of a dimension is calculated as:

\[ \text{contribution} = \frac{\text{mass} \times \text{coordinate}^2}{\text{dimension inertia}} \]

Appendix A
References

Akaike, H. (1974), “A New Look at the Statistical Model Identification,” IEEE Transactions on Automatic Control, 19, 716–723.
Akaike, H. (1987), “Factor Analysis and AIC,” Psychometrika, 52, 317–332.
Bartlett, M.S. (1954), “A Note on the Multiplying Factors for Various Chi Square Approximations,” Journal of the Royal Statistical Society, 16 (Series B), 296–298.
Benzécri, J. P. (1979), “Sur le calcul des taux d’inertie dans l’analyse d’un questionnaire, addendum et erratum à [BIN. MULT.],” Cahiers de l’Analyse des Données, 4, 377–378.
Dwass, M. (1955), “A Note on Simultaneous Confidence Intervals,” Annals of Mathematical Statistics, 26, 146–147.
Farebrother, R.W. (1981), “Mechanical Representations of the L1 and L2 Estimation Problems,” Statistical Data Analysis, 2nd Edition, Amsterdam, North Holland: edited by Y. Dodge.
Fieller, E.C. (1954), “Some Problems in Interval Estimation,” Journal of the Royal Statistical Society, Series B, 16, 175–185.
Firth, D. (1993), “Bias Reduction of Maximum Likelihood Estimates,” Biometrika, 80:1, 27–38.
Goodnight, J.H. (1978), “Tests of Hypotheses in Fixed Effects Linear Models,” SAS Technical Report R–101, Cary, NC: SAS Institute Inc.; also in Communications in Statistics (1980), A9, 167–180.
Goodnight, J.H. and W.R. Harvey (1978), “Least Square Means in the Fixed Effect General Linear Model,” SAS Technical Report R–103, Cary, NC: SAS Institute Inc.
Greenacre, M. J. (1984), Theory and Applications of Correspondence Analysis, London: Academic Press.
Heinze, G. and Schemper, M. (2002), “A Solution to the Problem of Separation in Logistic Regression,” Statistics in Medicine, 21:16, 2409–2419.
Hocking, R.R. (1985), The Analysis of Linear Models, Monterey: Brooks–Cole.
Hosmer, D.W. and Lemeshow, S. (2000), Applied Logistic Regression, Second Edition, New York: John Wiley and Sons.
Kaiser, H.F. (1958), “The Varimax Criterion for Analytic Rotation in Factor Analysis,” Psychometrika, 23, 187–200.
McFadden, D. (1974), “Conditional Logit Analysis of Qualitative Choice Behavior,” in P. Zarembka, ed., Frontiers in Econometrics, pp. 105–142.
Radcliffe, N. J., and Surry, P. D. (2011), “Real-World Uplift Modelling with Significance-Based Uplift Trees,” Stochastic Solutions White Paper, Portrait Technical Report TR-2011-1.
Reichheld, F. F. (2003), “The One Number You Need to Grow,” Harvard Business Review, Vol. 81 No. 12, 46–54.
Wright, S.P. and R.G. O’Brien (1988), “Power Analysis in an Enhanced GLM Procedure: What it Might Look Like,” SUGI 1988, Proceedings of the Thirteenth Annual Conference, 1097–1102, Cary, NC: SAS Institute Inc.
