Consumer Research
Version 12
“The real voyage of discovery consists not in seeking new
landscapes, but in having new eyes.”
Marcel Proust
JMP, A Business Unit of SAS
SAS Campus Drive
Cary, NC 27513
The correct bibliographic citation for this manual is as follows: SAS Institute Inc. 2015.
JMP® 12 Consumer Research. Cary, NC: SAS Institute Inc.
JMP® 12 Consumer Research
Copyright © 2015, SAS Institute Inc., Cary, NC, USA
ISBN 978-1-62959-438-5 (Hardcopy)
ISBN 978-1-62959-440-8 (EPUB)
ISBN 978-1-62959-441-5 (MOBI)
ISBN 978-1-62959-439-2 (PDF)
All rights reserved. Produced in the United States of America.
For a hard-copy book: No part of this publication may be reproduced, stored in a retrieval
system, or transmitted, in any form or by any means, electronic, mechanical, photocopying,
or otherwise, without the prior written permission of the publisher, SAS Institute Inc.
For a web download or e-book: Your use of this publication shall be governed by the terms
established by the vendor at the time you acquire this publication.
The scanning, uploading, and distribution of this book via the Internet or any other means
without the permission of the publisher is illegal and punishable by law. Please purchase
only authorized electronic editions and do not participate in or encourage electronic piracy
of copyrighted materials. Your support of others’ rights is appreciated.
U.S. Government License Rights; Restricted Rights: The Software and its documentation is
commercial computer software developed at private expense and is provided with
RESTRICTED RIGHTS to the United States Government. Use, duplication or disclosure of
the Software by the United States Government is subject to the license terms of this
Agreement pursuant to, as applicable, FAR 12.212, DFAR 227.7202-1(a), DFAR 227.7202-3(a)
and DFAR 227.7202-4 and, to the extent required under U.S. federal law, the minimum
restricted rights as set out in FAR 52.227-19 (DEC 2007). If FAR 52.227-19 is applicable, this
provision serves as notice under clause (c) thereof and no other notice is required to be
affixed to the Software or documentation. The Government’s rights in Software and
documentation shall be only those set forth in this Agreement.
SAS Institute Inc., SAS Campus Drive, Cary, North Carolina 27513-2414.
March 2015
SAS® and all other SAS Institute Inc. product or service names are registered trademarks or
trademarks of SAS Institute Inc. in the USA and other countries. ® indicates USA
registration.
Other brand and product names are trademarks of their respective companies.
Technology License Notices
• Scintilla - Copyright © 1998-2014 by Neil Hodgson <[email protected]>.
All Rights Reserved.
Permission to use, copy, modify, and distribute this software and its documentation for
any purpose and without fee is hereby granted, provided that the above copyright
notice appear in all copies and that both that copyright notice and this permission
notice appear in supporting documentation.
NEIL HODGSON DISCLAIMS ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING
ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS, IN NO EVENT SHALL NEIL
HODGSON BE LIABLE FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY
DAMAGES WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN
ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF OR IN
CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
• Telerik RadControls: Copyright © 2002-2012, Telerik. Usage of the included Telerik
RadControls outside of JMP is not permitted.
• ZLIB Compression Library - Copyright © 1995-2005, Jean-Loup Gailly and Mark Adler.
• Made with Natural Earth. Free vector and raster map data @ naturalearthdata.com.
• Packages - Copyright © 2009-2010, Stéphane Sudre (s.sudre.free.fr). All rights reserved.
Redistribution and use in source and binary forms, with or without modification, are
permitted provided that the following conditions are met:
Redistributions of source code must retain the above copyright notice, this list of
conditions and the following disclaimer.
Redistributions in binary form must reproduce the above copyright notice, this list of
conditions and the following disclaimer in the documentation and/or other materials
provided with the distribution.
Neither the name of the WhiteBox nor the names of its contributors may be used to
endorse or promote products derived from this software without specific prior written
permission.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS “AS IS” AND
ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR
ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES
(INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS
OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE,
EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
• iODBC software - Copyright © 1995-2006, OpenLink Software Inc and Ke Jin
(www.iodbc.org). All rights reserved.
Redistribution and use in source and binary forms, with or without modification, are
permitted provided that the following conditions are met:
‒ Redistributions of source code must retain the above copyright notice, this list of
conditions and the following disclaimer.
‒ Redistributions in binary form must reproduce the above copyright notice, this list
of conditions and the following disclaimer in the documentation and/or other
materials provided with the distribution.
‒ Neither the name of OpenLink Software Inc. nor the names of its contributors may
be used to endorse or promote products derived from this software without specific
prior written permission.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS “AS IS” AND
ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
DISCLAIMED. IN NO EVENT SHALL OPENLINK OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT,
INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING,
BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR
OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF
THE POSSIBILITY OF SUCH DAMAGE.
• bzip2, the associated library “libbzip2”, and all documentation, are Copyright ©
1996-2010, Julian R Seward. All rights reserved.
Redistribution and use in source and binary forms, with or without modification, are
permitted provided that the following conditions are met:
Redistributions of source code must retain the above copyright notice, this list of
conditions and the following disclaimer.
The origin of this software must not be misrepresented; you must not claim that you
wrote the original software. If you use this software in a product, an acknowledgment
in the product documentation would be appreciated but is not required.
Altered source versions must be plainly marked as such, and must not be
misrepresented as being the original software.
The name of the author may not be used to endorse or promote products derived from
this software without specific prior written permission.
THIS SOFTWARE IS PROVIDED BY THE AUTHOR “AS IS” AND ANY EXPRESS OR IMPLIED
WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF
MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT
SHALL THE AUTHOR BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR
BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER
IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING
IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
SUCH DAMAGE.
• R software is Copyright © 1999-2012, R Foundation for Statistical Computing.
• MATLAB software is Copyright © 1984-2012, The MathWorks, Inc. Protected by U.S.
and international patents. See www.mathworks.com/patents. MATLAB and Simulink
are registered trademarks of The MathWorks, Inc. See www.mathworks.com/trademarks
for a list of additional trademarks. Other product or brand names may be
trademarks or registered trademarks of their respective holders.
• libopc is Copyright © 2011, Florian Reuter. All rights reserved.
Redistribution and use in source and binary forms, with or without modification, are
permitted provided that the following conditions are met:
‒ Redistributions of source code must retain the above copyright notice, this list of
conditions and the following disclaimer.
‒ Redistributions in binary form must reproduce the above copyright notice, this list
of conditions and the following disclaimer in the documentation and / or other
materials provided with the distribution.
‒ Neither the name of Florian Reuter nor the names of its contributors may be used to
endorse or promote products derived from this software without specific prior
written permission.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS “AS IS”
AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE
LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING
IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY
OF SUCH DAMAGE.
• libxml2 - Except where otherwise noted in the source code (e.g. the files hash.c, list.c
and the trio files, which are covered by a similar licence but with different Copyright
notices) all the files are:
Copyright © 1998 - 2003 Daniel Veillard. All Rights Reserved.
Permission is hereby granted, free of charge, to any person obtaining a copy of this
software and associated documentation files (the “Software”), to deal in the Software
without restriction, including without limitation the rights to use, copy, modify, merge,
publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons
to whom the Software is furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all copies or
substantial portions of the Software.
THE SOFTWARE IS PROVIDED “AS IS”, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
DANIEL VEILLARD BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER
IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
Except as contained in this notice, the name of Daniel Veillard shall not be used in
advertising or otherwise to promote the sale, use or other dealings in this Software
without prior written authorization from him.
Get the Most from JMP®
Whether you are a first-time or a long-time user, there is always something to learn
about JMP.
Visit JMP.com to find the following:
• live and recorded webcasts about how to get started with JMP
• video demos and webcasts of new features and advanced techniques
• details on registering for JMP training
• schedules for seminars being held in your area
• success stories showing how others use JMP
• a blog with tips, tricks, and stories from JMP staff
• a forum to discuss JMP with other users
http://www.jmp.com/getstarted/
Contents
1 Learn about JMP
  Documentation and Additional Resources
    Formatting Conventions
    JMP Documentation
      JMP Documentation Library
      JMP Help
    Additional Resources for Learning JMP
      Tutorials
      Sample Data Tables
      Learn about Statistical and JSL Terms
      Learn JMP Tips and Tricks
      Tooltips
      JMP User Community
      JMPer Cable
      JMP Books by Users
      The JMP Starter Window
2 Introduction to Consumer Research
  Overview of Customer and Behavioral Research Methods
3 Categorical Response Analysis
  Analyzing Survey and Other Counting Data
    Categorical Platform Overview
    Example of the Categorical Platform
    Launch the Categorical Platform
      Response Roles
      Cast Selected Columns into Roles
      Other Launch Window Options
    The Categorical Report
      Share Chart
      Frequency Chart
    Categorical Platform Options
      Report Options
      Statistical Options
      Free Text Report Options
      Structured Report Options
    Additional Examples of the Categorical Platform
      Multiple Response
      Response Frequencies
      Indicator Group
      Multiple Delimited
      Multiple Response by ID
      Mean Score Example
4 Factor Analysis
  Identify Factors within Variables
    Factor Analysis Platform Overview
    Example of the Factor Analysis Platform
    Launch the Factor Analysis Platform
    The Factor Analysis Report
      Model Launch
      Rotation Methods
    Factor Analysis Platform Options
    Factor Analysis Model Fit Options
5 Choice Models
  Fit Models for Choice Experiments
    Choice Modeling Platform Overview
    Example of the Choice Platform
    Launch the Choice Platform
      Choice Model Output
    Choice Platform Options
    Valuing Trade-offs
      Example of Valuing Trade-offs
    One-Table Analysis
      Example of One-Table Analysis
    Segmentation
      Example of Segmentation
    Special Data Rules
      Default Choice Set
      Subject Data with Response Data
      Logistic Regression
    Transforming Data
      Example of Transforming Data to Two Analysis Tables
      Example of Transforming Data to One Analysis Table
    Additional Examples of Logistic Regression Using the Choice Platform
      Example of Logistic Regression Using the Choice Platform
      Example of Logistic Regression for Matched Case-Control Studies
    Statistical Details
6 Uplift Models
  Model the Incremental Impact of Actions on Consumer Behavior
    Uplift Platform Overview
    Example of the Uplift Platform
    Launch the Uplift Platform
    The Uplift Model Report
      Uplift Model Graph
      Uplift Report Options
7 Item Analysis
  Analyze Test Results by Item and Subject
    Item Analysis Platform Overview
    Launch the Item Analysis Platform
    The Item Analysis Report
      Characteristic Curves
      Information Curves
      Dual Plots
    Item Analysis Platform Options
    Technical Details
8 Multiple Correspondence Analysis
    Example of Multiple Correspondence Analysis
    Launch the Multiple Correspondence Analysis Platform
    The Multiple Correspondence Analysis Report
    Multiple Correspondence Analysis Platform Options
      Correspondence Analysis Options
      Show Plot
      Show Detail
      Show Adjusted Inertia
      Show Coordinates
      Show Summary Statistics
      Show Partial Contributions to Inertia
      Show Squared Cosines
      Cross Table
      Cross Table of Supplementary Rows
      Cross Table of Supplementary Columns
    Additional Examples of the Multiple Correspondence Analysis Platform
      Example Using a Supplementary Variable
      Example Using a Supplementary ID
    Statistical Details for the Multiple Correspondence Analysis Platform
      Statistical Details for the Details Report
      Statistical Details for Adjusted Inertia
      Statistical Details for Summary Statistics
      Statistical Details for Partial Contributions to Inertia
A References
Index
  Consumer Research
Chapter 1
Learn about JMP
Documentation and Additional Resources
This chapter includes the following information:
• book conventions
• JMP documentation
• JMP Help
• additional resources, such as the following:
  ‒ other JMP documentation
  ‒ tutorials
  ‒ indexes
  ‒ Web resources
Figure 1.1 The JMP Help Home Window on Windows
Formatting Conventions
The following conventions help you relate written material to information that you see on
your screen.
• Sample data table names, column names, pathnames, filenames, file extensions, and
  folders appear in Helvetica font.
• Code appears in Lucida Sans Typewriter font.
• Code output appears in Lucida Sans Typewriter italic font and is indented farther than
  the preceding code.
• Helvetica bold formatting indicates items that you select to complete a task:
  ‒ buttons
  ‒ check boxes
  ‒ commands
  ‒ list names that are selectable
  ‒ menus
  ‒ options
  ‒ tab names
  ‒ text boxes
• The following items appear in italics:
  ‒ words or phrases that are important or have definitions specific to JMP
  ‒ book titles
  ‒ variables
  ‒ script output
• Features that are for JMP Pro only are noted with the JMP Pro icon. For an overview
  of JMP Pro features, visit http://www.jmp.com/software/pro/.
Note: Special information and limitations appear within a Note.
Tip: Helpful information appears within a Tip.
JMP Documentation
JMP offers documentation in various formats, from print books and Portable Document
Format (PDF) to electronic books (e-books).
• Open the PDF versions from the Help > Books menu or from the JMP online Help footers.
• All books are also combined into one PDF file, called JMP Documentation Library, for
  convenient searching. Open the JMP Documentation Library PDF file from the Help > Books
  menu.
• e-books are available at online retailers. Visit
  http://www.jmp.com/support/downloads/documentation.shtml for details.
• You can also purchase printed documentation on the SAS website:
  http://support.sas.com/documentation/onlinedoc/jmp/index.html
JMP Documentation Library
The following table describes the purpose and content of each book in the JMP library.
Document Title: Discovering JMP
Document Purpose: If you are not familiar with JMP, start here.
Document Content: Introduces you to JMP and gets you started creating and analyzing data.

Document Title: Using JMP
Document Purpose: Learn about JMP data tables and how to perform basic operations.
Document Content: Covers general JMP concepts and features that span across all of JMP,
including importing data, modifying column properties, sorting data, and connecting to SAS.

Document Title: Basic Analysis
Document Purpose: Perform basic analysis using this document.
Document Content: Describes these Analyze menu platforms:
• Distribution
• Fit Y by X
• Matched Pairs
• Tabulate
Also covers modeling utilities and how to approximate sampling distributions using
bootstrapping.

Document Title: Essential Graphing
Document Purpose: Find the ideal graph for your data.
Document Content: Describes these Graph menu platforms:
• Graph Builder
• Overlay Plot
• Scatterplot 3D
• Contour Plot
• Bubble Plot
• Parallel Plot
• Cell Plot
• Treemap
• Scatterplot Matrix
• Ternary Plot
• Chart
The book also covers how to create background and custom maps.

Document Title: Profilers
Document Purpose: Learn how to use interactive profiling tools, which enable you to view
cross-sections of any response surface.
Document Content: Covers all profilers listed in the Graph menu. Analyzing noise factors is
included along with running simulations using random inputs.

Document Title: Design of Experiments Guide
Document Purpose: Learn how to design experiments and determine appropriate sample sizes.
Document Content: Covers all topics in the DOE menu and the Screening menu item in the
Analyze > Modeling menu.

Document Title: Fitting Linear Models
Document Purpose: Learn about Fit Model platform and many of its personalities.
Document Content: Describes these personalities, all available within the Analyze menu Fit
Model platform:
• Standard Least Squares
• Stepwise
• Generalized Regression
• Mixed Model
• MANOVA
• Loglinear Variance
• Nominal Logistic
• Ordinal Logistic
• Generalized Linear Model

Document Title: Specialized Models
Document Purpose: Learn about additional modeling techniques.
Document Content: Describes these Analyze > Modeling menu platforms:
• Partition
• Neural
• Model Comparison
• Nonlinear
• Gaussian Process
• Time Series
• Response Screening
The Screening platform in the Analyze > Modeling menu is described in Design of
Experiments Guide.

Document Title: Multivariate Methods
Document Purpose: Read about techniques for analyzing several variables simultaneously.
Document Content: Describes these Analyze > Multivariate Methods menu platforms:
• Multivariate
• Cluster
• Principal Components
• Discriminant
• Partial Least Squares

Document Title: Quality and Process Methods
Document Purpose: Read about tools for evaluating and improving processes.
Document Content: Describes these Analyze > Quality and Process menu platforms:
• Control Chart Builder and individual control charts
• Measurement Systems Analysis
• Variability / Attribute Gauge Charts
• Process Capability
• Pareto Plot
• Diagram

Document Title: Reliability and Survival Methods
Document Purpose: Learn to evaluate and improve reliability in a product or system and
analyze survival data for people and products.
Document Content: Describes these Analyze > Reliability and Survival menu platforms:
• Life Distribution
• Fit Life by X
• Recurrence Analysis
• Degradation and Destructive Degradation
• Reliability Forecast
• Reliability Growth
• Reliability Block Diagram
• Survival
• Fit Parametric Survival
• Fit Proportional Hazards

Document Title: Consumer Research
Document Purpose: Learn about methods for studying consumer preferences and using that
insight to create better products and services.
Document Content: Describes these Analyze > Consumer Research menu platforms:
• Categorical
• Multiple Correspondence Analysis
• Factor Analysis
• Choice
• Uplift
• Item Analysis

Document Title: Scripting Guide
Document Purpose: Learn about taking advantage of the powerful JMP Scripting Language
(JSL).
Document Content: Covers a variety of topics, such as writing and debugging scripts,
manipulating data tables, constructing display boxes, and creating JMP applications.

Document Title: JSL Syntax Reference
Document Purpose: Read about many JSL functions and their arguments, and messages that you
send to objects and display boxes.
Document Content: Includes syntax, examples, and notes for JSL commands.
Note: The Books menu also contains two reference cards that can be printed: The Menu Card
describes JMP menus, and the Quick Reference describes JMP keyboard shortcuts.
JMP Help
JMP Help is an abbreviated version of the documentation library that provides targeted
information. You can open JMP Help in several ways:
• On Windows, press the F1 key to open the Help system window.
• Get help on a specific part of a data table or report window. Select the Help tool from
  the Tools menu and then click anywhere in a data table or report window to see the Help
  for that area.
• Within a JMP window, click the Help button.
• Search and view JMP Help on Windows using the Help > Help Contents, Search Help, and
  Help Index options. On Mac, select Help > JMP Help.
• Search the Help at http://jmp.com/support/help/ (English only).
Additional Resources for Learning JMP
In addition to JMP documentation and JMP Help, you can also learn about JMP using the
following resources:
• Tutorials (see “Tutorials” on page 21)
• Sample data (see “Sample Data Tables” on page 21)
• Indexes (see “Learn about Statistical and JSL Terms” on page 21)
• Tip of the Day (see “Learn JMP Tips and Tricks” on page 22)
• Web resources (see “JMP User Community” on page 22)
• JMPer Cable technical publication (see “JMPer Cable” on page 22)
• Books about JMP (see “JMP Books by Users” on page 23)
• JMP Starter (see “The JMP Starter Window” on page 23)
Tutorials
You can access JMP tutorials by selecting Help > Tutorials. The first item on the Tutorials menu
is Tutorials Directory. This opens a new window with all the tutorials grouped by category.
If you are not familiar with JMP, then start with the Beginners Tutorial. It steps you through the
JMP interface and explains the basics of using JMP.
The rest of the tutorials help you with specific aspects of JMP, such as creating a pie chart,
using Graph Builder, and so on.
Sample Data Tables
All of the examples in the JMP documentation suite use sample data. Select Help > Sample
Data Library to open the sample data directory.
To view an alphabetized list of sample data tables or view sample data within categories,
select Help > Sample Data.
Sample data tables are installed in the following directory:
On Windows: C:\Program Files\SAS\JMP\<version_number>\Samples\Data
On Macintosh: /Library/Application Support/JMP/<version_number>/Samples/Data
In JMP Pro, sample data is installed in the JMPPRO (rather than JMP) directory. In JMP
Shrinkwrap, sample data is installed in the JMPSW directory.
Learn about Statistical and JSL Terms
The Help menu contains the following indexes:
Statistics Index Provides definitions of statistical terms.
Scripting Index Lets you search for information about JSL functions, objects, and display
boxes. You can also edit and run sample scripts from the Scripting Index.
Learn JMP Tips and Tricks
When you first start JMP, you see the Tip of the Day window. This window provides tips for
using JMP.
To turn off the Tip of the Day, clear the Show tips at startup check box. To view it again, select
Help > Tip of the Day. Or, you can turn it off using the Preferences window. See the Using JMP
book for details.
Tooltips
JMP provides descriptive tooltips when you place your cursor over items, such as the
following:
• Menu or toolbar options
• Labels in graphs
• Text results in the report window (move your cursor in a circle to reveal)
• Files or windows in the Home Window
• Code in the Script Editor
Tip: You can hide tooltips in the JMP Preferences. Select File > Preferences > General (or JMP
> Preferences > General on Macintosh) and then deselect Show menu tips.
JMP User Community
The JMP User Community provides a range of options to help you learn more about JMP and
connect with other JMP users. The learning library of one-page guides, tutorials, and demos is
a good place to start. And you can continue your education by registering for a variety of JMP
training courses.
Other resources include a discussion forum, sample data and script file exchange, webcasts,
and social networking groups.
To access JMP resources on the website, select Help > JMP User Community.
JMPer Cable
The JMPer Cable is a yearly technical publication targeted to users of JMP. The JMPer Cable is
available on the JMP website:
http://www.jmp.com/about/newsletters/jmpercable/
JMP Books by Users
Additional books about using JMP that are written by JMP users are available on the JMP
website:
http://www.jmp.com/support/books.shtml
The JMP Starter Window
The JMP Starter window is a good place to begin if you are not familiar with JMP or data
analysis. Options are categorized and described, and you launch them by clicking a button.
The JMP Starter window covers many of the options found in the Analyze, Graph, Tables, and
File menus.
• To open the JMP Starter window, select View (Window on the Macintosh) > JMP Starter.
• To display the JMP Starter automatically when you open JMP on Windows, select File >
  Preferences > General, and then select JMP Starter from the Initial JMP Window list. On
  Macintosh, select JMP > Preferences > Initial JMP Starter Window.
Chapter 2
Introduction to Consumer Research
Overview of Customer and Behavioral Research Methods
You already collect information about how customers use a product or service or how
satisfied they are with your offerings. The resulting insight lets you create better products and
services, happier customers, and more revenue for your organization.
JMP now includes a full suite of tools for performing customer and consumer research. In the
past, you might have had to use one product for consumer research work and JMP for design
of experiments. Now you can do both types of analyses using a single product, for a more
efficient use of your most precious resource: your time. Tools for performing these statistical
analyses are now located in one convenient place: the Consumer Research menu. Use the
following platforms to analyze your data:
• The Categorical platform supports survey analysis with questions in multiple formats,
  allowing for both detailed and compact reporting. You can also analyze multiple response
  questions, for which respondents can choose more than one answer. You can output the
  results in crosstab report tables, use share and frequency charts, view mean scores
  across responses, and perform tests and comparisons. When you are finished, you can
  easily output the completed analysis tables. For more information, see Chapter 3,
  “Categorical Response Analysis”.
• The Factor Analysis platform enables you to discover simple arrangements in the pattern
  of relationships among variables. It seeks to discover whether the observed variables can
  be explained in terms of a much smaller number of variables, or factors. By using factor
  analysis, you can determine the number of factors that influence a set of measured,
  observed variables, and the strength of the relationship between each factor and each
  variable. For more information, see Chapter 4, “Factor Analysis”.
• The Choice platform is designed for use in market research experiments, where the
  ultimate goal is to discover the preference structure of consumers. This information is
  then used to design products or services that have the attributes most desired by
  consumers. For more information, see Chapter 5, “Choice Models”.
• The Uplift platform enables you to maximize the impact of your marketing budget by
  sending offers only to individuals who are likely to respond favorably, even when you
  have large data sets and many possible behavioral or demographic predictors. You can use
  uplift models to make such predictions. This method has been developed to help optimize
  marketing decisions, define personalized medicine protocols, or, more generally, to
  identify characteristics of individuals who are likely to respond to some action. For
  more information, see Chapter 6, “Uplift Models”.
• The Item Analysis platform provides a method of scoring tests. Based on Item Response
  Theory (IRT), the platform supports the design, analysis, and scoring of tests,
  questionnaires, and other tools that measure abilities, attitudes, and other variables.
  Although classical test theory methods have been widely used for a century, IRT provides
  a better and more scientifically based scoring procedure. For more information, see
  Chapter 7, “Item Analysis”.
• The Multiple Correspondence Analysis (MCA) platform takes multiple categorical
  variables and seeks to identify associations between levels of those variables. MCA is
  frequently used in the social sciences, particularly in France and Japan. It can be used
  in survey analysis to identify question agreement. For more information, see Chapter 8,
  “Multiple Correspondence Analysis”.
Chapter 3
Categorical Response Analysis
Analyzing Survey and Other Counting Data
The Categorical platform tabulates and summarizes categorical response data, including
multiple response data, and calculates test statistics. The strength of the Categorical platform
is that it can handle responses in a wide variety of formats without needing to reshape the
data. It is designed to handle survey and other categorical response data, such as defect
records, side effects, and so on.
Figure 3.1 Categorical Analysis Example
Contents
Categorical Platform Overview
Example of the Categorical Platform
Launch the Categorical Platform
  Response Roles
  Cast Selected Columns into Roles
  Other Launch Window Options
The Categorical Report
  Share Chart
  Frequency Chart
Categorical Platform Options
  Report Options
  Statistical Options
  Free Text Report Options
  Structured Report Options
Additional Examples of the Categorical Platform
  Multiple Response
  Response Frequencies
  Indicator Group
  Multiple Delimited
  Multiple Response by ID
  Mean Score Example
Categorical Platform Overview
The Categorical platform can produce results from a rich variety of organizations of data, as
reflected in the tabbed panels that enable you to specify the analyses that you want. The
Categorical platform has capabilities similar to other platforms. The choice of platform
depends on your focus, the shape of your data, and the desired level of detail. The strength of
the Categorical platform is that it can handle responses in a wide variety of formats without
needing to reshape the data. Table 3.1 shows several of JMP’s analysis platforms and their
strengths.
Table 3.1 Comparing JMP’s Categorical Analyses
Platform                      Specialty
Distribution                  Separate, ungrouped categorical responses.
Fit Y By X: Contingency       Two-way situations, including chi-square tests, correspondence analysis, agreement.
Pareto Plot                   Graphical analysis of multiple-response data, especially multiple-response defect data, with more rate tests than Fit Y By X.
Variability Chart: Attribute  Attribute gauge studies, with more detail on rater agreement.
Fit Model                     Logistic categorical responses and generalized linear models.
Partition, Neural Net         Specific categorical response models.
Example of the Categorical Platform
This example uses the Consumer Preferences.jmp sample data table, which contains survey
data on people’s attitudes and opinions, and some questions concerning oral hygiene (source:
Rob Reul, Isometric Solutions).
1. Select Help > Sample Data Library and open Consumer Preferences.jmp.
2. Select Analyze > Consumer Research > Categorical.
3. Select I am working on my career and click Responses on the Simple tab.
4. Select Age Group and click X, Grouping Category.
5. Click OK.
6. Select Crosstab Transposed from the Categorical red triangle menu.
7. Select Test Response Homogeneity from the Categorical red triangle menu.
Figure 3.2 shows, by age group, whether respondents report that they are currently working
on their careers. Among those responding positively, the largest share was in the 25-29 age
group, at 84.1%. Among those responding negatively, the largest share was in the > 54 age
group, at 53.5%.
Figure 3.2 Survey Results by Age Group
Launch the Categorical Platform
Launch the Categorical Platform by selecting Analyze > Consumer Research > Categorical.
Figure 3.3 Categorical Platform Launch Window
The launch window includes tabs for a variety of response roles (Simple, Related, and
Multiple) and a Structured tab where you can create your own structured responses. The
following sections describe the different response types and effects.
Response Roles
Use the response roles buttons within the tabs to choose selected columns as responses with
specified roles. You can also drag column names to the response list. The response roles are
summarized in Table 3.2.
Simple Tab
The default tab, Simple, contains a single button, Responses. This is appropriate for all basic
analyses that do not have a special structure. You can drag column names from the Select
Columns list to the Response list, or you can select columns and then click Responses. If a
column has a Multiple Response column property, JMP automatically changes the handling of
the column to recognize this property.
Related Tab
The Related tab is for a set of response columns that all have the same type of categories
in them. It contains the following buttons:
Aligned Responses Performs the analysis like the default analysis, but shows the results
more compactly by aligning the analyses side-by-side into one larger table.
Repeated Measures Indicates that the columns reflect responses made by the same
individual at different times, and you are interested in the changes between the times.
Rater Agreement Is useful when each column is a rating for the same question, but by
different individuals (raters) and you want to study how much the raters agree on their
responses.
Multiple Tab
The Multiple tab is for multiple responses, where a question can involve checking off more
than one choice. Multiple response data can be stored in a variety of ways, so the tab
contains several buttons to accommodate them:
Multiple Response Means that you have several columns acting like fill-in-the-blank columns
to specify the multiple responses.
Multiple Response by ID Indicates that you have several rows in a table corresponding to the
multiple responses in one column, and the individuals are identified by an ID column.
Multiple Delimited Signifies that you have one column that has several responses in it
separated by a comma.
Indicator Group Denotes that there is a column for each possible response, and each column
is an indicator (for example, it has only two values, like 0 or 1).
Response Frequencies Also has a column for each possible response, but has frequency
counts instead of an indicator.
Free Text Is used for comment fields where the analysis counts the frequency of each word
used. Free Text gives word counts in both word order and frequency order, and the rate of
non-empty text. For more information about Free Text, refer to “Free Text Report Options”
on page 56.
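The storage layouts described for the Multiple tab all carry the same information. As a rough illustration only (plain Python rather than JMP or JSL; the data and the function names are invented for this sketch), the following reduces three of the layouts to the same per-response tallies:

```python
from collections import Counter

def tally_delimited(cells, sep=","):
    """Multiple Delimited: one cell per respondent, responses separated by commas."""
    counts = Counter()
    for cell in cells:
        counts.update(part.strip() for part in cell.split(sep) if part.strip())
    return counts

def tally_by_id(rows):
    """Multiple Response by ID: one (id, response) row for each response given."""
    return Counter(response for _id, response in rows)

def tally_indicators(columns):
    """Indicator Group: one 0/1 column for each possible response."""
    return Counter({name: sum(flags) for name, flags in columns.items()})

# Hypothetical answers from three respondents, stored three different ways.
delimited = ["floss, brush", "brush", "brush, rinse"]
by_id = [(1, "floss"), (1, "brush"), (2, "brush"), (3, "brush"), (3, "rinse")]
indicators = {"floss": [1, 0, 0], "brush": [1, 1, 1], "rinse": [0, 0, 1]}

for tally in (tally_delimited(delimited), tally_by_id(by_id), tally_indicators(indicators)):
    print(sorted(tally.items()))
```

All three calls produce the same counts, which is the sense in which the platform can accept any of these layouts without the data being reshaped first.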
Structured Tab
The Structured tab enables you to construct complex tables of descriptive statistics by
dragging column names into green icon drop zones to create side-by-side and nested results.
You can nest a variable within or beside another variable according to the structure that you
want for the top and side of the table. Continue to drag columns, either beside or nested
within another column, to specify the structure. For more information about structured
reports, refer to “Structured Report Options” on page 57.
Figure 3.4 Structured Tab
To create a structured table:
1. Drag a column name to the green drop zone at the Top or Side of the table. The drop zone
is highlighted in pink.
2. Add more variables by dragging a column name to the appropriate green drop zone. You can
also drag a column name between two columns.
To remove a selection, click the selection and then select Undo.
To add a selection back that you just removed, select Redo.
To clear all of the selections, select Clear.
3. When you are finished creating the table, click Add=> to add the variables to the response
list.
To make a revision once you have added your selection to the response list, click the
selection and then select <=Edit.
4. Complete the remainder of the launch window as necessary and click OK.
The Categorical report window appears.
5. Should you want to make a change to the table, select Relaunch Dialog from the
Categorical red triangle menu. The launch window reappears where you can make edits
to your selections.
A few guidelines with Structured effects:
• Structured always assumes that the innermost terms on the Side are responses, and that all
  other terms are sample level grouping factors.
• You can analyze multiple response terms in the form of delimited multiple response
  columns, but you must indicate that it is a multiple response by having a Multiple
  Response column property. Use the Col Info window to add this property, as needed.
Table 3.2 Response Roles
Response Role               Description
Simple Tab
  Responses                 Separate responses are in each column, resulting in a separate analysis for each column.
Related Tab
  Aligned Responses         Responses share common categories across columns, resulting in better-organized reports.
  Repeated Measures         Aligned responses from an individual across different times or situations.
  Rater Agreement           Aligned responses from different raters evaluating the same unit, to study agreement across raters.
Multiple Tab
  Multiple Response         Aligned responses, where multiple responses are entered across several columns, but treated as one grouped response.
  Multiple Response by ID   Multiple responses across rows that have the same ID values.
  Multiple Delimited        Several responses in a single cell, separated by commas.
  Indicator Group           Binary responses across columns, like selected or deselected, yes or no, but all in a related group.
  Response Frequencies      Columns containing frequency counts for each response level, all in a related group.
  Free Text                 Counts the frequency of each word used in a comment field.
Structured Tab              Drag variables to green drop zones to create your own structured table.
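The Free Text role in Table 3.2 can be pictured with a small sketch. This is plain Python, not JMP, and it rests on assumptions: the tokenizing rule (lowercase letters, digits, and apostrophes) is invented here, and "word order" is taken to mean alphabetical order. The function name and the comments are hypothetical.

```python
import re
from collections import Counter

def free_text_summary(comments):
    """Count words across comment cells and report the share of non-empty cells."""
    words = Counter()
    non_empty = 0
    for text in comments:
        tokens = re.findall(r"[a-z0-9']+", (text or "").lower())
        if tokens:
            non_empty += 1
        words.update(tokens)
    by_word = sorted(words.items())                                   # alphabetical
    by_freq = sorted(words.items(), key=lambda kv: (-kv[1], kv[0]))   # most frequent first
    rate = non_empty / len(comments) if comments else 0.0
    return by_word, by_freq, rate

# Hypothetical comment field with two empty cells.
comments = ["Great taste", "", "Too sweet, great price", None]
by_word, by_freq, rate = free_text_summary(comments)
print(by_freq)
print(rate)
```

Here two of the four cells contain text, so the non-empty rate is one half, and "great" leads the frequency ordering because it appears twice.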
Cast Selected Columns into Roles
The lower right panel of the Launch window has the following options:
X, Grouping Category Defines sample level groups to break the counts into. By default, the
platform tabulates each combination of X values; use the Grouping Option, described below,
for other combinations.
Sample Size Defines the number of individual units in the group to which that frequency
applies, for multiple response roles with summarized data. For example, a Freq column might
indicate 50 defects, while the sample size variable would indicate that those defects came
from a batch of 100 units.
Freq Specifies the column containing frequency counts for each row for presummarized
data.
ID Required and used only when Multiple Response by ID is selected.
By Identifies a variable to produce a separate analysis for each value that appears in the
column.
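To make the relationship between Freq and Sample Size concrete, here is a minimal arithmetic sketch in plain Python. It assumes only that a frequency can be expressed per unit by dividing it by the sample size; how JMP uses the Sample Size column internally is not described here, and the function name and second data row are hypothetical.

```python
def rate_per_unit(freq, sample_size):
    """Return responses (for example, defects) per unit in the sampled group."""
    if sample_size <= 0:
        raise ValueError("sample_size must be positive")
    return freq / sample_size

# Hypothetical presummarized rows: (batch, defect count, units in the batch).
rows = [("A", 50, 100), ("B", 12, 80)]
rates = {batch: rate_per_unit(freq, n) for batch, freq, n in rows}
print(rates)
```

Batch A corresponds to the example in the text: 50 defects recorded against 100 units is half a defect per unit.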
Other Launch Window Options
Several launch options are presented in the lower left panel of the window that can be
specified before the analysis. The options can also be selected later from the Categorical red
triangle menu, and have the effect of rerunning the platform with the new setting. The default
settings for some of the launch options can be changed in the Categorical red triangle menu.
For more information, refer to “Set Preferences” on page 55.
Grouping Option Specifies whether you want to use the X columns individually or in a
combination. This option applies only when you specify more than one X (Grouping) column.
It denotes whether you want to treat the Xs one at a time, or in a fully nested grouping, or
both. For example, if the X Columns are Region and Age Group, you can get separate
tables for Response by Region and Response by Age Group (each individually) or get a
nested table with each age group within each region (combinations), or both.
Combinations Gives frequency results for combinations of the X variables.
Each Individually Gives frequency results for each X variable individually.
Both Gives frequency results for combinations of the X variables, and individually.
Unique Occurrences within ID Allows duplicate response levels within a subject to be
counted only once. An ID variable must be specified.
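The difference between the grouping choices, and the effect of counting unique occurrences within an ID, can be sketched as follows. This is an illustration in plain Python under stated assumptions, not JMP's implementation: the rows, the `group_counts` function, and its argument names are all invented for the example.

```python
from collections import Counter

# Hypothetical rows: (respondent id, Region, Age Group, response).
rows = [
    (1, "East", "25-29", "Yes"),
    (1, "East", "25-29", "Yes"),   # duplicate response level for respondent 1
    (2, "East", "30-34", "No"),
    (3, "West", "25-29", "Yes"),
]

def group_counts(rows, grouping="combinations", unique_within_id=False):
    """Count responses per group; keys are (group label, response)."""
    if unique_within_id:
        seen, kept = set(), []
        for row in rows:
            if row not in seen:    # same id, same groups, same response level
                seen.add(row)
                kept.append(row)
        rows = kept
    counts = Counter()
    for _id, region, age, response in rows:
        if grouping in ("combinations", "both"):
            counts[(f"{region} / {age}", response)] += 1
        if grouping in ("individually", "both"):
            counts[(f"Region={region}", response)] += 1
            counts[(f"Age Group={age}", response)] += 1
    return counts

print(group_counts(rows))
print(group_counts(rows, grouping="individually"))
print(group_counts(rows, unique_within_id=True))
```

With `unique_within_id=True`, the repeated "Yes" from respondent 1 is counted once, which mirrors the description of Unique Occurrences within ID.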
The following options can be specified on the launch window as well as from the Categorical
report red triangle menu. They are also available as Preference settings. For more information,
refer to “Statistical Options” on page 41 and “Set Preferences” on page 55.
Count Missing Responses Changes the behavior to tabulate missing values as categories,
while still excluding them from statistical comparisons. When you have missing values,
this specifies whether you want to see them tabulated beside the nonmissing data, or just
excluded. Missing values can be either standard (numeric NaN or character empty) or a
code declared as missing with the column property Missing Value Codes. Note that if a
column contains only missing values, the missing values are counted regardless of the
state of this option.
Order Response Levels High to Low Changes the response order but keeps the X order from
low to high. The default ordering is low to high. You can control the ordering with a
column property (Value Ordering), but if you always want to see the high values first, then
select this option. Often, ordered categories are ratings, and you want to see the positive
ratings first. In the red triangle menu, this option is under Category Options.
Shorten Labels Shortens labels by removing common prefixes and suffixes. Sometimes
surveys code lengthy labels that contain a common prefix or suffix. For example,
“Occurred 5 to 10 times in the last year” might be a level, but the phrase “in the last year”
is repeated for each value label, and you do not need to see it repeated in the report. This
option trims the common prefixes and suffixes. It also changes multiple blanks into single
blanks. The option only applies to value labels, not column names.
Include Responses Not in Data Includes a count for values that were not in the data.
Sometimes when you conduct a survey and give choices, one of the choices is not selected.
If you still want to see the choices that are not in the data, use this option. It determines the
missing categories from the value labels in the column.
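Three of the options above lend themselves to a short sketch: Shorten Labels, Count Missing Responses, and Include Responses Not in Data. The code below is plain Python and makes its own assumptions: it trims whole shared words (JMP's exact trimming rule is not specified here), it treats `None` and the empty string as missing, and the function names and data are hypothetical.

```python
from collections import Counter

def shorten_labels(labels):
    """Drop the leading and trailing words shared by every label."""
    words = [label.split() for label in labels]   # split() also collapses repeated blanks
    if len(words) < 2:
        return [" ".join(w) for w in words]
    shortest = min(len(w) for w in words)
    lead = 0
    while lead < shortest and len({w[lead] for w in words}) == 1:
        lead += 1
    trail = 0
    while trail < shortest - lead and len({w[-1 - trail] for w in words}) == 1:
        trail += 1
    return [" ".join(w[lead:len(w) - trail]) for w in words]

def tabulate(values, value_labels=None, count_missing=False):
    """Count responses, optionally keeping missing values and unused levels."""
    counts = Counter()
    for value in values:
        if value is None or value == "":
            if count_missing:                     # Count Missing Responses
                counts["Missing"] += 1
            continue
        counts[value] += 1
    if value_labels is not None:                  # Include Responses Not in Data
        for level in value_labels:
            counts.setdefault(level, 0)
    return dict(counts)

labels = [
    "Occurred 1 to 4 times in the last year",
    "Occurred  5 to 10 times in the last year",   # note the doubled blank
    "Occurred more than 10 times in the last year",
]
answers = ["Yes", None, "No", "Yes", ""]

print(shorten_labels(labels))
print(tabulate(answers, value_labels=["Yes", "No", "Unsure"], count_missing=True))
```

In the second call, "Unsure" appears with a count of zero because it is a declared level that no respondent chose, and the two empty answers are reported under "Missing" rather than dropped.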
Supercategories
When ratings are involved in a data set (for example, a five point scale), you might want to
know the percent of the responses in the top two or other subset of ratings. Such a group of
ratings can be defined in the data through the column property, Supercategories.
The term Supercategories refers to the extra slots in a table to aggregate over groups of
categories. The Supercategories property supports four keywords: Group, Mean, Std Dev, and
All. Mean and Std Dev calculate statistics for value scores, and All aggregates across all levels.
For example, a “Top Two” Group supercategory could aggregate the two top categories in a
response, as specified. Although we support Mean, Std Dev, and All, we do not recommend
using them because they are available as built-in statistics as well as supercategories.
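As a rough picture of what a Group supercategory such as "Top Two" adds to a table, the sketch below appends one aggregate slot to a set of per-level counts. It is plain Python with invented ratings and a hypothetical `with_supercategory` function; it does not reproduce the JMP column property itself.

```python
from collections import Counter

def with_supercategory(ratings, name, members):
    """Return per-level counts plus one extra slot that aggregates the given levels."""
    counts = Counter(ratings)
    table = dict(sorted(counts.items()))
    table[name] = sum(counts[level] for level in members)
    return table

# Hypothetical responses on a five-point scale.
ratings = [5, 4, 4, 3, 5, 2, 1, 4, 5, 3]
table = with_supercategory(ratings, "Top Two", members=(4, 5))
top_two_share = table["Top Two"] / len(ratings)

print(table)
print(top_two_share)
```

The individual levels stay in the table and the "Top Two" slot sits beside them, so the share of responses in the top two ratings can be read directly.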
Create Supercategories by selecting a column and then selecting Column Info > Column
Properties > Supercategories. On the column properties window, select a column, enter a
Supercategory Name, and click Add.
The Supercategories Options red triangle menu item provides the following commands for a
selected category:
Hide Omits columns for each category from the report. Only a column for the supercategory
is included.
Net Prevents individuals from being counted twice when they appear in more than one
supercategory. Net is available only for a multiple response column.
The following red triangle options are also provided:
Add Mean Includes mean statistics in the report.
Add Std Dev Includes standard deviation statistics in the report.
Add All Includes total responses in the report. By default, the Total Responses column is
always included.
Note: Supercategories are supported for all response effects except Repeated Measures and
Rater Agreement. Some response effects do not support Mean and Std Dev slots, because the
effects do not have a natural score.
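As a rough illustration of what a Group supercategory such as “Top Two” aggregates, the following Python sketch computes the share of responses that fall in a chosen subset of ratings. The function name and the ratings are hypothetical and are not taken from JMP or its sample data; this is a sketch of the idea, not the platform's implementation.

```python
from collections import Counter

def supercategory_share(responses, members):
    """Fraction of all responses that fall in any of the member categories."""
    counts = Counter(responses)
    total = sum(counts.values())
    in_group = sum(counts[m] for m in members)
    return in_group / total if total else 0.0

# Hypothetical five-point ratings (1 = lowest, 5 = highest)
ratings = [5, 4, 4, 3, 2, 5, 1, 4, 3, 5]

# "Top Two" groups the two highest ratings into one extra slot
top_two = supercategory_share(ratings, members={4, 5})
print(f"Top Two share: {top_two:.1%}")
```

The individual rating categories are still counted separately; the supercategory is an additional slot that sums over its members.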
The Categorical Report
The Categorical platform produces a report with several tables and bar charts depending on
your selections. You might or might not see all of the following options depending on which
response type and options you selected. Frequencies, Share of Responses, and Rate Per Case
appear in a single table by default. A Share Chart also appears by default (unless you used the
Structured tab in the launch window). You can choose to view a Frequency Chart or
Transposed Frequency Chart.
You can view or hide each option (Frequencies, Share of Responses, Rate Per Case, Share
Chart, Frequency Chart, or Transposed Freq Chart) from the Categorical red triangle menu.
Data from Consumer Preferences.jmp is displayed in Figure 3.5.
Figure 3.5 The Categorical Report
The topmost item in the table is a Frequency count (Freq), showing the frequency counts for
each category with the total frequency (Total Responses) and total units (Total Cases) at the
bottom of the table.
In this example, the number of responses and cases for each age group by the 7 segments are
displayed.
The Share of Responses (Share) is determined by dividing each count by the total number of
responses. The number represents the percent of the response among all the responses in the
sample (frequency divided by response total). This is either a column percentage or row
percentage depending on whether your table has the responses on top or down the side
(transposed).
For example, examine the second row of the table for Floss After Waking Up. The 37 responses
for flossing after waking up are 25.9% of all responses (37/143*100).
The Rate Per Case (Rate) divides each count in the frequency table by the total number of
cases. If you have multiple responses per case (subject), there are two types of percentages; the
rate per case is frequency as a percent of total cases, whereas the share of responses is the
frequency as a percent of the total responses. Rate is available only for multiple responses.
For example, in the third row of the table (Floss After Waking Up), the 37 responses come from
113 cases, making the rate per case 32.7% (37/113*100).
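The two percentages above differ only in the denominator. The following minimal Python sketch reproduces the arithmetic from the example (37 responses, 143 total responses, 113 total cases). The function names are illustrative only.

```python
def share_of_responses(freq, total_responses):
    """Frequency as a percent of the total number of responses."""
    return 100.0 * freq / total_responses

def rate_per_case(freq, total_cases):
    """Frequency as a percent of the total number of cases (subjects)."""
    return 100.0 * freq / total_cases

share = share_of_responses(37, 143)
rate = rate_per_case(37, 113)
print(f"Share = {share:.1f}%, Rate = {rate:.1f}%")
```

Because a case can contribute more than one response, the total number of responses is at least the number of responding cases, so the rate is typically larger than the share.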
Share Chart
The Share Chart presents a divided bar chart. The bar length is proportional to the percentage
of responses for each type. The column on the right shows the number of responses.
Figure 3.6 Share Chart
Frequency Chart
The Frequency Chart shows response frequencies. The bars reflect the frequency count on the
same scale, and the number of responses is displayed to the right. To view the frequency
chart, select Frequency Chart from the Categorical red triangle menu.
Figure 3.7 Frequency Chart
The Transposed Freq Chart option produces a transposed version of the Frequency Chart.
Marginal totals are given for each response, as opposed to each X variable.
Categorical Platform Options
The Categorical red triangle menu provides commands that customize the appearance of the
report and provide the means to test and compare your results. The following options appear
in the menu depending on response roles and options selected. You might or might not view
all of the options depending on your selections.
Report Options
The default report format is the Crosstab format, which gathers all three statistics for each
sample level and response together. The Crosstab format displays the responses on the top
and the sample levels down the side, with multiple table elements together in each cell of the
cross tabulation.
Figure 3.8 Crosstab Format
The Crosstab format has a transposed version, Crosstab Transposed, which is useful when
there are a lot of response categories but not a lot of sample levels. Crosstab Transposed
displays the responses down the side and the sample levels across the top, with multiple table
elements together in each cell.
The Structured analysis always uses the Crosstab Transposed form, but in a more complex
arrangement. The Free Text analysis has its own specialized reports. For more information
about Free Text, refer to “Free Text Report Options” on page 56.
Selecting Legend from the red triangle menu shows or hides the legend for the response column
on the Share Chart.
Statistical Options
The main question of interest in any table is whether shares or rates vary across groups of
sample levels, and specifically which groups are significantly different.
There are two families of tests and comparisons, which correspond to single category
responses and multiple responses:
• With single responses, a given response is just one response category, and the question is
whether the share of responses is different across sample levels.
Single responses are tested with a chi-square test of homogeneity. There are two
types of this test: the Likelihood Ratio Chi-square and the Pearson Chi-square. Which one
you use is a matter of personal preference and training. The ChiSquare Test Choices option,
on the Categorical red triangle menu, enables you to show one or the other,
or both. For more information, refer to “Test Options” on page 54.
• With multiple responses, each individual can select several categories, and the question is
whether the rate is different across sample levels.
For multiple responses, each response category is treated separately, with the count for
each subject modeled as a Poisson distribution (allowing for multiples of the same
category). Each response is tested to determine whether the parameters are the same
across sample levels.
The following options appear in the Categorical red triangle menu depending on context:
Table 3.3 Categorical Platform Commands

Command: Test Response Homogeneity
Supported Response Contexts:
• Responses
• Aligned Responses
• Repeated Measures
• Response Frequencies with Sample Size
• Structured
Question: Are the probabilities across the response categories the same across sample levels?
Details: Marginal Homogeneity (Independence) Test, both Pearson and likelihood ratio
chi-square. For more information, refer to “Test Response Homogeneity” on page 45.

Command: Test Multiple Response
Supported Response Contexts:
• Multiple Response (requires multiple response data)
• Multiple Response by ID (with Sample Size)
• Multiple Delimited
• Response Frequencies with Sample Size
• Structured (requires multiple response data)
Question: For each response category, are the rates the same across sample levels?
Details: Poisson regression or binomial test on sample for each defect frequency. For more
information, refer to “Test Multiple Response” on page 46.

Command: Agreement Statistic
Supported Response Contexts: Rater Agreement
Question: How closely do raters agree, and is the lack of agreement symmetrical?
Details: Kappa for agreement, Bowker and McNemar for symmetry. For more information, refer to
“Rater Agreement” on page 50.

Command: Transition Report
Supported Response Contexts: Repeated Measures
Question: How have the categories changed across time?
Details: Transition counts and rates matrices. For more information, refer to “Repeated
Measures” on page 51.

Command: Cell Chisq
Supported Response Contexts: Responses
Question: How do I further analyze the results to obtain more information?
Details: For more information, refer to “Cell Chisq” on page 48.

Command: Compare Each Sample
Supported Response Contexts:
• Responses
• Aligned Responses
• Repeated Measures
• Response Frequencies (if no Sample Size)
• Structured
Question: Do levels of the response category differ significantly?
Details: For more information, refer to “Compare Each Sample” on page 52.

Command: Compare Each Cell
Supported Response Contexts:
• Single and Multiple Responses
• Structured
Question: Do pairs of levels within the two response categories differ significantly?
Details: For more information, refer to “Compare Each Cell” on page 53.

Command: Test Options
• ChiSquare Test Choices
• Show Warnings
• Order by Significance
• Hide Nonsignificant
Question: How do I further analyze the results to obtain more information?
Details: For more information, refer to “Test Options” on page 54.

A series of additional options adds more detail for each group of sample levels. The options
that appear depend on your selections and the details of your analysis:
Total Responses Shows the sum of the frequency counts for each group of sample levels.
Total Cases For multiple response columns, shows the number of cases (subjects), which is
different from the number of responses.
Total Cases Responding For multiple response columns used in Structured tables, counts
each person who responded at least once. People who did not respond at all are not
included.
Mean Score Calculates the response means, using the numeric categories, or value scores.
This is enabled for columns that use numeric codes, or for categories that have a Value
Scores property. To make the Mean Score interpretable, you can assign specific value
scores in the Column Info window with the Value Scores column property. For more
information and an example, refer to “Mean Score Example” on page 64.
Mean Score Comparison Compares the mean scores across groups of sample levels, showing
which groups are significantly different. A pairwise multiple comparisons Student’s t test
is used for the mean score comparison, based on the specified comparison groups. For
more information about the letter codes, refer to “Comparisons with Letters” on page 53.
For more information about specifying the comparison groups, refer to “Specify
Comparison Groups” on page 58.
Std Dev Score Calculates the standard deviation of the value scores.
Order by Mean Score Orders the mean score calculations. The option only appears when
there are no X columns in the analysis.
Save Tables Saves the report to a new data table. For more information, refer to “Save
Tables” on page 54.
Filter Filters data to specific groups or ranges. Opens the Local Data Filter panel allowing
you to identify varying subsets of data. The filtered rows do not appear in the reports.
Sample levels with 0 values are always hidden. To show the filtered rows in reports, select
Include Responses Not in Data in the launch window. You can also select the Set
Preferences red triangle menu, and then select Include Responses Not in Data. For more
information, refer to Using JMP.
Contents Summary Collects all of the tests and mean scores into a summary at the top of the
report with links to the associated item.
Format Elements Enables you to specify formats for Frequencies, Shares and Rates, and how
zeros are displayed. By default, Frequencies are Fixed Dec with 7 Width and 0 Decimals
and Shares and Rates are Percent with 6 Width and 1 Decimal.
Arrange in Rows Arranges the reports across the page as opposed to down. Enter the number
of reports that you want to view across the window.
Set Preferences Enables you to set preferences for future launches and sessions. For more
information, refer to “Set Preferences” on page 55.
Category Options Contains options (Grouping Option, Count Missing Response, Order
Response Levels High to Low, Shorten Labels, and Include Responses Not in Data) that
also appear on the launch window, where they can be specified before the analysis.
Selecting one of these options here reruns the platform with the
new option setting. For more information, refer to “Other Launch Window Options” on
page 35.
Force Crosstab Shading Forces shading on crosstab reports even if the preference is set to no
shading.
Relaunch Dialog Enables you to return to the launch window and edit the specifications for a
structured table. For more information, refer to “Structured Tab” on page 32.
Script Contains options that are available to all platforms. See Using JMP.
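The Mean Score and Std Dev Score options listed above summarize a categorical response through its value scores. The following Python sketch shows one way to compute such summaries from frequency counts. The category labels, scores, and counts are hypothetical, and the use of the sample (n - 1) standard deviation is an assumption rather than a documented detail of the platform.

```python
import math

def score_summary(counts, value_scores):
    """counts: category -> frequency; value_scores: category -> numeric score.
    Returns (mean score, standard deviation of the scores)."""
    n = sum(counts.values())
    mean = sum(value_scores[c] * k for c, k in counts.items()) / n
    # Assumption: sample standard deviation with an n - 1 denominator
    ss = sum(k * (value_scores[c] - mean) ** 2 for c, k in counts.items())
    return mean, math.sqrt(ss / (n - 1))

# Hypothetical value scores assigned to three response categories
value_scores = {"Disagree": -1, "Neutral": 0, "Agree": 1}
counts = {"Disagree": 2, "Neutral": 3, "Agree": 5}

mean_score, std_dev_score = score_summary(counts, value_scores)
print(f"Mean Score = {mean_score:.2f}, Std Dev Score = {std_dev_score:.3f}")
```

Choosing scores such as -1, 0, and 1 centers the scale on the neutral category, which is one way to make the mean easier to interpret.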
Test Response Homogeneity
Test Response Homogeneity is the standard chi-square test (for single responses) across all
sample levels. There is typically one categorical response variable and one categorical sample
variable. Multiple sample variables are treated as a single variable.
The test is the chi-square test for marginal homogeneity of response patterns, testing that the
response probabilities are the same across samples. This is equivalent to a test for
independence in which the sample variable is treated like a response. There are two versions of
this test, the Pearson form and the Likelihood Ratio form, both with chi-square statistics. The
Test Options menu (ChiSquare Test Choices) is used to show or hide the Likelihood Ratio or
Pearson tests. If Show Warnings is turned on, the report displays a warning when the
frequencies are too low for the tests to be reliable.
As an example:
1. Select Help > Sample Data Library and open Car Poll.jmp.
2. Select Analyze > Consumer Research > Categorical.
3. Select country and click Responses on the Simple tab.
4. Select marital status and click X, Grouping Category.
5. Click OK.
6. Select Test Response Homogeneity from the Categorical red triangle menu.
Figure 3.9 Test Response Homogeneity
The Share Chart indicates that the married group is more likely to buy American cars, and the
single group is more likely to buy Japanese cars, but the statistical test gives a p-value
of 0.08. Therefore, the difference in response probabilities across marital status is
not statistically significant at an alpha level of 0.05.
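To make the two forms of the homogeneity test concrete, the following Python sketch computes the textbook Pearson and likelihood ratio chi-square statistics for a table of counts with sample levels as rows and response categories as columns. The function name and the counts are hypothetical (they are not the Car Poll.jmp data), and the sketch assumes that every row and column has at least one count. It does not reproduce any adjustments that the platform might apply.

```python
import math

def homogeneity_statistics(table):
    """table: list of rows (sample levels), each a list of counts per response category.
    Returns (pearson, likelihood_ratio, degrees_of_freedom)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    pearson = 0.0
    likelihood_ratio = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if response probabilities are the same in every row
            expected = row_totals[i] * col_totals[j] / n
            pearson += (observed - expected) ** 2 / expected
            if observed > 0:
                likelihood_ratio += 2.0 * observed * math.log(observed / expected)
    df = (len(table) - 1) * (len(col_totals) - 1)
    return pearson, likelihood_ratio, df

# Hypothetical counts: two sample levels by two response categories
pearson, lr, df = homogeneity_statistics([[10, 20], [20, 10]])
print(f"Pearson = {pearson:.3f}, LR = {lr:.3f}, DF = {df}")
```

Both statistics are compared to a chi-square distribution with the same degrees of freedom; they usually lead to similar conclusions unless some expected counts are small.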
Test Multiple Response
Test Multiple Response is the standard chi-square test for multiple responses, with one test
statistic for each response category. When there are multiple responses, each response
category can be modeled separately. The question is whether the response rates are the same
across samples.
The Test Multiple Response red triangle menu provides the following options:
Count Test, Poisson For each response category, the frequency count has a random Poisson
distribution. The rate test is obtained using a Poisson regression (through generalized
linear models) of the frequency per unit modeled by the sample categorical variable. The
result is a likelihood ratio chi-square test of whether the rates are different across sample
levels. This test can also be done by the Pareto platform, as well as in the Generalized
Linear Model personality of the Fit Model platform.
Homogeneity Test, Binomial For each response category, the frequency count has a random
binomial distribution. Select this test when the response can be true only once for each
respondent (for example, in a “check all that apply” questionnaire).
To test multiple responses, follow these steps:
1. Select Help > Sample Data Library and open Consumer Preferences.jmp.
2. Select Analyze > Consumer Research > Categorical.
3. Select Brush Delimited and click Multiple Delimited on the Multiple tab.
4. Select brush and click X, Grouping Category to compare the sample levels across the brush
treatment variable.
5. Click OK.
6. Select Test Multiple Response and then Count Test, Poisson from the Categorical red
triangle menu.
Figure 3.10 Test Multiple Response, Poisson
The p-values show that After Meal and Before Sleep are the most significantly different.
Wake is not significantly different with this amount of data.
7. Select Test Multiple Response and then Homogeneity Test, Binomial from the Categorical
red triangle menu.
Figure 3.11 Test Multiple Response, Binomial
The Homogeneity Test, Binomial option always produces a larger test statistic (and
therefore a smaller p-value) than the Count Test, Poisson option. The binomial distribution
compares not only the rate at which the response occurred (the number of people who
reported that they brush upon waking) but also the rate at which the response did not
occur (the number of people who did not report that they brush upon waking).
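The relationship between the two tests can be sketched with textbook likelihood ratio statistics for a single response category. In the Python sketch below, the Poisson statistic uses only the counts of the response, and the binomial statistic adds terms for the cases in which the response did not occur. The function names and data are hypothetical, the binomial version assumes that each count is no larger than its number of cases, and neither function is claimed to match the platform's output exactly.

```python
import math

def _term(observed, expected):
    # observed * ln(observed / expected); a zero count contributes nothing
    return observed * math.log(observed / expected) if observed > 0 else 0.0

def poisson_rate_lr(counts, cases):
    """LR chi-square that the per-case rate is the same across sample levels."""
    pooled = sum(counts) / sum(cases)
    return 2.0 * sum(_term(y, n * pooled) for y, n in zip(counts, cases))

def binomial_homogeneity_lr(counts, cases):
    """LR chi-square that the proportion responding is the same across sample levels."""
    pooled = sum(counts) / sum(cases)
    stat = 0.0
    for y, n in zip(counts, cases):
        stat += _term(y, n * pooled)              # response occurred
        stat += _term(n - y, n * (1.0 - pooled))  # response did not occur
    return 2.0 * stat

# Hypothetical data: response counts and number of cases in two sample levels
counts = [12, 25]
cases = [50, 50]
poisson_stat = poisson_rate_lr(counts, cases)
binomial_stat = binomial_homogeneity_lr(counts, cases)
print(f"Poisson LR = {poisson_stat:.3f}, Binomial LR = {binomial_stat:.3f}")
```

In this formulation, the additional non-occurrence terms sum to a non-negative value, so the binomial statistic is at least as large as the Poisson statistic.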
Cell Chisq
For single responses, Cell Chisq displays the cell-by-cell composition of the overall Pearson
chi-square, and also shows which cells have relatively more (red) or fewer (blue) responses than
expected if the response probabilities were the same across sample levels. The value shown is
the p-value for the chi-square. The color is bright when the difference is significant, and
grayer when it is less significant, denoting visually where the significant differences are.
As an example:
1. Select Help > Sample Data Library and open Consumer Preferences.jmp.
2. Select Analyze > Consumer Research > Categorical.
3. Select I am working on my career and click Responses on the Simple tab.
4. Select Age Group and click X, Grouping Category.
5. Click OK.
6. Select Crosstab Transposed from the red triangle menu.
7. Select Cell Chisq from the red triangle menu.
Figure 3.12 Cell Chisq
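The cell-by-cell composition of the Pearson chi-square can be sketched as follows. Each cell contributes (observed - expected)^2 / expected to the overall statistic. The p-value in this Python sketch treats each contribution as a chi-square with one degree of freedom, which is an assumption about how the displayed p-values are obtained. The function name and the counts are hypothetical.

```python
import math

def cell_chisq(table):
    """table: rows are sample levels, columns are response categories.
    Returns, for each cell, a tuple (contribution, p_value, direction)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    result = []
    for i, row in enumerate(table):
        cells = []
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            contribution = (observed - expected) ** 2 / expected
            # Upper tail of a chi-square with 1 degree of freedom (assumed reference)
            p_value = math.erfc(math.sqrt(contribution / 2.0))
            direction = "more" if observed > expected else "less"
            cells.append((contribution, p_value, direction))
        result.append(cells)
    return result

# Hypothetical counts: two sample levels by two response categories
cells = cell_chisq([[10, 20], [20, 10]])
total = sum(c[0] for row in cells for c in row)
print(f"Sum of cell contributions = {total:.3f}")
```

The sum of the cell contributions equals the overall Pearson chi-square, and the direction corresponds to the red (more than expected) and blue (less than expected) shading.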
Relative Risk
The Relative Risk option is used to compute relative risks for different responses. The risk of
responses is computed for each level of the X, Grouping variable. The risks are compared to
get a relative risk. This option is available when the X, Grouping variable has two levels, and
one of the following is true:
•
The response variable has two levels.
•
The response variable is a Multiple Response and the Unique occurrences within ID box is
checked on the Categorical launch window.
A common application of this analysis is when the responses represent adverse events (side
effects), and the X variable represents a treatment (drug versus placebo). The risk for getting
each side effect is computed for both the drug and placebo. The relative risk is the ratio of the
two risks.
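The computation can be sketched in a few lines of Python. The function name and the counts are hypothetical: 12 of 100 subjects on the drug and 6 of 100 subjects on the placebo report a given side effect.

```python
def relative_risk(events_a, n_a, events_b, n_b):
    """Returns (risk in group A, risk in group B, relative risk A versus B)."""
    risk_a = events_a / n_a
    risk_b = events_b / n_b
    return risk_a, risk_b, risk_a / risk_b

# Hypothetical counts for one side effect
risk_drug, risk_placebo, rr = relative_risk(12, 100, 6, 100)
print(f"Risk (drug) = {risk_drug:.2f}, Risk (placebo) = {risk_placebo:.2f}, RR = {rr:.2f}")
```

A relative risk above 1 means that the response is more frequent in the first group; the sketch does not handle a zero risk in the second group.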
Conditional Association
The Conditional Association option is used to compute the conditional probability of one
response given a different response. A table and color map of the conditional probabilities are
given. This option is available only when the Unique occurrences within ID box is checked on
the Categorical launch window. A common application of this analysis is when the responses
represent adverse events (side effects) from a drug. The computations represent the
conditional probability of one side effect given the presence of another side effect. For
AdverseR.jmp, given the response in each row, Figure 3.13 shows the rate of also having the
response in a column. Figure 3.13 only displays a few variables in the table due to size
constraints.
Figure 3.13 Conditional Association
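As a sketch of the conditional probabilities behind a table like this one, the following Python code computes, for each pair of responses, the fraction of subjects with the row response who also have the column response. The function name, subject IDs, and responses are hypothetical and are not taken from AdverseR.jmp. Each response is counted at most once per subject, in the spirit of the Unique occurrences within ID option.

```python
def conditional_association(events_by_id):
    """events_by_id: subject ID -> set of responses reported by that subject.
    Returns a dict mapping (given, other) to P(other | given)."""
    responses = sorted(set().union(*events_by_id.values()))
    table = {}
    for given in responses:
        with_given = [s for s in events_by_id.values() if given in s]
        for other in responses:
            both = sum(1 for s in with_given if other in s)
            table[(given, other)] = both / len(with_given)
    return table

# Hypothetical adverse events by subject
events = {
    1: {"headache", "nausea"},
    2: {"headache"},
    3: {"nausea", "rash"},
    4: {"headache", "nausea"},
}
assoc = conditional_association(events)
print(f"P(nausea | headache) = {assoc[('headache', 'nausea')]:.3f}")
print(f"P(nausea | rash) = {assoc[('rash', 'nausea')]:.3f}")
```

The table is generally not symmetric: the probability of nausea given rash can differ from the probability of rash given nausea.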
Rater Agreement
The Rater Agreement analysis answers the questions of how closely raters agree with one
another and if the lack of agreement is symmetrical. For example, open Attribute Gauge.jmp.
The Attribute Chart script runs the Variability Chart platform, which has a test for agreement
among raters.
Figure 3.14 Agreement Comparisons
Launch the Categorical platform and designate the three raters (A, B, and C) as Rater
Agreement responses on the Related tab on the launch window. The resulting report has
a similar test for agreement, augmented by a test of whether the lack of
agreement is symmetric.
Figure 3.15 Agreement Statistics
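For readers unfamiliar with the Kappa statistic, the following Python sketch computes Cohen's kappa for one pair of raters: the observed agreement, corrected for the agreement expected by chance from each rater's marginal frequencies. The function name and the ratings are hypothetical (they are not the Attribute Gauge.jmp data), and the sketch does not cover the Bowker or McNemar symmetry tests.

```python
from collections import Counter

def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters who rated the same items."""
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    # Chance agreement from the product of the raters' marginal proportions
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / n ** 2
    return (observed - expected) / (1.0 - expected)

# Hypothetical pass (1) / fail (0) ratings of ten parts by two raters
rater_a = [1, 1, 0, 0, 1, 0, 1, 1, 0, 0]
rater_b = [1, 1, 0, 1, 1, 0, 0, 1, 0, 0]
kappa = cohen_kappa(rater_a, rater_b)
print(f"Kappa = {kappa:.2f}")
```

A kappa of 1 indicates perfect agreement, and a kappa near 0 indicates agreement no better than chance.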
Repeated Measures
Repeated Measures declares that the columns reflect responses made by the same individual
at different times, and you are interested in the changes between the times. Individual reports
are displayed for each item, with a transition report at the end demonstrating the transition
counts and rate matrices.
As an example:
1. Select Help > Sample Data Library and open Presidential Elections.jmp.
2. Select Analyze > Consumer Research > Categorical.
3. Select 1980 Winner through 2012 Winner and click Repeated Measures on the Related tab.
4. Select State and click X, Grouping Category.
5. Click OK.
6. Scroll to the bottom of the report window and open the Transition Report outline.
Scroll through the responses to see how each state has voted over the years. Note that New
Mexico has varied between Democratic and Republican over the years.
Figure 3.16 Repeated Measures
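A transition count matrix tallies how often a subject moves from one category to another between consecutive time points, and a rate matrix divides each count by the total for its starting category. The following Python sketch illustrates the idea. The function names, subjects, and categories are hypothetical, and pooling all consecutive pairs of time points into one matrix is an assumption, not a description of how the platform arranges its Transition Report.

```python
from collections import Counter

def transition_counts(series_by_subject):
    """Count (before, after) pairs over consecutive time points for every subject."""
    counts = Counter()
    for series in series_by_subject.values():
        for before, after in zip(series, series[1:]):
            counts[(before, after)] += 1
    return counts

def transition_rates(counts):
    """Divide each count by the total number of transitions from its starting category."""
    row_totals = Counter()
    for (before, _), k in counts.items():
        row_totals[before] += k
    return {pair: k / row_totals[pair[0]] for pair, k in counts.items()}

# Hypothetical responses of two subjects at four time points
series = {"s1": ["A", "A", "B", "B"], "s2": ["B", "B", "B", "A"]}
counts = transition_counts(series)
rates = transition_rates(counts)
print(counts[("B", "B")], rates[("A", "B")], rates[("B", "A")])
```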
Compare Each Sample
For a given response, Compare Each Sample tests whether the response probability for each of
its levels differs from the response probabilities for its other levels. In simple situations, the
Compare Each Sample report consists of symmetric matrices of p-values, as shown in
Figure 3.17.
In addition, a new row or column, entitled Compare, appears in the Crosstabs table. The
Compare row is placed at the bottom of the table, or the Compare column is placed at the far
right. (Whether a row or column is appended depends on whether Crosstab or Crosstab
Transposed is specified.) The Compare row or column contains letter codes showing which
sample levels differ significantly. For more information about the letter codes, refer to
“Comparisons with Letters” on page 53.
Figure 3.17 Compare Each Sample
Compare Each Cell
For a given response and a given X variable, Compare Each Cell tests, for each level of the X
variable, whether the response probabilities differ across the levels of the response. In other
words, Compare Each Cell tests response probabilities across the cells in a given row of the
Crosstabs table. The Compare Each Cell report gives p-values in a tabular format. The letters
across the top indicate the response levels tested for the given level of the X variable. An
example is shown in Figure 3.18.
In addition, when a cell differs significantly from other cells, a letter code is inserted into the
appropriate cell in the Crosstabs table. For details on the letter codes and on their placement in
cells, refer to “Comparisons with Letters” on page 53.
Figure 3.18 Compare Each Cell (cut off after column AE)
Comparisons with Letters
The Compare Each Cell, Compare Each Sample, and Mean Score Comparisons commands use
a system of letters to identify sample levels. The first sample level is “A”, the second “B”. For
more than 26 sample levels, numbers are appended after the letters. The letters are shown in
the sample level headings when a comparison command is turned on.
If two sample levels are significantly different, the letter of the sample level with the lesser
share is placed into the comparison cell of the other category that is significantly different.
To find out whether a given group of sample levels, for example “B”, is significantly different,
look both in the comparison cell for column “B” for other letters, and in all the other cells
across the groups of sample levels for a “B”.
Lowercase letters are also used for comparisons that are slightly less significant, according to
Table 3.4. These comparisons suffer when the count for that sample level group (the Base
Count) is small, and asterisks start to appear in the comparison cells to warn you.
The comparison features are controlled by four options set in Preferences or through a script.
For more information, refer to “Set Preferences” on page 55.
Table 3.4 Letter Comparisons
Uppercase alpha level    0.05        The significance level for which uppercase letters
                                     show differences.
Lowercase alpha level    0.10        The significance level for which lowercase letters
                                     show differences.
Base Count minimum       ≤ 29        The count for a sample level that leads to a **
                                     warning.
Base Count warning       30 to 99    The count for a sample level that leads to a *
                                     warning.
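The settings in Table 3.4 can be read as a small set of rules. The following Python sketch encodes them using the default values from the table. The function names are hypothetical, and the use of a strict “less than” comparison against each alpha level is an assumption.

```python
def comparison_code(letter, p_value, upper_alpha=0.05, lower_alpha=0.10):
    """Letter code for one comparison: uppercase, lowercase, or nothing."""
    if p_value < upper_alpha:   # assumption: strict inequality
        return letter.upper()
    if p_value < lower_alpha:
        return letter.lower()
    return ""

def base_count_warning(base_count):
    """Warning marker for a sample level with a small Base Count."""
    if base_count <= 29:
        return "**"
    if base_count <= 99:
        return "*"
    return ""

print(comparison_code("B", 0.03), comparison_code("B", 0.07), repr(comparison_code("B", 0.20)))
print(base_count_warning(25), base_count_warning(60), repr(base_count_warning(150)))
```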
Test Options
The Test Options menu on the Categorical red triangle menu has the following options
depending on your selections:
ChiSquare Test Choices Single responses are tested with a chi-square test of homogeneity;
either the Likelihood Ratio Chi-square or the Pearson Chi-square, or both. Options are:
Both LR and Pearson, LR Only, or Pearson Only. You can set an option in Preferences.
Show Warnings Shows warnings for chi-square tests related to small sample sizes.
Order by Significance Reorders the reports so that the most significant reports are at the top.
This option only applies to reports with one homogeneity test.
Hide Nonsignificant Suppresses reports that are deemed non-significant. This option only
applies to reports with one homogeneity test.
Save Tables
The Save Tables menu on the Categorical red triangle menu has the following options
depending on your selections:
Save Frequencies Saves the Frequency report to a new data table, without the marginal totals
or supercategories.
Save Share of Responses Saves the Share of Responses report to a new data table, without
the marginal totals.
Save Rate Per Case Saves the Rate Per Case report to a new data table, without the marginal
totals.
Save Transposed Frequencies Saves the Transposed Freq Chart report to a new data table,
without the marginal totals.
Save Transposed Share of Responses Saves a transposed version of the Share of Responses
report to a new data table.
Save Transposed Rate Per Case Saves a transposed version of the Rate Per Case report to a
new data table.
Save Test Rates Saves the results of the Test Multiple Response option to a new data table.
Save Test Homogeneity Saves the results of the Test Response Homogeneity option to a new
data table.
Save Mean Scores Saves the mean scores for each sample group in a new data table.
Save tTests and pValues Saves t-tests and p-values from the Mean Score Comparisons report
in a new data table.
Save Excel File Creates a Microsoft Excel spreadsheet with the structure of the
crosstab-format report. The option maps all of the tables to one sheet, with the response
categories as rows, the sample levels as columns, sharing the headings for sample levels
across multiple tables. When there are multiple elements in each table cell, you have the
option to make them multiple or single cells in Microsoft Excel.
Set Preferences
You can specify settings and set preferences within the Categorical platform. Several options
are available on the launch window and can be specified before the analysis. Some of the
options can also be selected from the Categorical red triangle menu, and have the effect of
rerunning the analysis with the new setting.
The options are initialized to the current state. Select the appropriate options and select either
Submit Platform Preferences or Create Platform Preference Script to submit the options to
your preferences as the new default. When the Categorical platform is launched, the
preferences associated with the current preference set are enacted.
Preferences can be administered and shared through a script. The best way to share a
preference set widely is to create an add-in, so that if the preference settings are reset to the
initial state, the add-in could restore the preferred set.
Figure 3.19 Set Preferences Window
Free Text Report Options
Free Text is used for comment fields where the analysis counts the frequency of each word
used. Free Text gives word counts in both word order and frequency order, and the rate of
non-empty text. The following example uses the Consumer Preferences.jmp sample data table,
which contains survey data relating to oral hygiene preferences. A comment field was
included in the survey asking for reasons why the participant did not floss.
1. Select Help > Sample Data Library and open Consumer Preferences.jmp.
2. Select Analyze > Consumer Research > Categorical.
3. Select Reasons Not to Floss and click Free Text on the Multiple tab.
4. Click OK.
5. Select Score Words by Column from the first Free Text Word Counts for Reasons Not to
Floss red triangle menu and then select Floss. Click OK.
Figure 3.20 Free Text Report Example
Figure 3.20 shows the free text word counts for the reasons that respondents gave for not
flossing. From the analysis, you can determine the number of words, cases, and non-empty cases,
and the portion of non-empty cases. You can also view the word counts alphabetically, in terms
of frequency, or by the mean scores. The Free Text red triangle menu on the report has more
commands to further customize the analysis:
Score Words by Column Calculates, for each word, the average score of another column over
the rows in which the word appears. If you save a word table later, it will have these scores in
the saved data table, plus a Treemap script to colorize by these scores.
Save Indicators for Most Frequent Words Prompts you for the number of words to make
indicators for, and then creates indicator columns in the data table for the most frequent
words, indicating whether each word appeared.
Save Word Table Creates a new table of all the words, their frequency, and the scores with
respect to any of the columns scored.
Remove Removes the table from the report.
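The word counts and word scores described above can be sketched in Python as follows. The tokenization rule (lowercase letters and apostrophes), the function names, the comments, and the scores are all assumptions made for illustration; the platform's own rules for splitting text into words might differ.

```python
import re
from collections import Counter, defaultdict

def _words(text):
    # Assumed tokenization: runs of letters and apostrophes, case-insensitive
    return re.findall(r"[a-z']+", text.lower())

def word_counts(comments):
    """Frequency of each word over all comments."""
    counts = Counter()
    for text in comments:
        counts.update(_words(text))
    return counts

def score_words(comments, scores):
    """Average of scores over the rows in which each word appears."""
    totals, rows = defaultdict(float), Counter()
    for text, score in zip(comments, scores):
        for word in set(_words(text)):  # count each row once per word
            totals[word] += score
            rows[word] += 1
    return {word: totals[word] / rows[word] for word in rows}

# Hypothetical comments and a hypothetical numeric column to score against
comments = ["No time", "Hurts my gums", "", "no time in the morning"]
scores = [1, 0, 2, 3]

counts = word_counts(comments)
non_empty_rate = sum(1 for c in comments if c.strip()) / len(comments)
word_scores = score_words(comments, scores)
print(counts.most_common(2), non_empty_rate, word_scores["time"])
```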
Structured Report Options
The Structured tab enables you to construct complex tables of descriptive statistics by
dragging column names into green icon drop zones to create side-by-side and nested results.
The following example uses the Consumer Preferences.jmp sample data table. From this data,
suppose that you wanted to compare job satisfaction and salary against gender by age group
and position tenure.
1. Select Help > Sample Data Library and open Consumer Preferences.jmp.
2. Select Analyze > Consumer Research > Categorical.
3. Select the Structured tab.
4. Drag Gender to the green drop zone at the Top of the table on the Structured tab.
5. Drag Age Group to the green drop zone just below Gender.
6. Drag Position Tenure to the green drop zone at the Top of the table next to Gender.
7. Drag Job Satisfaction to the green drop zone at the Side of the table.
8. Drag Salary Group to the green drop zone at the Side of the table under Job Satisfaction.
Figure 3.21 Structured Tab Report Setup
9. Click Add=>.
10. Click OK.
Figure 3.22 Structured Tab Report Example
Figure 3.22 shows that the majority of both the male and female respondents were somewhat
satisfied with their jobs, with the highest percentage of males being in the 25-29 age group,
while the females were in the 30-34 age group. Most of those who were somewhat satisfied
had been in their current position for less than 5 years.
The following options are available from the structured report’s red triangle menu:
Forces the table to display the column letter IDs, which usually come out
automatically when you do a compare command.
Show Letters
Specify Comparison Groups Enables you to specify groups when the groups of sample levels
that you want to test and compare are not the same as the innermost term’s structure. To
use this option, look at the letter IDs, and then enter sets of letter IDs. Separate the
letters within a group by slashes, and separate groups from each other by commas. For
example, the default grouping might be “A/B/C, E/D/F”, but you want to test A with E,
B with D, and C with F, so you specify the groups as “A/E, B/D, C/F”. This determines
which letters appear in the comparison fields. In addition, a summary report shows the
overall tests for each column group.
Remove
Removes the table from the report.
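The group syntax described for Specify Comparison Groups can be illustrated outside of JMP. The following Python sketch only shows how such an entry breaks down into groups of letter IDs. The function name parse_comparison_groups is hypothetical, and this is not a description of how JMP itself processes the entry.

```python
def parse_comparison_groups(spec):
    """Split an entry such as "A/E, B/D, C/F" into lists of letter IDs."""
    groups = []
    for chunk in spec.split(","):            # commas separate groups
        letters = [part.strip() for part in chunk.split("/")]  # slashes separate letters
        letters = [letter for letter in letters if letter]     # drop empty pieces
        if letters:
            groups.append(letters)
    return groups


default_groups = parse_comparison_groups("A/B/C, E/D/F")
custom_groups = parse_comparison_groups("A/E, B/D, C/F")
print(default_groups)
print(custom_groups)
```

With the custom entry, each group pairs one letter from the first default group with one letter from the second, which is the A with E, B with D, and C with F comparison described above.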
Additional Examples of the Categorical Platform
The following examples come from testing a fabrication line on three different occasions
under two different conditions. Each set of operating conditions yielded 50 data points.
Inspectors recorded the following types of defects:
• contamination
• corrosion
• doping
• metallization
• miscellaneous
• oxide defect
• silicon defect
Each unit could have several defects or even several defects of the same kind. We illustrate the
data in a variety of different examples all within the Categorical platform.
Multiple Response
Suppose that the defects for each unit are entered via a web page, but because each unit rarely
has more than three defect types, the form has three fields to enter any of the defect types for a
unit, as in Failure3MultipleField.jmp.
1. Select Help > Sample Data Library and open Quality Control/Failure3MultipleField.jmp.
2. Select Analyze > Consumer Research > Categorical.
3. Select Failure1, Failure2, and Failure3 and click Multiple Response on the Multiple tab.
These columns contain defect types and are the variables that you want to inspect.
4. Select clean and date and click X, Grouping Category.
5. Click OK.
Figure 3.23 lists failure types and counts for each failure type from the Multiple Response
analysis.
Figure 3.23 Multiple Response Failures from Failure3MultipleField.jmp
Response Frequencies
Suppose the data have columns containing frequency counts for each batch and a column
showing the total number of units of the batch, as in Failure3Freq.jmp.
Figure 3.24 Failure3Freq.jmp Data Table
1. Select Help > Sample Data Library and open Quality Control/Failure3Freq.jmp.
2. Select Analyze > Consumer Research > Categorical.
3. Select the frequency variables (contamination, corrosion, doping, metallization,
miscellaneous, oxide defect, silicon defect) and click Response Frequencies on the Multiple
tab.
4. Select clean and date and click X, Grouping Category.
5. Select Sample Size and click Sample Size.
6. Click OK.
The resulting output in Figure 3.25 shows a frequency count table, with a separate column for
each of the seven defect types. The last two columns show the total number of defects (Total
Responses) and cases (Total Cases).
Figure 3.25 Defect Rate Output
Each Frequency Group contains the following information:
• The total number of defects for each defect type. For example, after cleaning on Oct 1st,
there were 12 contamination defects.
• The share of responses. For example, after cleaning on Oct 1st, the 12 contamination
defects accounted for 52.2% of all 23 defects (12/23).
• The rate per case. For example, after cleaning on Oct 1st, the 12 contamination defects
came from 50 units, making the rate per unit 24% (12/50).
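The share of responses and the rate per case are simple ratios. As a check on the arithmetic, the following Python sketch reproduces the two percentages quoted above from the counts in the text (12 contamination defects, 23 total defects, 50 units). The function names are illustrative only and are not part of JMP.

```python
def share_of_responses(defect_count, total_responses):
    """Percent of all recorded defects that are of this type."""
    return 100.0 * defect_count / total_responses


def rate_per_case(defect_count, total_cases):
    """Percent rate of this defect type per inspected unit."""
    return 100.0 * defect_count / total_cases


contamination = 12     # contamination defects after cleaning on Oct 1st
total_responses = 23   # all defects recorded for that batch
total_cases = 50       # units in that batch

share = share_of_responses(contamination, total_responses)
rate = rate_per_case(contamination, total_cases)
print(f"Share of responses: {share:.1f}%")
print(f"Rate per case: {rate:.1f}%")
```

The share uses the number of defects as its denominator, whereas the rate uses the number of units. Because a unit can have several defects, the rates across defect types do not need to sum to 100%.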
Indicator Group
In some cases, the data is not yet summarized, so there are individual records for each unit.
We illustrate this situation with the data table, Failures3Indicators.jmp.
Figure 3.26 Failure3Indicators.jmp Data Table
1. Select Help > Sample Data Library and open Quality Control/Failures3Indicators.jmp.
2. Select Analyze > Consumer Research > Categorical.
3. Select the defect columns (contamination, corrosion, doping, metallization, miscellaneous,
oxide defect, silicon defect) and click Indicator Group on the Multiple tab.
4. Select clean and date and click X, Grouping Category.
5. Click OK.
When you click OK, you get the same output as in the Response Frequencies example (Figure 3.25).
Multiple Delimited
Suppose that an inspector entered the observed defects for each unit. The defects are listed in
a single column, delimited by a comma, as in Failures3Delimited.jmp. Note in the partial data
table shown in Figure 3.27 that some units did not have any observed defects, so the failureS
column is empty for those units.
Figure 3.27 Failure3Delimited.jmp Data Table
1. Select Help > Sample Data Library and open Quality Control/Failures3Delimited.jmp.
2. Select Analyze > Consumer Research > Categorical.
3. Select failureS and click Multiple Delimited on the Multiple tab.
4. Select clean and date and click X, Grouping Category.
5. Select ID and click ID.
6. Click OK.
When you click OK, you get the same output as in Figure 3.25.
Note: If more than one delimited column is specified, separate analyses are produced for each
column.
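To see what a comma-delimited response column contains, the following Python sketch splits a few delimited cells into individual defects and tallies them. The rows are hypothetical and are not taken from the sample data table. Counting a repeated defect on the same unit as separate responses is an assumption made for this illustration, and the sketch does not reproduce the platform's report.

```python
from collections import Counter

# Hypothetical rows: one delimited string per unit; an empty string means no defects.
failures_by_unit = [
    "contamination,oxide defect",
    "",
    "corrosion",
    "contamination,contamination,doping",
]


def split_defects(cell):
    """Split one delimited cell into a list of defect names."""
    return [name.strip() for name in cell.split(",") if name.strip()]


total_responses = Counter()     # every recorded defect
units_with_defect = Counter()   # units showing the defect at least once
for cell in failures_by_unit:
    defects = split_defects(cell)
    total_responses.update(defects)
    units_with_defect.update(set(defects))

total_cases = len(failures_by_unit)
for name in sorted(total_responses):
    rate = 100.0 * total_responses[name] / total_cases
    print(name, total_responses[name], units_with_defect[name], f"{rate:.1f}%")
```

An empty cell contributes a case but no responses, which is why the number of cases and the number of responses are tracked separately.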
Multiple Response by ID
Suppose each failure type is a separate record, with an ID column that can be used to link
together different defect types for each unit, as in Failure3ID.jmp.
Figure 3.28 Failure3ID.jmp Data Table
1. Select Help > Sample Data Library and open Quality Control/Failure3ID.jmp.
2. Select Analyze > Consumer Research > Categorical.
3. Select failure and click Multiple Response by ID on the Multiple tab.
4. Select clean and date and click X, Grouping Category.
5. Select SampleSize and click Sample Size.
6. Select N and click Freq.
7. Select ID and click ID.
8. Click OK.
When you click OK, you get the same output as in Figure 3.25.
Mean Score Example
You can calculate response means in your data using Value Scores. To make the Mean Score
interpretable, you can assign specific value scores in the Column Info window with the Value
Scores column property. For more information about column properties, refer to Using JMP.
In this example, you can assign Value Scores to calculate the Net Promoter Score (Reichheld,
HBR 2003), which summarizes an 11-level rating with a favorability score between -100 and
100. Anything with a value of 6 or below is regarded as a detractor and is scored -100. In the
script in step 1, ratings of 7 or 8 are scored 0, and ratings of 9 or 10 are scored 100.
1. Run the following script:
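New Table( "Rating Example",
    Add Rows( 300 ),
    // Table script that launches Categorical with Mean Score turned on
    New Script( "Categorical", Categorical( Responses( :Rating ), Mean Score( 1 ) ) ),
    New Column( "Rating", Numeric, Ordinal,
        // Net Promoter scoring: 0-6 score -100, 7-8 score 0, 9-10 score 100
        Set Property(
            "Value Scores",
            {0 = -100, 1 = -100, 2 = -100, 3 = -100, 4 = -100, 5 = -100, 6 = -100,
            7 = 0, 8 = 0, 9 = 100, 10 = 100}
        ),
        // Random ratings, given as probability and value pairs
        Formula( Random Category(
            0.05, 0, 0.05, 1, 0.05, 2, 0.05, 3, 0.05, 4,
            0.05, 5, 0.05, 6, 0.05, 7, 0.05, 8, 0.3, 9, 0.25, 10 ) ),
        Set Selected
    )
);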
2. A data table with 300 rows of random rating data is created. Value scores were also
defined for the Rating column. To view the scores, right-click the Rating column and select
Column Properties > Value Scores.
Figure 3.29 Column Properties - Value Scores
3. Select Analyze > Consumer Research > Categorical.
4. Select Rating and click Responses on the Simple tab.
5. Click OK.
6. Select Mean Score from the Categorical red triangle menu.
Figure 3.30 Rating Example Report
Based on the defined value scores, the mean score for this run was 18. Your results might
differ because the Rating column values are random.
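Because the ratings are random, it can help to know what mean score to expect on average. The following Python sketch uses the value scores and the category probabilities from the script in step 1 to compute the expected mean score, which works out to 20. It also draws one simulated sample of 300 ratings. The sketch runs outside of JMP, and the seed and the function name simulate_mean_score are arbitrary choices made for this illustration.

```python
import random

# Value scores from the script: 0-6 are detractors, 7-8 are neutral, 9-10 are promoters.
value_scores = {rating: -100 for rating in range(0, 7)}
value_scores.update({7: 0, 8: 0, 9: 100, 10: 100})

# Category probabilities from the Random Category formula in the script.
probabilities = {rating: 0.05 for rating in range(0, 9)}
probabilities.update({9: 0.30, 10: 0.25})

# Expected mean score = sum over ratings of probability times value score.
expected_mean_score = sum(probabilities[r] * value_scores[r] for r in value_scores)


def simulate_mean_score(n_rows, seed):
    """Draw n_rows random ratings and return their mean value score."""
    rng = random.Random(seed)
    ratings = rng.choices(
        list(probabilities), weights=list(probabilities.values()), k=n_rows
    )
    return sum(value_scores[r] for r in ratings) / n_rows


print(f"Expected mean score: {expected_mean_score:.1f}")
print(f"Simulated mean score (300 rows): {simulate_mean_score(300, seed=2015):.1f}")
```

A mean score of 18 from one table of 300 random rows is therefore close to the long-run value of 20: 55% promoters minus 35% detractors.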
Chapter 4
Factor Analysis
Identify Factors within Variables
Factor analysis (also known as common factor analysis and exploratory factor analysis) seeks
to describe a collection of observed variables in terms of a smaller collection of (unobservable)
latent variables, or factors. The observed variables are modeled as linear combinations of
these factors plus error, and the factors are constructed to explain variation that is common
to the observed variables. A primary goal of factor analysis is to achieve a meaningful
interpretation of the observed variables through the factors. Another goal is to reduce the
number of variables.
Factor analysis is used in many areas, and is of particular value in psychology, sociology, and
education. In these areas, factor analysis is used to understand how manifest behavior can be
interpreted in terms of underlying patterns and structures. For example, measures of
participation in outdoor activities, hobbies, exercise, and travel may all relate to a factor that
can be described as “active versus inactive personality type”. Factor analysis attempts to
explain correlations among the observed variables in terms of the factors. In particular, it
allows you to determine how much of the variance in each observed variable is accounted
for by the factors you have identified. It also tells you how much of the variance in all the
variables is accounted for by each factor.
Use factor analysis when you need to explore or interpret underlying patterns and structure in
your data. Also consider using it to summarize the information in your variables using a
smaller number of latent variables.
Figure 4.1 Rotated Factor Loading
Contents
Factor Analysis Platform Overview
Example of the Factor Analysis Platform
Launch the Factor Analysis Platform
The Factor Analysis Report
    Model Launch
    Rotation Methods
Factor Analysis Platform Options
Factor Analysis Model Fit Options
Factor Analysis Platform Overview
Factor analysis models a set of observable variables in terms of a smaller number of
unobservable factors. These factors account for the correlation or covariance between the
observed variables. Once the factors are extracted, you perform factor rotation in order to
obtain a meaningful interpretation of the factors.
Consider a situation where you have ten observed variables, X1, X2, …, X10. Suppose that you
want to model these ten variables in terms of two latent factors, F1 and F2. For convenience, it
is assumed that the factors are uncorrelated and that each has mean zero and variance one.
The model that you want to derive is of the form:
$$X_i = \beta_{i0} + \beta_{i1} F_1 + \beta_{i2} F_2 + \varepsilon_i$$
It follows that $\mathrm{Var}(X_i) = \beta_{i1}^2 + \beta_{i2}^2 + \mathrm{Var}(\varepsilon_i)$. The portion of the variance of $X_i$ that is
attributable to the factors, the common variance or communality, is $\beta_{i1}^2 + \beta_{i2}^2$. The remaining
variance, $\mathrm{Var}(\varepsilon_i)$, is the specific variance, and is considered to be unique to $X_i$.
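The variance decomposition can be checked numerically. The following Python sketch simulates one observed variable from two uncorrelated standard normal factors and an independent error term, and compares the sample variance with the sum of the communality and the specific variance. The coefficient values, the sample size, the seed, and the use of normal distributions are all assumptions chosen for this illustration.

```python
import random


def simulate_variance(beta0, beta1, beta2, error_sd, n, seed):
    """Sample variance of X = beta0 + beta1*F1 + beta2*F2 + error."""
    rng = random.Random(seed)
    xs = []
    for _ in range(n):
        f1 = rng.gauss(0.0, 1.0)        # factor 1: mean zero, variance one
        f2 = rng.gauss(0.0, 1.0)        # factor 2: independent of factor 1
        eps = rng.gauss(0.0, error_sd)  # specific error, independent of the factors
        xs.append(beta0 + beta1 * f1 + beta2 * f2 + eps)
    mean = sum(xs) / n
    return sum((x - mean) ** 2 for x in xs) / (n - 1)


beta1, beta2, error_sd = 0.8, 0.5, 0.3
communality = beta1 ** 2 + beta2 ** 2          # common variance
specific_variance = error_sd ** 2              # variance unique to this variable
theoretical_variance = communality + specific_variance

sample_variance = simulate_variance(2.0, beta1, beta2, error_sd, n=100_000, seed=12)
print(f"Communality: {communality:.2f}")
print(f"Theoretical variance: {theoretical_variance:.2f}")
print(f"Sample variance: {sample_variance:.3f}")
```

The intercept does not appear in the variance. Only the squared loadings and the error variance contribute, which is why the communality can be read directly from the loadings when the factors are uncorrelated with variance one.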
The Factor Analysis platform provides a Scree Plot for the eigenvalues of the correlation or
covariance matrix. You can use this as a guide in determining the number of factors to extract.
Alternatively, you can accept the platform's suggestion of setting the number of factors equal
to the number of eigenvalues that exceed one.
The platform provides two factoring methods for estimating the parameters of this model:
Principal Components and Maximum Likelihood.
JMP provides two options for estimating the proportion of variance contributed by common
factors for each variable. These Prior Communality options impose assumptions on the
diagonal of the correlation (or covariance) matrix. The Principal Components option treats the
correlation matrix, which has ones on its diagonal (or the covariance matrix with variances on
its diagonal), as the structure to be analyzed. The Common Factor Analysis option sets the
diagonal entries to values that reflect the proportion of the variation that is shared with other
variables.
To support interpretability of the extracted factors, you rotate the factor structure. The Factor
Analysis platform provides a variety of rotation methods that encompass both orthogonal and
oblique rotations.
In contrast with factor analysis, which looks at common variance, principal component
analysis accounts for the total variance of the observed variables. See the Principal
Components chapter in the Multivariate Methods book.
Example of the Factor Analysis Platform
To view an example Factor Analysis report that uses two factors:
1. Select Help > Sample Data Library and open Solubility.jmp.
2. Select Analyze > Consumer Research > Factor Analysis.
The Factor Analysis launch window appears.
3. Select all of the continuous columns and click Y, Columns.
4. Keep the default Estimation Method and Variance Scaling.
5. Click OK.
The initial Factor Analysis report appears.
Figure 4.2 Initial Factor Analysis Report
6. For the Model Launch, select the following options:
‒ Factoring Method as Maximum Likelihood
‒ Prior Communality as Common Factor Analysis
‒ Number of factors = 2
‒ Rotation Method as Varimax
7. After all selections are made, click Go.
The Factor Analysis report appears.
Figure 4.3 Example Factor Analysis Report
The report lists the communality estimates, variance, significance tests, rotated factor
loadings, and a factor loading plot. Note that in the Factor Loading Plot, Factor 1 relates to the
Carbon Tetrachloride-Chloroform-Benzene-Hexane cluster of variables, and Factor 2 relates to
the Ether–1-Octanol cluster of variables. See “Factor Analysis Model Fit Options” on page 78
for details of the information shown in the report.
Launch the Factor Analysis Platform
Launch the Factor Analysis platform by selecting Analyze > Consumer Research > Factor
Analysis. This example uses the Solubility.jmp sample data table.
Figure 4.4 Factor Analysis Launch Window
Y, Columns Lists the continuous columns to be analyzed.
Weight Enables you to weight the analysis to account for pre-summarized data.
Freq Identifies a column whose numeric values assign a frequency to each row in the
analysis.
By Creates a Factor Analysis report for each value specified by the By column so that you can
perform separate analyses for each group.
Estimation Method Lists different methods for fitting the model. For details about the
methods, see the Multivariate chapter in the Multivariate Methods book.
Variance Scaling Lists the scaling methods for performing the factor analysis based on
Correlations (the same as Principal Components), Covariances, or Unscaled.
The Factor Analysis Report
The initial Factor Analysis report shows Eigenvalues and the Scree Plot. The Eigenvalues are
obtained from a principal components analysis. The Scree Plot graphs these eigenvalues. The
number of factors that JMP suggests in the Model Launch equals the number of eigenvalues
that exceed 1.0.
Alternatively, you can use the scree plot to guide your initial choice for number of factors. The
number of eigenvalues that appear before the scree plot levels out can provide an upper
bound on the number of factors.
Figure 4.5 Factor Analysis Report
In the example shown in Figure 4.5, the Scree Plot begins to level out after the second
eigenvalue. The Eigenvalues table indicates that the first eigenvalue accounts for 79.75% of the
variation and the second eigenvalue accounts for 15.75%, for a total of 95.50% of the total
variation. The third eigenvalue only explains 2.33% of the variation, and the contributions
from the remaining eigenvalues are negligible. Although the Number of factors box is initially
set to 1, this analysis suggests that extracting 2 factors is appropriate.
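The percentages quoted above can be converted back into eigenvalues to see why the Number of factors box starts at 1. The following Python sketch assumes that the analysis is on correlations, so that the eigenvalues sum to the number of variables, and that there are six variables, which is inferred from the six variables named in the description of the Factor Loading Plot. Only the three percentages given in the text are used.

```python
n_variables = 6  # assumption: six analyzed variables, analysis on correlations

# Percent of variation for the first three eigenvalues, as quoted in the text.
percent_of_variation = [79.75, 15.75, 2.33]

# For a correlation matrix, the eigenvalues sum to the number of variables.
eigenvalues = [pct / 100.0 * n_variables for pct in percent_of_variation]

# Running total of the percent of variation.
cumulative_percent = []
running = 0.0
for pct in percent_of_variation:
    running += pct
    cumulative_percent.append(running)

# Number of eigenvalues that exceed 1.0.
suggested_factors = sum(1 for value in eigenvalues if value > 1.0)

for value, cum in zip(eigenvalues, cumulative_percent):
    print(f"eigenvalue {value:.3f}  cumulative {cum:.2f}%")
print("Eigenvalues above 1.0:", suggested_factors)
```

Under these assumptions, the second eigenvalue is about 0.945, just below the cutoff of 1.0, even though it accounts for 15.75% of the variation. This is why the eigenvalue rule suggests one factor and the scree plot suggests two.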
Model Launch
To configure the Factor Analysis model, use the Model Launch section at the bottom of the
Factor Analysis Report (Figure 4.6).
Figure 4.6 Model Launch
The Model Launch section enables you to configure the following options:
1. Factoring method - the method for extracting factors.
‒ The Principal Components method is a computationally efficient method, but it does
not allow for hypothesis testing.
‒ The Maximum Likelihood method has desirable properties and allows you to test
hypotheses about the number of common factors.
Note: The Maximum Likelihood method requires a positive definite correlation matrix. If your
correlation matrix is not positive definite, select the Principal Components method.
2. Prior Communality - the method for estimating the proportion of variance contributed by
common factors for each variable.
‒ Principal Components (diagonals = 1) sets all communalities equal to 1, indicating that
100% of each variable’s variance is shared with the other variables. Using this option
with Factoring Method set to Principal Components results in principal component
analysis.
‒ Common Factor Analysis (diagonals = SMC) sets the communalities equal to squared
multiple correlation (SMC) coefficients. For a given variable, the SMC is the RSquare
for a regression of that variable on all other variables.
3. Number of factors - the number of factors (or principal components) to extract. You can
base this value on the number of eigenvalues greater than 1.0 or on the point in the scree
plot where the graph begins to level out.
Note: Retaining the factors whose eigenvalues are greater than 1.0 is known as the Kaiser
criterion. In our example, only factor 1 would be retained under this criterion.
4. Rotation method - the method used to align the factor directions with the original
variables for ease of interpretation. The default value is Varimax. See “Rotation Methods”
on page 75 for a description of the available selections.
5. Click Go to generate the Factor Analysis report.
Depending on the selected Variance Scaling, the appropriate factor analysis results appear.
See “Factor Analysis Model Fit Options” on page 78 for details about the contents of the
report. The Factor Analysis on Correlations and Factor Analysis on Unscaled reports show
the same information.
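The squared multiple correlation (SMC) used by the Common Factor Analysis prior communality option can be computed directly from a correlation matrix. A standard identity is that the SMC of variable i equals 1 - 1/r(ii), where r(ii) is the i-th diagonal entry of the inverse correlation matrix. The following Python sketch applies that identity to a small hypothetical correlation matrix. It illustrates the definition only and is not JMP's implementation. The matrix values and function names are made up for this example.

```python
def invert(matrix):
    """Invert a small square matrix by Gauss-Jordan elimination with pivoting."""
    n = len(matrix)
    # Augment the matrix with the identity matrix.
    aug = [
        [float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
        for i, row in enumerate(matrix)
    ]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("correlation matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def squared_multiple_correlations(corr):
    """SMC of each variable: RSquare from regressing it on all other variables."""
    inverse = invert(corr)
    return [1.0 - 1.0 / inverse[i][i] for i in range(len(corr))]


# Hypothetical correlation matrix for three variables.
corr = [
    [1.0, 0.8, 0.6],
    [0.8, 1.0, 0.5],
    [0.6, 0.5, 1.0],
]
smc = squared_multiple_correlations(corr)
print([round(value, 4) for value in smc])
```

The inverse exists only when the correlation matrix is nonsingular. A variable that is strongly predicted by the others receives a prior communality near 1, and a variable that is nearly unrelated to the others receives one near 0.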
Rotation Methods
Rotations align the directions of the factors with the original variables so that the factors are
more interpretable. You hope for clusters of variables that are highly correlated to define the
rotated factors.
After the initial extraction, the factors are uncorrelated with each other. If the factors are
rotated by an orthogonal transformation, the rotated factors are also uncorrelated. If the
factors are rotated by an oblique transformation, the rotated factors become correlated.
Oblique rotations often produce more useful patterns than do orthogonal rotations. However,
a consequence of correlated factors is that there is no single unambiguous measure of the
importance of a factor in explaining a variable.
For each rotation method, we used the example described in “Example of the Factor Analysis
Platform” on page 70 to view the Rotated Factor Loading and Factor Loading Plot.
Orthogonal Rotation Methods
Table 4.1 lists the available orthogonal (that is, uncorrelated) rotation methods.
Table 4.1 Orthogonal Rotation Methods
Method            SAS PROC FACTOR Equivalent
Varimax           ROTATE=ORTHOMAX with GAMMA = 1
                  Note: This is the default selection.
Biquartimax       ROTATE=ORTHOMAX with GAMMA = 0.5
Equamax           ROTATE=ORTHOMAX with GAMMA = number of factors/2
Factorparsimax    ROTATE=ORTHOMAX with GAMMA = number of variables
Orthomax          ROTATE=ORTHOMAX
                  Or
                  ROTATE=ORTHOMAX(p), where p is the orthomax weight or
                  the GAMMA = value.
                  Note: The default p value is 1 unless specified otherwise in the
                  GAMMA = option. For additional information about orthomax
                  weight, see the SAS documentation, “Simplicity Functions for
                  Rotations.”
Parsimax          ROTATE=ORTHOMAX with
                  GAMMA = nvar(nfact – 1) / (nvar + nfact – 2),
                  where nvar is the number of variables, and nfact is the number of
                  factors.
Quartimax         ROTATE=ORTHOMAX with GAMMA=0
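The GAMMA values in Table 4.1 are weights in the orthomax criterion, which for a loading matrix with p variables is the sum over factors of the sum of fourth powers of the loadings minus GAMMA/p times the squared sum of squared loadings. The following Python sketch looks up GAMMA for each named method and, for the two-factor case only, finds the rotation angle by a brute-force search over whole degrees. This is a simplified illustration under stated assumptions: the loading values are hypothetical, no Kaiser normalization is applied, and JMP and SAS use iterative algorithms rather than a grid search.

```python
import math


def orthomax_gamma(method, nvar, nfact):
    """GAMMA weight for the named methods in Table 4.1 (needs nvar + nfact > 2)."""
    table = {
        "Varimax": 1.0,
        "Biquartimax": 0.5,
        "Equamax": nfact / 2.0,
        "Factorparsimax": float(nvar),
        "Parsimax": nvar * (nfact - 1) / (nvar + nfact - 2),
        "Quartimax": 0.0,
    }
    return table[method]


def rotate(loadings, degrees):
    """Apply an orthogonal rotation to a two-factor loading matrix."""
    c = math.cos(math.radians(degrees))
    s = math.sin(math.radians(degrees))
    return [(a * c + b * s, -a * s + b * c) for a, b in loadings]


def orthomax_criterion(loadings, gamma):
    """Orthomax simplicity criterion; larger values mean simpler structure."""
    p = len(loadings)
    total = 0.0
    for j in range(2):
        squares = [row[j] ** 2 for row in loadings]
        total += sum(sq * sq for sq in squares) - gamma / p * sum(squares) ** 2
    return total


def best_rotation(loadings, gamma):
    """Whole-degree angle in [0, 90) that maximizes the criterion."""
    return max(range(90), key=lambda d: orthomax_criterion(rotate(loadings, d), gamma))


# Hypothetical loadings: a simple structure that has been rotated by 30 degrees.
simple = [(0.9, 0.0), (0.8, 0.0), (0.0, 0.7), (0.0, 0.6)]
unrotated = rotate(simple, 30)

gamma = orthomax_gamma("Varimax", nvar=4, nfact=2)
angle = best_rotation(unrotated, gamma)
rotated = rotate(unrotated, angle)
print("Best angle:", angle)
for row in rotated:
    print(f"{row[0]: .3f} {row[1]: .3f}")
```

The search recovers the simple structure up to a swap of the two columns and a sign change, which does not affect interpretation. Because the rotation is orthogonal, each variable's sum of squared loadings is the same before and after rotation.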
Oblique Rotation Methods
Table 4.2 lists the available oblique (that is, correlated) rotation methods.
Table 4.2 Oblique Rotation Methods
Method              SAS PROC FACTOR Equivalent
Biquartimin         ROTATE=OBLIMIN(.5)
                    Or
                    ROTATE=OBLIMIN with TAU=.5
Covarimin           ROTATE=OBLIMIN(1)
                    Or
                    ROTATE=OBLIMIN with TAU=1
Obbiquartimax       ROTATE=OBBIQUARTIMAX
Obequamax           ROTATE=OBEQUAMAX
Obfactorparsimax    ROTATE=OBFACTORPARSIMAX
Oblimin             ROTATE=OBLIMIN, where the default p value is zero, unless
                    specified otherwise in the TAU= option.
                    ROTATE=OBLIMIN(p) specifies p as the oblimin weight or the
                    TAU= value.
                    Note: For additional information about oblimin weight, see the
                    SAS documentation, “Simplicity Functions for Rotations.”
Obparsimax          ROTATE=OBPARSIMAX
Obquartimax         ROTATE=OBQUARTIMAX
Obvarimax           ROTATE=OBVARIMAX
Quartimin           ROTATE=OBLIMIN(0) or ROTATE=OBLIMIN with TAU=0
Promax              ROTATE=PROMAX
Factor Analysis Platform Options
The Factor Analysis platform red triangle menu enables you to show or hide the following
report elements:
Eigenvalues A table that indicates the total number of factors extracted based on the
eigenvalues (that is, the amount of variance contributed by each factor). The table includes
the percent of the total variance contributed by that factor, a bar chart illustrating the
percent contribution, and the cumulative percent contributed by each successive factor.
The number of eigenvalues greater than or equal to 1.0 can be taken as the number of
sufficient factors for analysis.
Scree Plot A plot of the eigenvalues versus the number of components (or factors). The plot
can be used to determine the number of factors that account for most of the variance. The
point at which the plotted line levels out can be taken as the number of sufficient factors
for analysis.
Script Lists the Script menu options for the platform. See the JMP Platforms chapter in the
Using JMP book for details.
See Figure 4.2 on page 70 for an example.
Factor Analysis Model Fit Options
After submitting the Model Launch, the model results appear. The following options are
available from the Factor Analysis report’s red triangle menu.
Prior Communality An initial estimate of the communality for each variable. For a given
variable, this estimate is the squared multiple correlation coefficient (SMC), or RSquare, for
a regression of that variable on all other variables.
Note: The Prior Communality Estimates table only appears if the Common Factor Analysis
(diagonals = SMC) option is selected.
Figure 4.7 Prior Communality Estimates
Eigenvalues Shows the eigenvalues of the reduced correlation matrix and the percent of the
common variance for which they account. The reduced correlation matrix is the correlation
matrix with its diagonal entries replaced by the communality estimates. The eigenvalues
indicate the common variance explained by the factors. The Cum Percent can exceed 100%
because the reduced correlation matrix is not necessarily positive definite and can have
negative eigenvalues.
Note that the table indicates the number of factors retained for analysis.
The Eigenvalues option is only available when the Prior Communality option is set to
Common Factor Analysis (diagonals = SMC). The communality estimates are the SMC
(squared multiple correlation) values.
Figure 4.8 indicates that the first two factors account for 100.731% of the common variance.
This pattern suggests that you might not need more than two factors to model your data.
Figure 4.8 Eigenvalues of the Reduced Correlation Matrix
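A cumulative percent above 100% can look like an error, so a small worked case might help. The following Python sketch uses a hypothetical set of three variables that all share the same pairwise correlation of 0.5, for which the SMC values and the eigenvalues of the reduced correlation matrix have simple closed forms. The numbers are unrelated to Figure 4.8, and the function names are illustrative.

```python
def equicorrelated_smc(p, r):
    """SMC of each variable when all p variables share the same correlation r."""
    return 1.0 - (1.0 - r) * (1.0 + (p - 1) * r) / (1.0 + (p - 2) * r)


def reduced_matrix_eigenvalues(p, r):
    """Eigenvalues of the matrix with the SMC on the diagonal and r elsewhere."""
    d = equicorrelated_smc(p, r)
    # One eigenvalue is d + (p - 1) * r; the remaining p - 1 all equal d - r.
    return [d + (p - 1) * r] + [d - r] * (p - 1)


def cumulative_percent(eigenvalues):
    """Running percent of the eigenvalue total, in the order given."""
    total = sum(eigenvalues)
    running = 0.0
    result = []
    for value in eigenvalues:
        running += value
        result.append(100.0 * running / total)
    return result


eigenvalues = reduced_matrix_eigenvalues(p=3, r=0.5)
cum_percent = cumulative_percent(eigenvalues)
for value, cum in zip(eigenvalues, cum_percent):
    print(f"eigenvalue {value: .4f}  cumulative {cum:.1f}%")
```

In this case the first eigenvalue alone is larger than the total common variance, because the two negative eigenvalues pull the total back down. The cumulative percent rises above 100% and then returns to exactly 100% at the last eigenvalue.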
Unrotated Factor Loading Shows the factor loading matrix before rotation. Factor loadings
measure the influence of a common factor on a variable. Because the unrotated factors are
orthogonal, the factor loading matrix is the matrix of correlations between the variables
and the factors. The closer the absolute value of a loading is to 1, the stronger the effect of
the factor on the variable.
Use the slider and value to Suppress Absolute Loading Values Less Than the specified
value in the table. Suppressed values appear dimmed according to the setting specified by
Dim Text.
Use the Dim Text slider and value to control the table’s font transparency gradient for
loadings whose absolute values are less than the specified Suppress Absolute Loading
Values Less Than value.
Note: The Suppress Absolute Loading Values Less Than value and Dim Text value are the
same values used in the Rotated Factor Loading table. Changing the settings in one loading
table changes the settings in the other loading table.
Figure 4.9 Unrotated Factor Loading
Note: The Unrotated Factor Loading matrix is re-ordered so that variables associated with the
same factor appear next to each other.
Rotation Matrix Shows the calculations used for rotating the factor loading plot and the factor
loading matrix.
Figure 4.10 Rotation Matrix
Target Matrix Shows the matrix to which the varimax factor pattern is rotated. This option is
available only for the Promax rotation.
Figure 4.11 Target Matrix
Factor Structure Shows the matrix of correlations between variables and common factors. If
the common factors are uncorrelated, the factor structure is the same as the factor loading
matrix. This option is available only for oblique rotations.
Figure 4.12 Factor Structure
Final Communality Estimates Estimates of the communalities after the factor model has been
fit. When the factors are orthogonal, the final communality estimate for a variable equals
the sum of the squared loadings for that variable.
Figure 4.13 Final Communality Estimates
Standard Score Coefficients Lists the multipliers used to convert factor values when saving
rotated components as factors to the source data table.
Figure 4.14 Standard Score Coefficients
Variance Explained by Each Factor Gives the variance, percent, and cumulative percent, of
common variance explained by each rotated factor.
Figure 4.15 Variance Explained by Each Factor
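Both the final communality estimates and the variance explained by each factor come from squared loadings: communalities are row sums, and the variance explained by a factor is a column sum. The following Python sketch shows the two calculations on a hypothetical loading matrix with orthogonal factors. The loading values are made up and are not those of the Solubility.jmp example.

```python
# Hypothetical rotated loadings for four variables on two orthogonal factors.
loadings = [
    (0.9, 0.1),
    (0.8, 0.3),
    (0.2, 0.7),
    (0.1, 0.6),
]

# Communality of a variable: sum of its squared loadings (row sum).
communalities = [sum(value ** 2 for value in row) for row in loadings]

# Variance explained by a factor: sum of squared loadings on it (column sum).
variance_by_factor = [sum(row[j] ** 2 for row in loadings) for j in range(2)]

# Percent and cumulative percent of the common variance.
total_common_variance = sum(variance_by_factor)
percent = [100.0 * v / total_common_variance for v in variance_by_factor]
cumulative = [sum(percent[: j + 1]) for j in range(len(percent))]

print("Communalities:", [round(c, 2) for c in communalities])
for j, (v, pct, cum) in enumerate(zip(variance_by_factor, percent, cumulative), start=1):
    print(f"Factor {j}: variance {v:.2f}, percent {pct:.1f}, cumulative {cum:.1f}")
```

The communalities and the factor variances add up to the same total because they sum the same squared loadings in different directions. For oblique rotations the factors are correlated, so this simple row-sum and column-sum accounting does not apply.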
Significance Test If you select Maximum Likelihood as the factoring method, the results of two
Chi-square tests are provided.
The first test is for H0: No common factors. Under this null hypothesis, the variables are
uncorrelated, so there are no common factors to explain intercorrelations among the
variables. This test is Bartlett’s Test for Sphericity, whose null hypothesis is that the
correlation matrix of the variables is an identity matrix (Bartlett, 1954).
The second test is for H0: N factors are sufficient, where N is the specified number of
factors. Rejection of this null hypothesis indicates that more factors may be required to
explain the intercorrelations among the variables (Bartlett, 1954).
The tests in Figure 4.16 indicate that the common factors already included in the model
explain some of the intercorrelations, but that more factors are needed.
Note: The Significance Test table only appears if the Maximum Likelihood factoring method
option is selected.
Figure 4.16 Significance Test
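For reference, the textbook form of Bartlett's sphericity statistic is -(n - 1 - (2p + 5)/6) ln|R|, with p(p - 1)/2 degrees of freedom, where n is the number of observations, p is the number of variables, and |R| is the determinant of the correlation matrix. The following Python sketch evaluates that formula for a hypothetical 3-by-3 correlation matrix. Whether the report uses exactly this form is an assumption, and the sketch stops at the statistic and degrees of freedom rather than computing a p-value.

```python
import math


def determinant_3x3(m):
    """Determinant of a 3-by-3 matrix by cofactor expansion."""
    return (
        m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
        - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
        + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0])
    )


def bartlett_sphericity(corr, n_obs):
    """Textbook Bartlett sphericity statistic and degrees of freedom (3 variables)."""
    p = len(corr)
    if p != 3:
        raise ValueError("this sketch handles only three variables")
    statistic = -(n_obs - 1 - (2 * p + 5) / 6.0) * math.log(determinant_3x3(corr))
    degrees_of_freedom = p * (p - 1) // 2
    return statistic, degrees_of_freedom


# Hypothetical correlation matrix and sample size.
corr = [
    [1.0, 0.8, 0.6],
    [0.8, 1.0, 0.5],
    [0.6, 0.5, 1.0],
]
statistic, degrees_of_freedom = bartlett_sphericity(corr, n_obs=100)
print(f"ChiSquare = {statistic:.2f}, DF = {degrees_of_freedom}")
```

When the correlation matrix is an identity matrix, its determinant is 1 and the statistic is 0. Strong intercorrelations shrink the determinant toward 0 and make the statistic large, which leads to rejection of the null hypothesis of no common factors.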
Rotated Factor Loading Shows the factor loading matrix after rotation. If the rotation is
orthogonal, these values are the correlations between the variables and the rotated factors.
Use the slider and value to Suppress Absolute Loading Values Less Than the specified
value in the table. Suppressed values appear dimmed according to the setting specified by
Dim Text.
Use the Dim Text slider and value to control the table’s font transparency gradient for
loadings whose absolute values are less than the specified Suppress Absolute Loading
Values Less Than value.
Note: The Suppress Absolute Loading Values Less Than value and Dim Text value are the
same values used in the Unrotated Factor Loading table. Changing the settings in one loading
table changes the settings in the other loading table.
Figure 4.17 Rotated Factor Loading
Note: The Rotated Factor Loading matrix is re-ordered so that variables associated with the
same factor appear next to each other.
Factor Loading Plot The plot of the rotated factor loadings.
Figure 4.18 Factor Loading Plot
Note that in the Factor Loading Plot, Factor 1 relates to the
Carbon Tetrachloride-Chloroform-Benzene-Hexane cluster of variables, and Factor 2 relates to
the Ether–1-Octanol cluster of variables. See the matrix of “Rotated Factor Loading” on page 81
for details.
Score Plot Graphs each factor’s calculated values in relation to the other, adjusting each
value for the mean and standard deviation.
Figure 4.19 Score Plot
Score Plot with Imputation Imputes any missing values and creates a score plot. This option is
available only if there are missing values.
Save Rotated Components Saves the rotated components to the data table, with a formula for
computing the components. The formula cannot evaluate rows with missing values.
Save Rotated Components with Imputation Imputes missing values, and saves the rotated
components to the data table. The column contains a formula for doing the imputation,
and computing the rotated components. This option appears after the Factor Analysis
option is used, and if there are missing values.
Remove Fit Removes the fit model results from the Factor Analysis Fit Model report. This
option enables you to change the Model Launch configuration for a new report.
Chapter 5
Choice Models
Fit Models for Choice Experiments
The Choice platform is designed for use in market research experiments, where the ultimate
goal is to discover the preference structure of consumers. Then, this information is used to
design products or services that have the attributes most desired by consumers.
Features provided in the Choice platform include:
• Ability to use information about consumer traits as well as product attributes.
• Integration of data from one, two, or three sources.
• Ability to use the integrated profiler to understand, visualize, and optimize the response
(utility) surface.
• Subject-level scores for segmenting or clustering your data.
• A default bias-corrected maximum likelihood estimator described by Firth (1993). This
method has been shown to produce better estimates and tests than MLEs without bias
correction. In addition, bias-corrected MLEs ameliorate separation problems that tend to
occur in logistic-type models. Refer to Heinze and Schemper (2002) for a discussion of the
separation problem in logistic regression.
The Choice platform is not appropriate to use for fitting models that involve:
• Ranking or scoring.
• Nested hierarchical choices. (PROC MDC in SAS/ETS can be used for such analysis.)
Figure 5.1 Choice Platform Example
Contents
Choice Modeling Platform Overview
Example of the Choice Platform
Launch the Choice Platform
  Choice Model Output
Choice Platform Options
Valuing Trade-offs
  Example of Valuing Trade-offs
One-Table Analysis
  Example of One-Table Analysis
Segmentation
  Example of Segmentation
Special Data Rules
  Default Choice Set
  Subject Data with Response Data
  Logistic Regression
Transforming Data
  Example of Transforming Data to Two Analysis Tables
  Example of Transforming Data to One Analysis Table
Additional Examples of Logistic Regression Using the Choice Platform
  Example of Logistic Regression Using the Choice Platform
  Example of Logistic Regression for Matched Case-Control Studies
Statistical Details
Choice Modeling Platform Overview
Choice modeling, pioneered by McFadden (1974), is a powerful analytic method used to
estimate the probability of individuals making a particular choice from presented alternatives.
Choice modeling is also called conjoint modeling, discrete choice analysis, and conditional
logistic regression.
The Choice Modeling platform uses a form of conditional logistic regression. Unlike simple
logistic regression, choice modeling uses a linear model to model choices based on response
attributes and not solely upon subject characteristics. For example, in logistic regression, the
response might be whether you buy brand A or brand B as a function of ten factors. Those
factors are typically characteristics that describe you, such as your age, gender, income,
and education. In choice modeling, however, you might be choosing between two cars that
are each described by ten attributes, such as price, passenger load, number of cup holders,
color, GPS device, gas mileage, anti-theft system, removable seats, number of safety
features, and insurance cost.
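The conditional logit form introduced by McFadden (1974) can be illustrated with a short Python sketch. It assumes that the utility of a profile is the sum of the part-worths of its attribute levels, and that the probability of choosing a profile from a choice set is exp(U) divided by the sum of exp(U) over the set. The part-worth values and the function names are hypothetical, chosen only for illustration. They are not estimates from any data table and not part of JMP.

```python
import math

def choice_probabilities(utilities):
    """Conditional logit: P(i) = exp(U_i) / sum_j exp(U_j) over one choice set."""
    # Subtracting the largest utility keeps exp() well behaved and
    # leaves the resulting probabilities unchanged.
    top = max(utilities)
    weights = [math.exp(u - top) for u in utilities]
    total = sum(weights)
    return [w / total for w in weights]

def utility(profile, part_worths):
    """Linear utility: the sum of the part-worths of the profile's attribute levels."""
    return sum(part_worths[attr][level] for attr, level in profile.items())

# Hypothetical part-worths (illustrative values only).
part_worths = {
    "crust": {"thick": 0.2, "thin": -0.2},
    "cheese": {"mozzarella": 0.5, "jack": -0.5},
    "topping": {"pepperoni": 0.1, "none": -0.1},
}

# One choice set with two alternative profiles.
choice_set = [
    {"crust": "thick", "cheese": "mozzarella", "topping": "pepperoni"},
    {"crust": "thin", "cheese": "jack", "topping": "none"},
]

utilities = [utility(p, part_worths) for p in choice_set]
probabilities = choice_probabilities(utilities)
print(probabilities)
```

Because only differences in utility enter the probabilities, adding a constant to every profile's utility leaves the result unchanged.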
When engineers design a product, they routinely make hundreds or thousands of small
design decisions. Most of these decisions are not tested by prospective customers.
Consequently, these products are not optimally designed. However, if customer testing is not
too costly and test subjects (prospective customers) are readily available, it is worthwhile to
test more of these decisions via consumer choice experiments.
Modeling costs have recently decreased with improved product and process development
techniques and methodologies. Prototyping, including pure digital prototyping, is becoming
less expensive, so it is possible to evaluate the attributes and consequences of more
alternatives. Another important advancement is the use of the Internet to deliver choice
experiments to a wide audience. You can now inform your customers that they can have input
into the design of the next product edition by completing a web survey.
Choice modeling can be added to Six Sigma programs to improve consumer products. Six
Sigma aims at making products better by improving the manufacturing process and ensuring
greater performance and durability. However, Six Sigma programs have not addressed one very
important aspect of product improvement: making the products that people actually want.
Six Sigma programs often consider the Voice of the Customer and can use customer
satisfaction surveys. These surveys can disclose what is wrong with the product, but they fail
to identify consumer preferences with regard to specific product attributes. Choice
experiments provide a tool that enables companies to gain insight into actual customer
preferences. Choice modeling analysis can reveal such preferences.
Market research experiments have a long history of success, but performing these experiments
has been expensive, and research has previously focused on price elasticity and competitive
situations. Choice modeling can have the most impact when these same techniques are
applied to product design engineering.
Example of the Choice Platform
Suppose that you are supplying pizza for an airline. You want to find pizza attributes that are
optimal for the flying population. So, you have a group of frequent flyers complete a choice
survey. To weigh the importance of each attribute and potential interactions between
attributes, you give them a series of choices that require them to state their preference between
each pair of choices. One pair of choices might be between two types of pizza that they like, or
between two types of pizza that they do not like. Hence, the choice might not always be easy.
This example examines pizza choices where three attributes, with two levels each, are
presented to the subjects:
• crust (thick or thin)
• cheese (mozzarella or Monterey Jack)
• topping (pepperoni or none)
Suppose a subject likes thin crust with mozzarella cheese and no topping. The choices given to
the subject are either a thick crust with mozzarella cheese and pepperoni topping, or a thin
crust with Monterey Jack cheese and no topping. Because neither of these pizzas is ideal, the
subject has to weigh which of the attributes are more important.
This example uses three data tables: Pizza Profiles.jmp, Pizza Responses.jmp, and Pizza
Subjects.jmp.
1. Select Help > Sample Data Library and open Pizza Profiles.jmp, Pizza Responses.jmp, and
Pizza Subjects.jmp.
The profile data table, Pizza Profiles.jmp, lists all the pizza choice combinations that you
want to present to the subjects. Each choice combination is given an ID.
For the actual survey or experiment, each subject is given four trials, where each trial
consists of stating his or her preference between two choice profiles (Choice1 and
Choice2). The choice profiles given for each trial are referred to as a choice set. One
subject’s choice trials can be different from another subject’s trials. Refer to the Discrete
Choice Designs chapter in the Design of Experiments Guide. Notice that each choice value
refers to an ID value in the Profile data table that has the attribute information.
Data about the subjects are stored in a separate Pizza Subjects.jmp file. This table includes
a Subject ID column and characteristics of the subject. In the pizza example, the only
characteristic or attribute about the Subject is Gender. Notice that the response choices and
choice sets in the response data table use the ID names given in the Pizza Profiles.jmp data
set. Similarly, the subject identifications in the response data table match those in the
subject data table.
2. Select Analyze > Consumer Research > Choice to open the launch window.
There are three separate sections for each of the data sources.
3. Click Select Data Table under Profile Data.
A new window appears, which prompts you to specify the data table for the profile data.
4. Select Pizza Profiles.jmp.
The columns from this table now populate the field under Select Columns in the Choice
Dialog box.
5. Select ID for Profile ID under Pick Role Variables and Add Crust, Cheese, and Topping under
Construct Model Effects.
Figure 5.2 Profile Data Dialog
6. Open the Response Data section of the window. Click Select Data Table. When the
Response Data Table window appears, select Pizza Responses.jmp.
7. Select Choice for the Profile ID Chosen, and Choice1 and Choice2 for the Profile ID Choices.
Figure 5.3 Response Data Dialog
Choice1 and Choice2 are the profile ID choices given to a subject on each of four trials. The
Choice column contains the chosen preference between Choice1 and Choice2.
8. Open the Subject Data section of the window. Click Select Data Table. When the Subject
Data Table window appears, select Pizza Subjects.jmp.
9. Select Subject for Subject ID to identify individual subjects.
10. Add Gender under Construct Model Effects.
Figure 5.4 Subject Data Dialog
11. Click Run Model.
Figure 5.5 Choice Model Results
Callouts in the figure mark the response-only effects and the subject-effect by
response-effect interactions. Cheese is significant as a main effect. Crust and Topping are
significant only when interacted with Gender.
Figure 5.5 shows the parameter estimates and the likelihood ratio tests for the Choice
Model with subject effects included. Strong interactions are seen between Gender and
Crust and between Gender and Topping. When the Crust and Topping factors are assessed
for the entire population, the effects are not significant. However, the effects of Crust and
Topping are strong when they are evaluated between Gender groups.
Explore the Model
If you are concerned with selecting the best combination of factor levels, the Utility Profiler is
a useful tool to explore and optimize your data.
1. From the Choice Model red triangle menu, select Utility Profiler.
Figure 5.6 Utility Profiler with Female Level Factor Setting
You can see that females do not particularly enjoy the default settings of thick crust with
Jack cheese and pepperoni toppings.
2. Move the red line in the Gender plot to male (M).
Figure 5.7 Utility Profiler with Male Level Factor Setting
You can see that males prefer the default pizza more than females do, but not by much. To
identify the optimal factor level settings, you must find their maximum desirability
settings.
3. Click the red triangle menu of the Utility Profiler and select Desirability Functions.
A new row is added to the Utility Profiler, displaying overall desirability traces and
measures. Utility and Desirability Functions are shown together in Figure 5.8.
Figure 5.8 Utility and Desirability Functions
4. From the red triangle menu of the Utility Profiler, select Maximize for each Grid Point.
A Remembered Settings table containing the grid settings with the maximum utility and
desirability functions, and a table of differences between grids, are displayed. See Figure 5.9.
As illustrated, this feature can be a very quick and useful tool for selecting the most
desirable attribute combinations for a factor.
Figure 5.9 Utility and Desirability Settings
The grid setting for females shows that the greatest utility and desirability values are obtained
when the pizza attributes are thin crust, mozzarella cheese, and no topping. For males, the
grid setting shows that the highest utility and desirability values are obtained when the pizza
attributes are thick crust, mozzarella cheese, and pepperoni topping. If you could offer two
pizza choices, these optimal settings for females and males would be the best to offer.
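The idea of finding the best attribute combination for each level of a subject factor can be sketched as an exhaustive search over the factor levels. This is only an illustration of the concept, not a description of how the Maximize for each Grid Point command is implemented. The effect values below are invented so that the sketch reproduces the pattern described above. They are not the estimates from the pizza data tables.

```python
from itertools import product

# Factor levels from the pizza example.
levels = {
    "Crust": ["Thick", "Thin"],
    "Cheese": ["Mozzarella", "Jack"],
    "Topping": ["Pepperoni", "None"],
}

# Hypothetical main effects (invented for illustration).
main_effects = {
    "Crust": {"Thick": 0.1, "Thin": -0.1},
    "Cheese": {"Mozzarella": 0.6, "Jack": -0.6},
    "Topping": {"Pepperoni": 0.05, "None": -0.05},
}

# Hypothetical Gender-by-attribute interaction effects (invented for illustration).
gender_interactions = {
    "F": {"Crust": {"Thick": -0.4, "Thin": 0.4},
          "Topping": {"Pepperoni": -0.3, "None": 0.3}},
    "M": {"Crust": {"Thick": 0.4, "Thin": -0.4},
          "Topping": {"Pepperoni": 0.3, "None": -0.3}},
}

def utility(profile, gender):
    """Sum the main effects and any interaction effects for this gender."""
    total = 0.0
    for factor, level in profile.items():
        total += main_effects[factor][level]
        total += gender_interactions[gender].get(factor, {}).get(level, 0.0)
    return total

def best_profile(gender):
    """Evaluate every combination of levels and return the one with highest utility."""
    factors = list(levels)
    candidates = [dict(zip(factors, combo))
                  for combo in product(*(levels[f] for f in factors))]
    return max(candidates, key=lambda p: utility(p, gender))

best = {gender: best_profile(gender) for gender in ("F", "M")}
print(best)
```

With three two-level attributes there are only eight candidate profiles per gender, so a full enumeration is practical. Larger designs might call for a different search strategy.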
Launch the Choice Platform
Launch the Choice platform by selecting Analyze > Consumer Research > Choice.
The Choice platform is unique because it is designed to use data from one, two or three
different data tables. Typically, data are in three separate tables: Profile Data, Response Data,
and Subject Data. Each data table contains information that is joined together by the platform
to extract the necessary data for Choice analysis. There are three sections of the Choice Launch
Window. Each section corresponds to a different data table. If all your data are contained in
one table, you can use the Choice platform, but additional effort is necessary. See the section
“One-Table Analysis” on page 112.
Because the Choice platform can use several data tables, no initial assumption is made about
using the current data table—as is the case with other JMP platforms. You must select the data
table for each of the three choice data sources. You are prompted to select the profile data set
and the response data set. If you want to model subject attributes, then a subject data set must
also be selected. You can expand or collapse each section of the Choice window, as needed.
Profile Data
Profile data describe the attributes associated with each choice. Each choice can comprise
many different attributes, and each attribute is listed as a column in the data table. There is a
row for each possible choice, and each possible choice contains a unique ID. Figure 5.10
illustrates an example Profile Data table.
Figure 5.10 Example of Profile Data Table and Dialog Window
Select Data Table Link to profile data table. You can select from any of the data sets already
open in the current JMP session, or you can select Other. Selecting Other enables you to
open a file that is not currently open.
Profile ID Identifier for each row of choice combinations. If the Profile ID column does not
uniquely identify each row in the profile data table, you need to add Grouping columns.
Add Grouping columns until the combination of Grouping and Profile ID columns uniquely
identifies the row, or profile.
Grouping With the Profile ID, unique identifier of each row, or profile. For example, if Profile
ID = 1 for Survey = A, and a different Profile ID = 1 for Survey = B, then Survey would be used
as a Grouping column.
For information about the Construct Model Effects window, see the Construct Model Effects
section in the Introduction to Fit Model chapter of the Fitting Linear Models book.
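Before assigning roles, it can help to confirm that the Profile ID column, together with any Grouping columns, identifies each profile row uniquely. The following Python sketch shows one way to run such a check outside JMP. The function name and the four sample rows are hypothetical. Only the column names Survey and Profile ID come from the example above.

```python
from collections import Counter

def duplicate_keys(rows, key_columns):
    """Return the key combinations that occur in more than one row."""
    counts = Counter(tuple(row[c] for c in key_columns) for row in rows)
    return [key for key, n in counts.items() if n > 1]

# Hypothetical profile rows: the same Profile ID values are reused in two surveys.
profiles = [
    {"Survey": "A", "Profile ID": 1},
    {"Survey": "A", "Profile ID": 2},
    {"Survey": "B", "Profile ID": 1},
    {"Survey": "B", "Profile ID": 2},
]

# Profile ID alone is ambiguous, so both of its values are reported as duplicates.
print(duplicate_keys(profiles, ["Profile ID"]))

# Adding Survey as a Grouping column makes every row unique, so nothing is reported.
print(duplicate_keys(profiles, ["Survey", "Profile ID"]))
```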
Response Data
Response data contain the experimental results and have the choice set IDs for each trial as
well as the actual choice selected by the subject. Each subject usually has several trials, or
choice sets, to cover several choice possibilities. There can be more than one row of data for
each subject. For example, an experiment might have 100 subjects with each subject making 12
choice decisions, resulting in 1200 rows in this data table. The Response data are linked to the
Profile data through the choice set columns and the actual choice response column. Choice set
refers to the set of alternatives from which the subject makes a choice. Grouping variables are
sometimes used to align choice indices when more than one group is contained within the
data.
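The link between the Response data and the Profile data can be pictured as a lookup of each offered Profile ID in the profile table. The sketch below is a conceptual illustration in Python, not the platform's internal procedure. The column names Subject, Choice1, Choice2, and Choice follow the pizza example. The profile IDs, the rows, and the function name are hypothetical.

```python
# Hypothetical profile table keyed by Profile ID (IDs and rows are made up).
profiles = {
    "P1": {"Crust": "Thick", "Cheese": "Mozzarella", "Topping": "Pepperoni"},
    "P2": {"Crust": "Thin", "Cheese": "Jack", "Topping": "None"},
    "P3": {"Crust": "Thin", "Cheese": "Mozzarella", "Topping": "None"},
}

# Hypothetical response table: one row per trial (choice set).
responses = [
    {"Subject": 1, "Choice1": "P1", "Choice2": "P2", "Choice": "P1"},
    {"Subject": 1, "Choice1": "P2", "Choice2": "P3", "Choice": "P3"},
    {"Subject": 2, "Choice1": "P1", "Choice2": "P3", "Choice": "P3"},
]

def link_responses_to_profiles(responses, profiles,
                               choice_columns=("Choice1", "Choice2")):
    """Return one row per offered alternative, with its attributes and a Chosen flag."""
    linked = []
    for trial, response in enumerate(responses, start=1):
        for column in choice_columns:
            profile_id = response[column]
            row = {
                "Subject": response["Subject"],
                "Trial": trial,
                "Profile ID": profile_id,
                "Chosen": int(profile_id == response["Choice"]),
            }
            # A KeyError here means the response refers to an ID
            # that is missing from the profile table.
            row.update(profiles[profile_id])
            linked.append(row)
    return linked

linked_rows = link_responses_to_profiles(responses, profiles)
for row in linked_rows:
    print(row)
```

Each trial expands to one row per alternative in its choice set, and exactly one of those rows is flagged as chosen.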
Figure 5.11 Example of Response Data Table and Launch Window
Select Data Table Link to response data table. You can select from any of the data sets already
open in the current JMP session, or you can select Other. Selecting Other enables you to open a
file that is not currently open.
Profile ID Chosen The Profile ID from the Profile data table of the choice the subject selected.
Grouping With the Profile ID Choices, unique identifier of each choice profile.
Profile ID Choices The Profile IDs of the set of possible choices.
Subject ID The Subject ID from the Subject data table.
Freq The frequency of the observations. If n is the value of the Freq variable for a given row,
then that row is used in computations n times. If it is less than 1 or missing, then JMP does
not use it to calculate any analyses.
Weight The weights for each observation in the data table. The variable does not have to be
an integer, but it is included only in analyses when its value is greater than zero.
By Levels in the data to create separate analyses.
Subject Data
Subject data are optional, depending on whether subject effects are to be modeled. This source
contains one or more attributes or characteristics of each subject and a subject identifier. The
Subject data table contains the same number of rows as subjects and has an identifier column
that matches a similar column in the Response data table. You can also put Subject data in the
Response data table, but it is still specified as a subject table.
Figure 5.12 Example of Subject Data Table and Launch Window
Select Data Table Link to subject data table. You can select from any of the data sets already
open in the current JMP session, or you can select Other. Selecting Other enables you to open a
file that is not currently open.
Subject ID Unique identifier of the subject.
For information about the Construct Model Effects window, see the Construct Model Effects
section in the Introduction to Fit Model chapter of the Fitting Linear Models book.
Scripting the Choice Platform
If you are scripting the Choice platform, you can also set the acceptable criterion for
convergence when estimating the parameters by adding this command to the Choice()
specification:
Choice( ..., Convergence Criterion( fraction ), ... )
See the JMP Scripting Index in the Help menu for an example.
Choice Model Output
An example of the Choice Model Output results is shown in Figure 5.13.
• The resulting parameter estimates are sometimes referred to as part-worths. Each
part-worth is the coefficient of utility associated with that attribute. By default, these
estimates are based on the Firth bias-corrected maximum likelihood estimators and
therefore are considered to be more accurate than MLEs without bias correction.
• Comparison criteria are used to help determine the better-fitting model(s) when more than
one model is investigated for your data. The model with the lower or lowest criterion
value is believed to be the better or best model. Criteria are shown in the Choice Model
output and include AICc (corrected Akaike’s Information Criterion), BIC (Bayesian
Information Criterion), −2 LogLikelihood, and −2 Firth Loglikelihood. The AICc formula
is:
$$\mathrm{AICc} = -2\,\mathrm{LogLikelihood} + 2k + \frac{2k(k+1)}{n-k-1}$$
where k is the number of estimated parameters in the model and n is the number of
observations in the data set. The BIC formula is:
$$\mathrm{BIC} = -2\,\mathrm{LogLikelihood} + k \ln(n)$$
with the same k and n, where LogLikelihood is the maximized log-likelihood. Note that
the −2 Firth Loglikelihood result is included in the report only when the Firth
Bias-adjusted Estimates check box is checked in the launch window. (See
Figure 5.10.) This option is checked by default. The decision to use or not use the Firth
Bias-adjusted Estimates does not affect the AICc score or the −2 LogLikelihood results.
• Likelihood ratio tests appear for each effect in the model. These results are obtained by
default if the model is fit quickly (less than five seconds). Otherwise, you can select the
Choice Model red triangle menu and select Likelihood Ratio Tests.
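The AICc and BIC formulas given above translate directly into a few lines of Python. The sketch below takes the −2 LogLikelihood value, the number of estimated parameters k, and the number of observations n as inputs. The function names and the numeric inputs are hypothetical and are not taken from any JMP report.

```python
import math

def aicc(neg2_loglik, k, n):
    """AICc = -2 LogLikelihood + 2k + 2k(k + 1) / (n - k - 1)."""
    if n - k - 1 <= 0:
        raise ValueError("AICc requires n > k + 1")
    return neg2_loglik + 2 * k + (2 * k * (k + 1)) / (n - k - 1)

def bic(neg2_loglik, k, n):
    """BIC = -2 LogLikelihood + k * ln(n)."""
    return neg2_loglik + k * math.log(n)

# Hypothetical inputs for two candidate models fit to the same data.
n_obs = 128
model_a = {"neg2_loglik": 100.0, "k": 3}
model_b = {"neg2_loglik": 98.5, "k": 6}

for name, m in (("A", model_a), ("B", model_b)):
    print(name, aicc(m["neg2_loglik"], m["k"], n_obs), bic(m["neg2_loglik"], m["k"], n_obs))
```

In both criteria the penalty grows with k, so a model with more parameters must lower −2 LogLikelihood by enough to offset the penalty before it is preferred.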
Figure 5.13 Choice Model Results with No Subject Data for Pizza Example
(Figure callout: part-worths for crust, cheese, and topping.)
Choice Platform Options
The Choice Modeling platform has many available options. To access these options, select the
Choice Model red triangle menu.
Likelihood Ratio Tests Tests the significance of each effect in the model. These are done by
default if the estimate of CPU time is less than five seconds.
Joint Factor Tests Tests each factor in the model by constructing a likelihood ratio test for all
the effects involving that factor. For more information about Joint Factor Tests, see the
Standard Least Squares Report and Options chapter of the Fitting Linear Models book.
Confidence Intervals Produces a 95% confidence interval for each parameter (by default),
using the profile-likelihood method. Shift-click the platform red triangle menu and select
Confidence Intervals to input alpha values other than 0.05.
Correlation of Estimates Shows the correlations of the parameter estimates.
Effect Marginals Shows the fitted utility values for different levels in the effects, with neutral
values used for unrelated factors. Effect Marginals also includes marginal probabilities.
The marginal probability is the probability of an individual selecting attribute factor A
over factor B with all other attributes at their mean or default levels. For example, the
marginal probability of any subject choosing a pizza with mozzarella cheese, thick crust
and pepperoni over that same pizza with Monterey Jack cheese instead of mozzarella is
0.9470.
Figure 5.14 Example of Marginal Effects
Utility Profiler Shows the predicted utility for different factor settings. The utility is the value
predicted by the linear model. See “Explore the Model” on page 92 for an example of the
Utility Profiler. For details about the Utility Profiler options, see the Prediction Profiler
Options section in the Profiler chapter of the Profilers book.
Probability Profiler Shows the predicted probability for choosing the given factor settings
compared with baseline factor settings. This predicted probability is defined as
$\exp(U) / (\exp(U) + \exp(U_b))$, where $U$ is the utility for the current settings and $U_b$ is the
utility for the baseline settings. See “Compare to Baseline” on page 108 for an example of
using the Probability Profiler. For details about the Probability Profiler options, see the
Prediction Profiler Options section in the Profiler chapter of the Profilers book.
Multiple Choice Profiler Enables you to set up a number of choice sets and see the
probabilities of choosing each set relative to the other choices. See “Multiple Choice
Comparison” on page 110 for an example of using the Multiple Choice Profiler. For details
about the Multiple Choice Profiler options, see the Prediction Profiler Options section in
the Profiler chapter of the Profilers book.
Comparisons Performs comparisons between specific alternative choice profiles. Enables you
to select factor values and the values that you want to compare. From here you can
compare specific configurations, including comparing all settings on the left or right by
selecting the Any check boxes. Using Any does not compare all combinations across
features, but rather all combinations of comparisons, one feature at a time, using the left
settings as the settings for the other factors.
Figure 5.15 Comparisons Example
Willingness to Pay Calculates how much a price must change, allowing for the new feature
settings, to produce the same predicted outcome. The result is calculated using the Baseline
settings (for each background setting) and then determining the outcome after altering the
feature settings. Each factor is assigned a Role, including:
‒ Feature Factor - a feature in the experiment that you want to price.
‒ Price Factor - a continuous price factor in the experiment.
‒ Background Constant - something that you want to hold constant at a baseline value.
‒ Background Variable - something that you want to iterate across values.
Figure 5.16 Willingness to Pay Example
The Include baseline settings in report table option adds the baseline settings with a price
change of zero, which is useful if you make an output table of these prices displaying all
the baseline settings as well as the featured settings.
Save Utility Formula Makes a new column with a formula for the utility, or linear model, that
is estimated. This is in the profile data table, except if there are subject effects. In that case,
it makes a new data table for the formula. This formula can be used with various profilers
with subsequent analyses. For more details about the Utility Formula, see “Statistical
Details” on page 134.
Save Gradients by Subject Constructs a new table that has a row for each subject containing
the average (Hessian-scaled-gradient) steps on each parameter. This corresponds to using
a Lagrangian multiplier test for separating that subject from the remaining subjects. These
values can later be clustered, using the built-in script, to indicate unique market segments
represented in the data. For more details, see “Segmentation” on page 115.
Model Dialog Shows the Choice launch window, which can be used to modify and re-fit the
model. You can specify new data sets, new IDs, and new model effects.
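The idea behind the Willingness to Pay option described above can be illustrated with simple algebra. The sketch assumes that utility is linear in a continuous price factor with coefficient b. Under that assumption, the price change that leaves utility unchanged after a feature change is the negative of the feature's utility gain divided by b. This is an illustration of the concept under that assumption, not a description of the platform's exact computation. All names and numbers are hypothetical.

```python
def price_change_for_equal_utility(feature_utility_baseline,
                                   feature_utility_new,
                                   price_coefficient):
    """Solve u_new + b * (price + delta) = u_base + b * price for delta."""
    if price_coefficient == 0:
        raise ValueError("price coefficient must be nonzero")
    return -(feature_utility_new - feature_utility_baseline) / price_coefficient

# Hypothetical values: the new feature level adds 0.6 to utility, and each
# additional dollar of price lowers utility by 0.002.
u_base, u_new, b_price = 0.0, 0.6, -0.002
delta = price_change_for_equal_utility(u_base, u_new, b_price)

# Check: utility at the new feature level and adjusted price matches the
# baseline utility (shown here for an assumed baseline price of 1200).
baseline_price = 1200.0
baseline_total = u_base + b_price * baseline_price
new_total = u_new + b_price * (baseline_price + delta)
print(delta, baseline_total, new_total)
```

When the price coefficient is negative and the feature raises utility, the computed change is positive, which can be read as the extra amount that offsets the improvement.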
Valuing Trade-offs
The Choice Modeling platform is also useful for determining the relative importance of
product attributes. Even if the attributes of a particular product that are important to the
consumer are known, information about preference trade-offs with regard to these attributes
might be unknown. By gaining such information, a market researcher or product designer is
able to incorporate product features that represent the optimal trade-off from the perspective
of the consumer. The advantages of this approach to product design can be found in the
following example.
Example of Valuing Trade-offs
It is already known that four attributes are important for laptop design: hard-disk size,
processor speed, battery life, and selling price. The data gathered for this study are used to
determine which of these four attributes (Hard Disk, Speed, Battery Life, and Price) are most
important. The study also assesses whether there are Gender or Job differences seen with these
attributes.
1. Select Help > Sample Data Library and open Laptop Profile.jmp.
2. Select Analyze > Consumer Research > Choice to open the launch window.
3. Click Select Data Table under Profile Data and select Laptop Profile.jmp. A partial listing of
the Profile Data table is shown in Figure 5.17. The complete data set consists of 24 rows, 12
for Survey 1 and 12 for Survey 2. Survey and Choice Set define the grouping columns and
Choice ID represents the four attributes of laptops: Hard Disk, Speed, Battery Life, and Price.
Figure 5.17 Profile Data Set for the Laptop Example
4. Select Choice ID for Profile ID, and Add Hard Disk, Speed, Battery Life, and Price for the
model effects.
5. Select Survey and Choice Set as the Grouping columns. The Profile Data window is shown
in Figure 5.18.
Figure 5.18 Profile Data Dialog Box for Laptop Study
6. Click Response Data > Select Data Table > Other > OK and select Laptop Runs.jmp from the
sample data library.
7. Select Response as the Profile ID Chosen, Choice1 and Choice2 as the Profile ID Choices,
Survey and Choice Set as the Grouping columns, and Person as Subject ID. The Response
Data window is shown in Figure 5.19.
Figure 5.19 Response Data Dialog Box for Laptop Study
8. To run the model without subject effects, click Run Model.
9. Choose Utility Profiler from the red triangle menu.
Figure 5.20 Laptop Results without Subject Effects
Results of this study show that while all the factors are important, the most important factor in
the laptop study is Hard Disk. The respondents prefer the larger size. Note that respondents
did not think a price increase from $1000 to $1200 was important, but an increase from $1200
to $1500 was considered important. This effect is easily visualized by examining the factors
interactively with the Utility Profiler. Such a finding can have implications for pricing policies,
depending on external market forces.
Subject Effects
To include subject effects for the laptop study, add the subject data in the Choice launch window.
1. Select Model Dialog from the Choice Model red triangle menu.
2. Under Subject Data, Select Data Table > Other > OK > Laptop Subjects.jmp.
3. Select Person as Subject ID and Add Gender and Job as the model effects.
The Subject Data window is shown in Figure 5.21.
Figure 5.21 Subject Dialog Box for Laptop Study
4. Click Run Model.
Results are shown in Figure 5.22, Figure 5.23, and Figure 5.26.
Figure 5.22 Laptop Parameter Estimate Results with Subject Data
Figure 5.23 Laptop Likelihood Ratio Test Results with Subject Data
5. Select Joint Factor Tests from the Choice Model red triangle menu.
The results of the Joint Factor Tests are displayed in Figure 5.24.
Figure 5.24 Joint Factor Test for Laptop
6. Select Effect Marginals from the Choice Model red triangle menu.
The marginal effects of each level for each factor are displayed in Figure 5.25. Notice that
the marginal effects for each factor across all levels sum to zero.
Figure 5.25 Marginal Effects for Laptop
Figure 5.26 Laptop Profiler Results for Females with Subject Data
Figure 5.27 Laptop Profiler Results for Males with Subject Data
The interaction effect between Gender and Hard Disk is marginally significant, with a p-value
of 0.0744 (see Figure 5.23 on page 107). In the Utility Profiler, check the slope for Hard Disk for
both levels of Gender. You see that the slope is steeper for females than for males.
Compare to Baseline
If you wanted to know the probability of a customer selecting one choice over some baseline,
the Probability Profiler is useful to compare choices. Suppose you were developing a new
product and wanted to see the likelihood of a customer selecting the new product over the old
product or a competitor’s product. The Probability Profiler is a useful tool for this type of
baseline analysis.
In this example, your company is currently producing laptops with 40 GB hard drives, 1.5
GHz processors, and a 6-hour battery life, priced at $1,200. You are looking for a way to make
your product more desirable by changing as few factors as possible.
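The Probability Profiler defines the predicted probability of choosing the current settings over the baseline as exp(U) / (exp(U) + exp(Ub)), where U and Ub are the utilities of the current and baseline settings. The Python sketch below evaluates that expression in the algebraically equivalent form 1 / (1 + exp(Ub − U)). The function name and the utility values are hypothetical.

```python
import math

def probability_over_baseline(utility_current, utility_baseline):
    """exp(U) / (exp(U) + exp(Ub)), written as 1 / (1 + exp(Ub - U))."""
    return 1.0 / (1.0 + math.exp(utility_baseline - utility_current))

# Equal utilities give an even split between the two alternatives.
print(probability_over_baseline(1.0, 1.0))

# Hypothetical utilities: a higher utility for the current settings
# pushes the probability above one half.
print(probability_over_baseline(1.4, 0.2))
```

Only the difference U − Ub matters, which is why changing a factor that has a steep utility slope moves the probability the most.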
1. From the Choice Model red triangle menu, select Probability Profiler.
2. Set the Baseline to 40 GB, 1.5 GHz, 6 hours, and $1,200.
3. In the Probability Profiler, change the settings of Battery Life to 6 hours and Price to $1,200.
Figure 5.28 Laptop Probability Profiler Results with Baseline Effects
You can see in Figure 5.28 that increasing Hard Disk space, increasing Speed, or
decreasing Price would increase the probability that a subject chooses the new
product over the old, for females with development jobs.
4. Change the Gender effect in the Baseline to M.
You can see that by decreasing price, males with development jobs are actually less likely
to choose the new product over the old. If you change the Job effect in the Baseline to
Marketing, you see the same effect.
Looking at Figure 5.28, you can see that the steepest slope is for the Hard Disk factor.
5. Change the Gender Effect in the Baseline to F and change the Hard Disk to 80 GB in the
Probability Profiler.
Figure 5.29 Laptop Probability Profiler Results with More Space
By increasing the space in the new laptop, a female with a development job chooses the
new laptop over the old laptop almost 92% of the time.
You can change the subject effects to vary gender and job. The probability results are
displayed in Table 5.1. Although males in development jobs are less likely than
females in the same job to choose the new laptop, they are still more likely to choose the
new laptop over the old. Developing a new product with increased hard disk space is
more desirable to your customers than the old product.
Table 5.1 Laptop Probability Profiler Probabilities
Gender   Job           Probability of Selecting New Laptop over Old
Female   Development   0.915746
Female   Marketing     0.938172
Male     Development   0.696043
Male     Marketing     0.761733
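One way to compare the rows of Table 5.1 is on the log-odds scale. The sketch below assumes a two-alternative logit model, under which the log odds of choosing the new laptop equal the utility difference between the new and old laptops. The helper name `log_odds` is illustrative only; the probabilities are copied from the table.

```python
import math

# Probabilities copied from Table 5.1.
table_5_1 = {
    ("Female", "Development"): 0.915746,
    ("Female", "Marketing"): 0.938172,
    ("Male", "Development"): 0.696043,
    ("Male", "Marketing"): 0.761733,
}

def log_odds(p):
    """Log odds of choosing the new laptop; under a two-alternative logit
    model (an assumption) this equals the new-minus-old utility difference."""
    return math.log(p / (1.0 - p))

implied_gap = {key: log_odds(p) for key, p in table_5_1.items()}
for (gender, job), gap in implied_gap.items():
    print(f"{gender:6s} {job:12s} {gap:.2f}")
```

Every implied gap is positive, which matches the statement that all four groups favor the new laptop.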
Multiple Choice Comparison
If, for example, you wanted to develop a product that had an advantage over two other
competitors, you could use the Multiple Choice Profiler. The Multiple Choice Profiler enables
you to determine the probability of a subject choosing one alternative over several others.
Company A produces a product with a fast processor speed and high battery life at a
reasonable price. Company B makes the biggest hard drives with the fastest speed, but at a
high price and low battery life. You currently make a low-end laptop with small hard drive,
slow processor, and low battery life. You want to gain market share by increasing one area of
performance and price.
You want to market this product to the masses, so you want to remove the subject effects.
1. From the Choice Model red triangle menu, select Model Dialog.
2. Open the Subject Data section. Click Subject Data Table.
The Subject Data Table window appears, prompting you to select a data table to use.
3. Ensure that no data table is selected and click OK. By default, no data table is selected.
4. Click Run Model.
5. From the Choice Model red triangle menu, select Multiple Choice Profiler.
A window appears, asking for the number of alternative choices to profile. The default
number 3 is already entered in the window.
6. Click OK.
Three Alternative profilers appear. Each factor in each profiler is set to its default value.
Alternative 1 indicates the product that you want to develop. Alternative 2 indicates
Company A’s product. Alternative 3 indicates Company B’s product.
7. For Alternative 2, set Hard Disk to 40 GB, Speed to 2.0 GHz, Battery Life to 6 hours, and
Price to $1,200.
8. For Alternative 3, set Hard Disk to 80 GB, Speed to 2.0 GHz, Battery Life to 4 hours, and
Price to $1,500.
Figure 5.30 Multiple Choice Profiler with Laptop Data
You can see that with your company’s settings set to the low, default levels, Company B
has the greatest Share of 0.51364. This Share means that the probability of any subject
choosing Alternative 3 is 51.4%. With your company’s current settings, very few people
would buy your product.
You want to increase your market share by upgrading your company’s laptop in one of the
performance areas and increasing price. The slope of the line in Alternative 1’s Hard Disk
profile suggests increasing hard disk space will increase market share the most.
9. For Alternative 1, set Hard Disk to 80 GB and Price to $1,200.
Figure 5.31 Multiple Choice Profiler with Improved Laptop
By increasing hard disk space, you can increase the price of your laptop and expect a
market share of about 36%. This share is almost the same as Company B’s
high-performance laptop and is much better than the market share with the initial
low-end settings seen in Figure 5.30.
You can see in Figure 5.31 that you could increase your market share even more by
increasing Speed or Battery Life. If you wanted to determine what laptop configuration
would give you the maximum market share, you could maximize desirability using the
Choice Model red triangle menu options. In this case, however, it is fairly obvious that a
high-performance laptop at a low price would be the most desirable.
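The share calculation in this example can be sketched in Python, assuming that shares follow the multinomial logit form (exponentiated utilities normalized over the alternatives). The utilities below are hypothetical and are not the values behind Figure 5.30 or Figure 5.31.

```python
import math

def shares(utilities):
    """Multinomial logit shares: exp(u) normalized over all alternatives."""
    top = max(utilities)  # subtract the max for numerical stability
    weights = [math.exp(u - top) for u in utilities]
    total = sum(weights)
    return [w / total for w in weights]

# Hypothetical utilities for Alternatives 1, 2, and 3.
before = shares([-1.5, 0.4, 0.9])
after = shares([0.6, 0.4, 0.9])   # Alternative 1 upgraded

print([round(s, 3) for s in before])
print([round(s, 3) for s in after])
```

In this sketch, raising the utility of Alternative 1 increases its share and reduces the shares of the other two alternatives in proportion to each other.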
One-Table Analysis
The Choice Modeling platform can also be used if all of your data are in one table. For this
one-table scenario, you use only the Profile Data section of the Choice Dialog box.
Subject-specific terms can be used in the model, but not as main effects. Two advantages, both
offering more model-effect flexibility than the three-table specification, are realized by using a
one-table analysis:
• Interactions can be selectively chosen instead of automatically getting all possible
interactions between subject and profile effects as seen when using three tables.
• Unusual combinations of choice sets are allowed. This means, for example, that the first
trial can have a choice set of two, the second trial can consist of a choice set of three, the
third trial can have a choice set of five, and so on. With multiple tables, in contrast, it is
assumed that the number of choices for each trial is fixed.
A choice response consists of a set of rows, uniquely identified by the Grouping columns. An
indicator column is specified for Profile ID in the Choice Dialog box. This indicator variable
uses the value of 1 for the chosen profile row and 0 elsewhere. There must be exactly one “1”
for each Grouping combination.
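Before launching a one-table analysis, it can help to confirm that the indicator rule holds. The following Python sketch checks that each Grouping combination contains exactly one row with an indicator of 1. The function name and the sample rows are hypothetical; JMP performs its own validation.

```python
from collections import defaultdict

def groups_without_single_choice(rows, grouping, indicator):
    """Return grouping keys whose indicator values do not contain exactly one 1."""
    chosen = defaultdict(int)
    for row in rows:
        key = tuple(row[col] for col in grouping)
        chosen[key] += 1 if row[indicator] == 1 else 0
    return [key for key, count in chosen.items() if count != 1]

# Hypothetical rows in the one-table layout.
rows = [
    {"Subject": 1, "Trial": 1, "Indicator": 1},
    {"Subject": 1, "Trial": 1, "Indicator": 0},
    {"Subject": 1, "Trial": 2, "Indicator": 0},
    {"Subject": 1, "Trial": 2, "Indicator": 0},  # no profile chosen
]
bad = groups_without_single_choice(rows, ["Subject", "Trial"], "Indicator")
print(bad)
```

Here the second trial is flagged because none of its rows is marked as chosen.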
Example of One-Table Analysis
This example illustrates how the pizza data are organized for the one-table situation.
Figure 5.32 shows a subset of the combined pizza data.
1. Select Help > Sample Data Library and open Pizza Combined.jmp to see the complete table.
Each subject completes four choice sets, with each choice set or trial consisting of two
choices. For this example, each subject has eight rows in the data set. The indicator
variable specifies the chosen profile for each choice set. The columns Subject and Trial
together identify the choice set, so they are the Grouping columns.
Figure 5.32 Partial Listing of Combined Pizza Data for One-Table Analysis
To analyze the data in this format, open the Profile Data section in the Choice Dialog box,
shown in Figure 5.33.
2. Select Analyze > Consumer Research > Choice.
3. Click Select Data Table.
4. Specify Pizza Combined.jmp as the data set. Click OK.
5. In the Profile Data section of the window, specify Indicator as the Profile ID, Subject and
Trial as the Grouping variables, and add Crust, Cheese, and Topping as the main effects.
See Figure 5.33.
Figure 5.33 Choice Dialog Box for Pizza Data One-Table Analysis
6. Click Run Model.
A new window appears, asking if this is a one-table analysis with all of the data in the
Profile Table.
7. Click Yes to fit the model, as shown in Figure 5.34.
Figure 5.34 Choice Model for Pizza Data One-Table Analysis
8. Select Utility Profiler from the drop-down menu to obtain the results shown in Figure 5.35.
Notice that the parameter estimates and the likelihood ratio test results are identical to the
results obtained for the Choice Model with only two tables, shown in Figure 5.13.
Figure 5.35 Utility Profiler for Pizza Data One-Table Analysis
Segmentation
Market researchers sometimes want to analyze the preference structure for each subject
separately in order to see whether there are groups of subjects that behave differently.
However, there are usually not enough data to do this with ordinary estimates. If there are
sufficient data, you can specify “By groups” in the Response Data or you could introduce a
Subject identifier as a subject-side model term. This approach, however, is costly if the number
of subjects is large. Other segmentation techniques discussed in the literature include
Bayesian and mixture methods.
You can also use JMP to segment by clustering subjects using response data. For example,
after running the model using the Pizza Profiles.jmp, Pizza Responses.jmp, and the optional
Pizza Subjects.jmp data sets, select the red triangle menu for the Choice Model platform and
select Save Gradients by Subject. A new data table is created containing the average
Hessian-scaled gradient on each parameter, and there is one row for each subject.
Note: This feature is regarded as an experimental method, because, in practice, little research
has been conducted on its effectiveness.
These gradient values are the subject-aggregated Newton-Raphson steps from the
optimization used to produce the estimates. At the estimates, the total gradient is zero, and
$\Delta = H^{-1} g = 0$, where $g$ is the total gradient of the log-likelihood evaluated at the MLE, and $H^{-1}$ is
the inverse Hessian function or the inverse of the negative of the second partial derivative of
the log-likelihood.
But, the disaggregation of $\Delta$ results in
$$\Delta = \sum_{ij} \Delta_{ij} = \sum_{ij} H^{-1} g_{ij} = 0,$$
where $i$ is the subject index, $j$ is the choice response index for each subject, $\Delta_{ij}$ are the partial
Newton-Raphson steps for each run, and $g_{ij}$ is the gradient of the log-likelihood by run.
The mean gradient step for each subject is then calculated as:
$$\Delta_i = \sum_j \frac{\Delta_{ij}}{n_i},$$
where $n_i$ is the number of runs per subject. These $\Delta_i$ are related to the force that subject $i$ is
applying to the parameters. If groups of subjects have truly different preference structures,
these forces are strong, and they can be used to cluster the subjects. The $\Delta_i$ are the gradient
forces that are saved. You can then cluster these values using the Clustering platform.
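The mean gradient step for each subject can be sketched in Python as follows. This is only an illustration of the averaging described above: the inverse Hessian and the per-run gradients are hypothetical numbers, and the function names are not part of JMP.

```python
def mat_vec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]

def mean_steps_by_subject(h_inv, run_gradients):
    """Average the per-run steps (inverse Hessian times run gradient)
    within each subject.

    run_gradients maps a subject ID to a list of per-run gradient vectors.
    """
    result = {}
    for subject, gradients in run_gradients.items():
        steps = [mat_vec(h_inv, g) for g in gradients]
        n_runs = len(steps)
        result[subject] = [sum(col) / n_runs for col in zip(*steps)]
    return result

# Hypothetical inverse Hessian and per-run gradients that sum to zero overall,
# as they would at the maximum likelihood estimates.
h_inv = [[0.5, 0.0], [0.0, 0.25]]
run_gradients = {
    "A": [[1.0, 2.0], [1.0, 0.0]],
    "B": [[-2.0, -2.0], [0.0, 0.0]],
}
forces = mean_steps_by_subject(h_inv, run_gradients)
print(forces)
```

In this toy input the two subjects pull the parameters in opposite directions, which is the kind of pattern that clustering on the saved gradient forces is meant to reveal.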
Example of Segmentation
Set Up the Model
1. Select Help > Sample Data Library and open Pizza Profiles.jmp, Pizza Responses.jmp, and
Pizza Subjects.jmp.
2. Select Analyze > Consumer Research > Choice to open the launch window.
There are three separate sections for each of the data sources.
3. Select Select Data Table under Profile Data.
You are prompted to select the data table for the profile data.
4. Select Pizza Profiles.jmp.
The columns from this table now populate the field under Select Columns in the Choice
Dialog box.
5. Select ID and click Profile ID.
6. Select Crust, Cheese, and Topping and click Add.
7. Open the Response Data section of the window and click Select Data Table.
8. Select Pizza Responses.jmp.
9. Select Choice and click Profile ID Chosen.
10. Select Choice1 and Choice2 and click Profile ID Choices.
11. Select Subject and click Subject ID.
12. Open the Subject Data section of the window and click Select Data Table.
13. Select Pizza Subjects.jmp.
14. Select Subject and click Subject ID to identify individual subjects.
15. Select Gender and click Add.
16. Click Run Model.
The Choice Model report appears.
Cluster Subjects
1. In the Choice Model red triangle menu, select Save Gradients by Subject.
A new data table is generated, with gradient forces saved for each main effect and subject
interaction. A partial data table with these subject gradient forces is shown in Figure 5.36.
Figure 5.36 Gradients by Subject for Pizza Data
2. In the new data table, click the red triangle menu of the HierarchicalCluster script. Click
Run Script.
The resulting dendrogram of the clusters is shown in Figure 5.37.
Figure 5.37 Dendrogram of Subject Clusters for Pizza Data
3. Save the cluster IDs by clicking on the red triangle menu of Hierarchical Clustering and
selecting Save Clusters.
A new column called Cluster is created in the data table containing the gradients. Each
subject has been assigned a Cluster value that is associated with other subjects having
similar gradient forces. Refer to the Cluster platform chapter in the Multivariate Methods
book for a discussion of other Hierarchical Clustering options. The gradient columns can
be deleted because they were used only to obtain the clusters.
4. Select all columns except Subject and Cluster. Right-click on the selected columns and
select Delete Columns.
The data table shown in Figure 5.38 contains only Subject and Cluster variables.
Figure 5.38 Merge Results Back into Original Table
5. Click Run Script under the Merge Data Back red triangle menu, as shown in the partial
gradient-by-subject table in Figure 5.36.
The cluster information is merged into the Subject data table.
The columns in the Subject data table are now Subject, Gender, and Cluster, as shown in
Figure 5.39.
Figure 5.39 Subject Data with Cluster Column
This table can then be used for further analysis.
Fit Y by X
1. Select Analyze > Fit Y by X.
2. Specify Gender as the Y, Response, and Cluster as X, Factor.
This analysis is depicted in Figure 5.40.
Figure 5.40 Contingency Analysis of Gender by Cluster for Pizza Example
Figure 5.40 shows that Cluster 1 is half male and half female, Cluster 2 is all female, and
Cluster 3 is all male. If desired, you could now refit and analyze the model with the addition
of the Cluster variable.
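The kind of cross-tabulation behind a contingency analysis can be sketched in Python. The subject rows below are hypothetical and are not the Pizza Subjects data; the function names are illustrative.

```python
from collections import Counter

def crosstab(rows, x, y):
    """Count rows for every (x, y) combination."""
    return Counter((row[x], row[y]) for row in rows)

def row_shares(counts):
    """Convert counts to shares of y within each x level."""
    totals = Counter()
    for (x_level, _), n in counts.items():
        totals[x_level] += n
    return {key: n / totals[key[0]] for key, n in counts.items()}

# Hypothetical subject table (not the Pizza Subjects data).
subjects = [
    {"Cluster": 1, "Gender": "M"}, {"Cluster": 1, "Gender": "F"},
    {"Cluster": 2, "Gender": "F"}, {"Cluster": 2, "Gender": "F"},
    {"Cluster": 3, "Gender": "M"},
]
shares_by_cluster = row_shares(crosstab(subjects, "Cluster", "Gender"))
print(shares_by_cluster)
```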
Special Data Rules
Default Choice Set
If in every trial, you can choose any of the response profiles, you can omit the Profile ID
Choices selection under Pick Role Variables in the Response Data section of the Choice Dialog
Box. The Choice Model platform then assumes that all choice profiles are available on each
run.
Subject Data with Response Data
If you have subject data in the Response data table, click Select Data Table under Subject
Data and select that table. In this case, a Subject ID column does not need to be specified. In fact,
it is not used. It is generally assumed that the subject data repeats consistently in multiple runs
for each subject.
Logistic Regression
Ordinary logistic regression can be done with the Choice Modeling platform.
Note: The Fit Y by X and Fit Model platforms are more convenient to use than the Choice
Modeling platform for logistic regression modeling. This section is used only to demonstrate
that the Choice Modeling platform can be used for logistic regression, if desired.
If your data are already in the choice-model format, you might want to use the steps given
below for logistic regression analysis. However, three steps are needed:
• Create a trivial Profile data table with a row for each response level.
• Put the explanatory variables into the Response data.
• Specify the Response data table, again, for the Subject data table.
For examples of conducting Logistic Regression using the Choice Platform, see “Example of
Logistic Regression Using the Choice Platform” on page 129 and “Example of Logistic
Regression for Matched Case-Control Studies” on page 132.
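To see why a choice model can reproduce ordinary logistic regression, consider a choice set with two response levels where one level has utility equal to the linear predictor and the reference level has utility 0. The Python sketch below checks numerically that the two formulas agree. The coefficients are hypothetical, and the sketch does not reproduce JMP's parameterization.

```python
import math

def choice_probability(utilities, chosen):
    """Conditional logit probability of the chosen alternative."""
    total = sum(math.exp(u) for u in utilities)
    return math.exp(utilities[chosen]) / total

def logistic(eta):
    """Ordinary logistic regression probability for linear predictor eta."""
    return 1.0 / (1.0 + math.exp(-eta))

# Hypothetical linear predictor b0 + b1 * x for one observation.
eta = -0.7 + 1.3 * 1.0
p_choice = choice_probability([eta, 0.0], chosen=0)
p_logit = logistic(eta)
print(p_choice, p_logit)
```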
Transforming Data
Although data are often in the Response/Profile/Subject form, the data are sometimes
specified in another format that must be manipulated into the normalized form needed for
choice analysis.
Example of Transforming Data to Two Analysis Tables
Consider the data from Daganzo, found in Daganzo Trip.jmp. This data set contains the travel
time for three transportation alternatives and the preferred transportation alternative for each
subject.
Add Choice Mode and Subjects
1. Select Help > Sample Data Library and open the Daganzo Trip.jmp data table.
A partial listing of the data set is shown in Figure 5.41.
Figure 5.41 Partial Daganzo Travel Time Table for Three Alternatives
Each Choice number listed must first be converted to one of the travel mode names. This
transformation is easily done by using the Choose function in the formula editor, as
follows.
2. Select Cols > New Column.
3. Specify the Column Name as Choice Mode and the modeling type as Nominal.
4. Click the Column Properties and select Formula.
5. Click Conditional under the Functions (grouped) command, select Choose, and press the
comma key twice to obtain additional arguments for the function.
6. Click Choice for the Choose expression (expr), and double click each clause entry box to
enter “Subway”, “Bus”, and “Car” (with the quotation marks) as shown in Figure 5.42.
Figure 5.42 Choose Function for Choice Mode Column of Daganzo Data
7. Click OK in the Formula Editor window.
8. Click OK in the New Column window.
The new Choice Mode column appears in the data table. Because each row contains a
choice made by each subject, another column containing a sequence of numbers should be
created to identify the subjects.
9. Select Cols > New Column.
10. Specify the Column Name as Subject.
11. Click Missing/Empty next to Initialize Data and select Sequence Data.
12. Click OK.
A partial listing of the modified table is shown in Figure 5.43.
Figure 5.43 Daganzo Data with New Choice Mode and Subject Columns
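For readers who prepare data outside JMP, the two new columns can be sketched in Python. The `choose` helper is a hypothetical stand-in for the idea of mapping a numeric code to a label; it is not JMP's Choose function and its handling of out-of-range codes is an assumption. The travel times are illustrative values, not rows from Daganzo Trip.jmp.

```python
def choose(index, *values):
    """Return the value at a 1-based index, or None if out of range
    (an assumption about how unmatched codes should be handled)."""
    if 1 <= index <= len(values):
        return values[index - 1]
    return None

# Hypothetical rows: travel times and the numeric Choice code.
rows = [
    {"Subway": 16.5, "Bus": 16.2, "Car": 23.9, "Choice": 2},
    {"Subway": 15.1, "Bus": 17.4, "Car": 15.8, "Choice": 1},
]
for subject, row in enumerate(rows, start=1):
    row["Choice Mode"] = choose(row["Choice"], "Subway", "Bus", "Car")
    row["Subject"] = subject  # sequence 1, 2, 3, ...

print(rows)
```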
Stack the Data
In order to construct the Profile data, each alternative needs to be expressed in a separate row.
1. Select Tables > Stack.
2. Select the columns Subway, Bus, and Car and click Stack Columns.
3. For the Output table name, type Stacked Daganzo. Type Travel Time for the Stacked Data
Column and Mode for the Source Label Column.
The resulting Stack window is shown in Figure 5.44.
Figure 5.44 Stack Operation for Daganzo Data
4. Click OK.
A partial view of the resulting table is shown in Figure 5.45.
Figure 5.45 Partial Stacked Daganzo Table
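The stack operation turns each wide row into one row per alternative. A minimal Python sketch of the same reshaping is shown below. The `stack` function and the sample values are hypothetical; only the column names Travel Time and Mode come from the steps above.

```python
def stack(rows, stack_columns, value_name, label_name):
    """Turn each wide row into one long row per stacked column."""
    stacked = []
    for row in rows:
        kept = {k: v for k, v in row.items() if k not in stack_columns}
        for column in stack_columns:
            stacked.append({**kept, label_name: column, value_name: row[column]})
    return stacked

# Hypothetical wide rows with illustrative values.
wide = [
    {"Subject": 1, "Choice Mode": "Bus", "Subway": 16.5, "Bus": 16.2, "Car": 23.9},
    {"Subject": 2, "Choice Mode": "Subway", "Subway": 15.1, "Bus": 17.4, "Car": 15.8},
]
long_rows = stack(wide, ["Subway", "Bus", "Car"], "Travel Time", "Mode")
for row in long_rows:
    print(row)
```

Each subject contributes three rows, one for each travel mode, and the non-stacked columns are repeated on every row.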
Make the Profile Data Table
For the Profile Data Table, you need the Subject, Mode, and Travel Time columns.
1. Select the Subject, Mode, and Travel Time columns and select Tables > Subset.
2. Select Selected Columns and click OK.
A partial data table is shown in Figure 5.46. Note the default table name is Subset of
Stacked Daganzo.
Figure 5.46 Partial Subset Table of Stacked Daganzo Data
Make the Response Data Table
For the Response Data Table, you need the Subject and Choice Mode columns, but you also
need a column for each possible choice.
3. With the original data table open, select the Subject and Choice Mode columns.
4. Select Tables > Subset.
5. Select Selected Columns and click OK.
Note the default table name is Subset of Daganzo Trip.
6. Select Cols > Add Multiple Columns.
7. For the Column prefix, type Choice. Type 3 next to How many columns to add. Click
Numeric and select Character.
8. Click OK.
The columns Choice 1, Choice 2, and Choice 3 have been added.
9. Type “Bus” (without quotation marks) in the first row of Choice 1. Right-click the cell and
select Fill > Fill to end of table.
10. Type “Subway” (without quotation marks) in the first row of Choice 2. Right-click the cell
and select Fill > Fill to end of table.
11. Type “Car” (without quotation marks) in the first row of Choice 3. Right-click the cell and
select Fill > Fill to end of table.
The resulting table is shown in Figure 5.47.
Figure 5.47 Partial Subset Table of Daganzo Data with Choice Set
Fit the Model
Now that you have separated the original Daganzo Trip.jmp table into two separate tables, you
can run the Choice Platform.
1. Select Analyze > Consumer Research > Choice.
2. Specify the model, as shown in Figure 5.48.
Figure 5.48 Choice Dialog Box for Subset of Daganzo Data
3. Click Run Model.
The resulting parameter estimate now expresses the utility coefficient for Travel Time and
is shown in Figure 5.49.
Figure 5.49 Parameter Estimate for Travel Time of Daganzo Data
The negative coefficient implies that increased travel time has a negative effect on consumer
utility or satisfaction. The likelihood ratio test result indicates that the Choice model with the
effect of Travel Time is significant.
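The effect of a negative travel-time coefficient can be illustrated with a small Python sketch, assuming a conditional logit model in which utility is the coefficient times travel time. The coefficient and the travel times are hypothetical and are not the fitted values in Figure 5.49.

```python
import math

def mode_probabilities(travel_times, beta):
    """Conditional logit probabilities with utility = beta * travel time."""
    weights = {mode: math.exp(beta * t) for mode, t in travel_times.items()}
    total = sum(weights.values())
    return {mode: w / total for mode, w in weights.items()}

beta = -0.25  # hypothetical negative coefficient, not the fitted estimate
times = {"Subway": 16.0, "Bus": 18.0, "Car": 24.0}
probs = mode_probabilities(times, beta)
slower_bus = mode_probabilities({**times, "Bus": 22.0}, beta)
print(probs)
print(slower_bus)
```

With a negative coefficient, the fastest mode receives the highest probability, and lengthening the bus trip lowers the probability of choosing the bus.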
Example of Transforming Data to One Analysis Table
Rather than creating two or three tables, it can be more practical to transform the data so that
only one table is used. For the one-table format, the subject effect is added as in the previous
example. A response indicator column is added instead of using three different columns for
the choice sets (Choice 1, Choice 2, Choice 3). The transformation for the one-table scenario
includes the following steps.
1. Create or open Stacked Daganzo.jmp from the “Stack the Data” steps shown in “Example
of Transforming Data to Two Analysis Tables” on page 122.
2. Select Cols > New Column.
3. Type Response as the Column Name.
4. Click Column Properties and select Formula.
5. Select Conditional under the Functions (grouped) command and select If from the formula
editor.
6. Select the column Choice Mode for the expression (expr).
7. Enter “=” and select Mode.
8. Type 1 for the Then Clause and 0 for the Else Clause.
9. Click OK in the Formula Editor window. Click OK in the New Column window.
The completed formula should look like Figure 5.50.
Figure 5.50 Formula for Response Indicator for Stacked Daganzo Data
10. Select the Subject, Travel Time, and Response columns and then select Tables > Subset.
11. Select Selected Columns and click OK.
A partial listing of the new data table is shown in Figure 5.51.
Figure 5.51 Partial Table of Stacked Daganzo Data Subset
12. Select Analyze > Consumer Research > Choice to open the launch window and specify the
model as shown in Figure 5.52.
Figure 5.52 Choice Dialog Box for Subset of Stacked Daganzo Data for One-Table Analysis
13. Click Run Model.
A pop-up dialog window asks whether this is a one-table analysis with all the data in the
Profile Table.
14. Select Yes to obtain the parameter estimate for the Travel Time utility coefficient,
shown in Figure 5.53.
Figure 5.53 Parameter Estimate for Travel Time of Daganzo Data from One-Table Analysis
Notice that the result is identical to that obtained for the two-table model, shown earlier in
Figure 5.49.
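The response indicator used in the one-table format is 1 on the row whose Mode matches the subject's Choice Mode and 0 elsewhere. A Python sketch of that rule, using hypothetical stacked rows, is shown below; the function name is illustrative.

```python
def add_response_indicator(rows, chosen_col="Choice Mode", mode_col="Mode"):
    """Add Response = 1 where the row's mode is the chosen mode, else 0."""
    return [{**row, "Response": 1 if row[chosen_col] == row[mode_col] else 0}
            for row in rows]

# Hypothetical stacked rows for one subject.
stacked = [
    {"Subject": 1, "Choice Mode": "Bus", "Mode": "Subway", "Travel Time": 16.5},
    {"Subject": 1, "Choice Mode": "Bus", "Mode": "Bus", "Travel Time": 16.2},
    {"Subject": 1, "Choice Mode": "Bus", "Mode": "Car", "Travel Time": 23.9},
]
one_table = add_response_indicator(stacked)
print([row["Response"] for row in one_table])
```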
This chapter illustrates the use of the Choice Modeling platform with simple examples. This
platform can also be used for more complex models, such as those involving more
complicated transformations and interaction terms.
Additional Examples of Logistic Regression Using the Choice Platform
Example of Logistic Regression Using the Choice Platform
Use the Choice Platform
1. Select Help > Sample Data Library and open Lung Cancer Responses.jmp.
Notice this data table has only one column (Lung Cancer) with two rows (Cancer and
NoCancer).
2. Select Analyze > Consumer Research > Choice > Select Data Table > Lung Cancer
Responses.jmp > OK.
3. Select Lung Cancer as the Profile ID and Add Lung Cancer as the model effect. The Profile
Data window is shown in Figure 5.54.
Figure 5.54 Profile Data for Lung Cancer Example
4. Click the disclosure icon for Response Data > Select Data Table > Other > OK.
5. Open the sample data set Lung Cancer Choice.jmp.
6. Select Lung Cancer for Profile ID Chosen, Choice1 and Choice2 for Profile ID Choices, and
Count for Freq. The Response Data launch window is shown in Figure 5.55.
Figure 5.55 Response Data for Lung Cancer Example
7. Click the disclosure icon for Subject Data > Select Data Table > Lung Cancer Choice.jmp >
OK.
8. Add Smoker as the model effect. The Subject Data launch window is shown in Figure 5.56.
Figure 5.56 Subject Data for Lung Cancer Example
9. Uncheck Firth Bias-adjusted Estimates and click Run Model.
Choice Modeling results are shown in Figure 5.57.
Figure 5.57 Choice Modeling Logistic Regression Results for the Cancer Data
Use the Fit Model Platform
1. Select Help > Sample Data Library and open Lung Cancer.jmp.
2. Select Analyze > Fit Model.
Automatic specification of the columns is: Lung Cancer for Y, Count for Freq, and Smoker
for Add under Construct Model Effects. The Nominal Logistic personality is automatically
selected.
3. Click Run.
The nominal logistic fit for the data is shown in Figure 5.58.
Figure 5.58 Fit Model Nominal Logistic Regression Results for the Cancer Data
Notice that the likelihood ratio chi-square test for Smoker*Lung Cancer in the Choice model
matches the likelihood ratio chi-square test for Smoker in the Logistic model. The reports
shown in Figure 5.57 and Figure 5.58 support the conclusion that smoking has a strong effect
on developing lung cancer. See the Logistic Regression chapter in the Fitting Linear Models
book for more details.
Example of Logistic Regression for Matched Case-Control Studies
This section provides an example using the Choice platform to perform logistic regression on
the results of a study of endometrial cancer with 63 matched pairs. The data are from the Los
Angeles Study of the Endometrial Cancer Data in Breslow and Day (1980) and the
SAS/STAT® 9.2 User's Guide, Second Edition (2006). The goal of the case-control analysis was to
determine the relative risk for gallbladder disease, controlling for the effect of hypertension.
The Outcome of 1 indicates the presence of endometrial cancer, and 0 indicates the control.
Gallbladder and Hypertension data indicators are also 0 or 1.
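For a 1:1 matched study, each pair can be viewed as a choice set of two rows in which the case is the "chosen" row. The Python sketch below shows the usual conditional likelihood contribution of one pair under that view. The coefficients are hypothetical and are not the estimates reported for the endometrial cancer data.

```python
import math

def pair_likelihood(beta, x_case, x_control):
    """Conditional likelihood contribution of one matched pair."""
    u_case = sum(b * x for b, x in zip(beta, x_case))
    u_control = sum(b * x for b, x in zip(beta, x_control))
    return math.exp(u_case) / (math.exp(u_case) + math.exp(u_control))

beta = [0.9, 0.3]  # hypothetical coefficients for Gallbladder, Hypertension
concordant = pair_likelihood(beta, [1, 0], [1, 0])
discordant = pair_likelihood(beta, [1, 0], [0, 0])
print(concordant, discordant)
```

A pair whose case and control have identical covariates contributes 0.5 regardless of the coefficients, so only pairs that differ on a covariate carry information about its effect.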
1. Select Help > Sample Data Library and open Endometrial Cancer.jmp.
2. Select Analyze > Consumer Research > Choice.
3. Click the Select Data Table button.
4. Select Endometrial Cancer as the profile data table. Click OK.
5. Assign Outcome to the Profile ID role.
6. Assign Pair to the Grouping role.
7. Add the following columns as model effects: Gallbladder, Hypertension.
8. Deselect the Firth Bias-Adjusted Estimates check box.
9. Select Run Model.
A pop-up dialog window asks whether this is a one-table analysis with all the data in the
Profile Table.
10. Click Yes.
11. On the Choice Model red triangle menu, select Utility Profiler.
The report is shown in Figure 5.59.
Figure 5.59 Logistic Regression on Endometrial Cancer Data
Likelihood Ratio tests are given for each factor. Note that Gallbladder is nearly significant at
the 0.05 level (p-value = 0.0532). Use the Utility Profiler to visualize the impact of the factors on
the response.
Statistical Details
Parameter estimates from the choice model identify consumer utility, or marginal utilities in
the case of a linear utility function. Utility is the level of satisfaction consumers receive from
products with specific attributes and is determined from the parameter estimates in the
model.
The choice statistical model is expressed as follows:
Let X[k] represent a subject attribute design row, with intercept.
Let Z[j] represent a choice attribute design row, without intercept.
Then, the probability of a given choice for the k'th subject to the j'th choice of m choices is:
$$P_i[jk] = \frac{\exp\left(\beta' \left(X[k] \otimes Z[j]\right)\right)}{\sum_{l=1}^{m} \exp\left(\beta' \left(X[k] \otimes Z[l]\right)\right)}$$
where:
‒ ⊗ is the Kronecker rowwise product
‒ the numerator calculates for the j'th alternative actually chosen
‒ the denominator sums over the m choices presented to the subject for that trial
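The choice probability can be sketched numerically in Python. The sketch assumes that the row-wise Kronecker product of a subject row and a choice row is the list of all pairwise products, with one coefficient per product. The design rows, coefficient values, and function names are hypothetical.

```python
import math

def rowwise_kronecker(x, z):
    """All pairwise products x[a] * z[b], ordered with x varying slowest."""
    return [xa * zb for xa in x for zb in z]

def choice_probabilities(beta, x_subject, z_choices):
    """Probability of each alternative in one choice set."""
    utilities = [
        sum(b * t for b, t in zip(beta, rowwise_kronecker(x_subject, z)))
        for z in z_choices
    ]
    top = max(utilities)  # subtract the max for numerical stability
    weights = [math.exp(u - top) for u in utilities]
    total = sum(weights)
    return [w / total for w in weights]

# Hypothetical design rows: the subject row has an intercept plus one
# covariate; each choice row has two attribute codes and no intercept.
x_k = [1.0, -1.0]
z = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]
beta = [0.8, -0.2, 0.3, 0.1]  # one coefficient per Kronecker term
probs = choice_probabilities(beta, x_k, z)
print(probs)
```

Because the subject row includes an intercept and the choice row does not, the product contains the choice main effects and the subject-by-choice interactions, but no subject main effects.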
Chapter 6
Uplift Models
Model the Incremental Impact of Actions on Consumer Behavior
Use uplift modeling to optimize marketing decisions, to define personalized medicine
protocols, or, more generally, to identify characteristics of individuals who are likely to
respond to an intervention. Also known as incremental modeling, true lift modeling, or net
modeling, uplift modeling differs from traditional modeling techniques in that it finds the
interactions between a treatment and other variables. It directs focus to individuals who are
likely to react positively to an action or treatment.
Figure 6.1 Example of Uplift for a Hair Product Marketing Campaign
Contents
Uplift Platform Overview
Example of the Uplift Platform
Launch the Uplift Platform
The Uplift Model Report
  Uplift Model Graph
  Uplift Report Options
Uplift Platform Overview
Use the Uplift platform to model the incremental impact of an action, or treatment, on
individuals. An uplift model helps identify groups of individuals who are most likely to
respond to the action. Identification of these groups leads to efficient and targeted decisions
that optimize resource allocation and impact on the individual. (See Radcliffe and Surry,
2011.)
The Uplift platform fits partition models. While traditional partition models find splits to
optimize a prediction, uplift models find splits to maximize a treatment difference.
The uplift partition model accounts for the fact that some individuals receive the treatment,
while others do not. It does this by fitting a linear model to each possible (binary) split. A
continuous response is modeled as a linear function of the split, the treatment, and the
interaction of the split and treatment. A categorical response is expressed as a logistic function
of the split, the treatment, and the interaction of the split and treatment. In both cases, the
interaction term measures the difference in uplift between the groups of individuals in the
two splits.
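The role of the interaction term can be illustrated with a difference of differences in raw response rates. The Python sketch below is a simplification: it works with raw rates rather than the linear or logistic fit that the platform uses, and the counts and function names are hypothetical.

```python
def purchase_rate(rows, side, treated):
    """Share of responders among rows on one side of a split and one arm."""
    group = [r for r in rows if r["side"] == side and r["treated"] == treated]
    return sum(r["response"] for r in group) / len(group)

def uplift(rows, side):
    """Treated-minus-control response rate on one side of the split."""
    return purchase_rate(rows, side, True) - purchase_rate(rows, side, False)

def make_rows(side, treated, responders, total):
    """Build hypothetical 0/1 response rows for one cell."""
    return [{"side": side, "treated": treated, "response": int(i < responders)}
            for i in range(total)]

# Hypothetical counts: (side, treated, responders, total).
rows = (make_rows("left", True, 30, 100) + make_rows("left", False, 10, 100)
        + make_rows("right", True, 12, 100) + make_rows("right", False, 10, 100))
uplift_difference = uplift(rows, "left") - uplift(rows, "right")
print(uplift_difference)
```

A split is interesting for uplift modeling when this difference is large, that is, when the treatment helps much more on one side of the split than on the other.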
The criterion used by the Uplift platform in defining splits is the significance of the test for
interaction over all possible splits. However, predictor selection based solely on p-values
introduces bias favoring predictors with many levels. For this reason, JMP adjusts p-values to
account for the number of levels. (See the paper “Monte Carlo Calibration of Distributions of
Partition Statistics” on the JMP website.) The splits in the Uplift platform are determined by
minimizing the adjusted p-values for t tests of the interaction effects, which is equivalent to
maximizing the logworth. The logworth for each adjusted p-value, namely -log10(adj p-value),
is reported.
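The logworth transformation is simple to compute. The sketch below assumes hypothetical adjusted p-values for three candidate predictors and picks the one with the largest logworth, which is the one with the smallest adjusted p-value. The adjustment itself is not reproduced here.

```python
import math


def logworth(adjusted_p_value):
    """Return -log10 of an adjusted p-value."""
    return -math.log10(adjusted_p_value)


# Hypothetical adjusted p-values for the best split on each predictor.
candidates = {"Gender": 1e-8, "Age": 2.5e-3, "Hair Color": 4.0e-5}

# The most significant interaction has the largest logworth.
best = max(candidates, key=lambda name: logworth(candidates[name]))
for name, p in candidates.items():
    print(name, round(logworth(p), 2))
print("best:", best)
```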
Example of the Uplift Platform
The Hair Care Product.jmp sample data table results from a marketing campaign designed to
increase purchases of a hair coloring product targeting both genders. For purposes of
designing the study and tracking purchases, 126,184 “club card” members of a major beauty
supply chain were identified. Approximately half of these members were randomly selected
and sent a promotional offer for the product. Purchases of the product over a subsequent
three-month period by all club card members were tracked.
The data table shows a Promotion column, indicating whether the member received
promotional material. The column Purchase indicates whether the member purchased the
product over the test period. For each member, the following information was assembled:
Gender, Age, Hair Color (natural), U.S. Region, and Residence (whether the member is located in
an urban area). A Validation column assigns about 33% of the subjects to the validation set.
For a categorical response, the Uplift platform interprets the first level in its value ordering as
the response of interest. This is why the column Purchase has the Value Ordering column
property. This property ensures that “Yes” responses are first in the ordering.
1. Select Help > Sample Data Library and open Hair Care Product.jmp.
2. Select Analyze > Consumer Research > Uplift.
3. From the Select Columns list:
‒ Select Promotion and click Treatment.
‒ Select Purchase and click Y, Response.
‒ Select Gender, Age, Hair Color, U.S. Region, and Residence, and click X, Factor.
‒ Select Validation and click Validation.
4. Click OK.
5. Below the Graph in the report that appears, click Go.
Based on the validation set, the optimal Number of Splits is determined to be three. The
Graph is shown in Figure 6.2. Note that the vertical scale has been modified in order to
show the detail.
Figure 6.2 Graph after Three Splits
The graph indicates that uplift in purchases occurs for females with black, red, or brown
hair and for younger females (Age < 42) with blond hair. For older blond-haired women
(Age ≥ 42) and males, the promotion has a negative effect.
Launch the Uplift Platform
To launch the Uplift platform, select Analyze > Consumer Research > Uplift. Figure 6.3 shows a
launch window for the Hair Care Product.jmp sample data table. The columns that you enter
for Y, Response, and X, Factor can be continuous or categorical. In typical usage, the
Treatment column is categorical, and often has only two levels. If your Treatment column
contains more than two levels, the first level is treated as Treatment1 and the remaining levels
are combined in Treatment2.
Figure 6.3 Launch Window for Uplift
You can specify your own Validation column, or designate a random portion of your data to
be selected as a Validation Portion. If you click the Validation button with no columns selected
in the Select Columns list, you can add a validation column to your data table. For more
information about the Make Validation Column utility, see Basic Analysis.
Note that the only Method currently supported by Uplift is Decision Tree.
The Uplift Model Report
The report opens by showing the Graph and the initial node of the Tree, as well as controls for
splitting.
Uplift Model Graph
The graph represents the response on the vertical axis. The horizontal axis corresponds to
observations, arranged by nodes. For each node, a black horizontal line shows the mean
response. Within each split, there is a subsplit for treatment shown by a red or blue line. These
lines indicate the mean responses for each of the two treatment groups within the split. The
value ordering of the treatment column determines the placement order of these lines. As
nodes are split, the graph updates to show the splits beneath the horizontal axis. Vertical lines
divide the splits.
Beneath the graph are the control buttons: Split, Prune, and Go. The Go button only appears if
there is a validation set. Also shown is the name of the Treatment column and its two levels,
called Treatment1 and Treatment2. If more than two levels are specified for the Treatment
column, all but the first level are treated as a single level and combined into Treatment2.
To the right of the Treatment column information is a report showing summary values
relating to prediction. (Keep in mind that prediction is not the objective in uplift modeling.)
The report updates as splitting occurs. If a validation set is used, values are shown for both the
training and the validation sets.
RSquare The RSquare for the regression model associated with the tree. Note that the
regression model includes interactions with the treatment column.
N The number of observations.
Number of Splits The number of times splitting has occurred.
AICc The Corrected Akaike Information Criterion (AICc), computed using the associated
regression model. AICc is only given for continuous responses. For more details, see
Fitting Linear Models.
Uplift Decision Tree
The decision tree shows the splits used to model uplift. See Figure 6.4 for an example using
the Hair Care Product.jmp sample data table. Each node contains the following information:
Treatment The name of the treatment column is shown, with its two levels.
Rate Only appears for two-level categorical responses. For each treatment level, the
proportion of subjects in this node who responded.
Mean Only appears for continuous responses. For each treatment level, the mean response
for subjects in this node.
Count The number of subjects in this node in the specified treatment level.
t Ratio The t ratio for the test for a difference in response across the levels of Treatment for
subjects in this node. If the response is categorical, it is treated as continuous (values 0 and
1) for this test.
Trt Diff The difference in response means across the levels of Treatment. This is the uplift,
assuming that:
‒ The first level in the treatment column’s value ordering represents the treatment.
‒ The response is defined so that larger values reflect greater impact.
LogWorth The value of the logworth for the subsequent split based on the given node.
Figure 6.4 Nodes for First Split
Candidates Report
Each node also contains a Candidates report. This report gives:
Term The model term.
LogWorth The maximum logworth over all possible splits for the given term. The logworth
corresponding to a split is -log10 of the adjusted p-value.
F Ratio When the response is continuous, this is the F Ratio associated with the interaction
term in a linear regression model. The regression model specifies the response as a linear
function of the treatment, the binary split, and their interaction. When the response is
categorical, this is the ChiSquare value for the interaction term in a nominal logistic model.
Gamma When the response is continuous, this is the coefficient of the interaction term in the
linear regression model used in computing the F ratio. When the response is categorical,
this is an estimate of the interaction constructed from Firth-adjusted log-odds ratios.
Cut Point If the term is continuous, this is the point that defines the split. If the term is
categorical, this describes the first (left) node.
Uplift Report Options
With the exception of the options described below, all of the red triangle options for the Uplift
report are described in the documentation for the Partition platform. For details about these
options, see the Partition Models chapter in the Specialized Models book.
Minimum Size Split
This option presents a window where you enter a number or a fractional portion of the total
sample size to define the minimum size split allowed. To specify a number, enter a value
greater than or equal to 1. To specify a fraction of the sample size, enter a value less than 1. The
default value for the Uplift platform is set to 25 or the floor of the number of rows divided by
2,000, whichever value is greater.
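The default described above can be written as a one-line rule. This is a minimal sketch, and the function name is hypothetical:

```python
import math


def default_minimum_size_split(n_rows):
    """Default minimum split size: the larger of 25 and floor(n_rows / 2000)."""
    return max(25, math.floor(n_rows / 2000))


print(default_minimum_size_split(126184))  # 126184 / 2000 = 63.092, floor 63
print(default_minimum_size_split(10000))   # floor is 5, so the default is 25
```

For a table with 126,184 rows (the number of club card members in the example), the rule gives 63. For tables with fewer than 52,000 rows, the default stays at 25.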
Column Uplift Contributions
This table and plot address a column’s contribution to the uplift tree structure. A column’s
contribution is computed as the sum of the F Ratio values associated with its splits. Recall that
these values measure the significance of the treatment-by-split interaction term in the linear
regression model.
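The contribution calculation amounts to grouping the splits by column and summing their F Ratio values. A minimal sketch with hypothetical split results (the column names come from the example, but the F Ratio values are invented):

```python
from collections import defaultdict


def column_contributions(splits):
    """Sum the F Ratio values of the splits made on each column.

    splits is a list of (column_name, f_ratio) pairs, one per split in the tree.
    """
    totals = defaultdict(float)
    for column, f_ratio in splits:
        totals[column] += f_ratio
    return dict(totals)


# Hypothetical tree with four splits; Age is used twice.
splits = [("Gender", 40.0), ("Hair Color", 12.5), ("Age", 7.5), ("Age", 2.5)]
contributions = column_contributions(splits)
print(contributions)
```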
Uplift Graph
Consider the observations in the training set. Define uplift for an observation as the difference
between the predicted probabilities or means across the levels of Treatment for the
observation’s terminal node. These uplift values are sorted in descending order. On its vertical
axis, the Uplift Graph shows the uplift values. On its horizontal axis, the graph shows the
proportion of observations with each uplift value.
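The construction can be sketched as follows, under the assumption that each terminal node contributes one horizontal step whose height is the node's uplift and whose width is the node's share of the observations. The function name and node values are hypothetical.

```python
def uplift_graph_steps(nodes):
    """Return (start, end, uplift) steps for an uplift graph.

    nodes is a list of (uplift, count) pairs, one per terminal node.
    Steps are ordered by descending uplift; start and end are cumulative
    proportions of observations on the horizontal axis.
    """
    total = sum(count for _, count in nodes)
    ordered = sorted(nodes, key=lambda node: node[0], reverse=True)
    steps = []
    cumulative = 0
    for uplift, count in ordered:
        start = cumulative / total
        cumulative += count
        steps.append((start, cumulative / total, uplift))
    return steps


# Hypothetical terminal nodes: (estimated uplift, number of observations)
nodes = [(-0.01, 500), (0.04, 300), (0.02, 200)]
steps = uplift_graph_steps(nodes)
for step in steps:
    print(step)
```

Steps with negative height at the right end of the graph correspond to groups for which the treatment reduces the response.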
See Figure 6.5 for an example of an Uplift Graph for the Hair Care Product.jmp sample data
table after three splits. Note that, for two groups of subjects (males and blond women in
the Age ≥ 42 group), the promotion has a negative effect.
The horizontal lines shown on the Uplift Graph delineate the graph for the validation set.
Specifically, the decision tree is evaluated for the validation set and the Uplift Graph is
constructed from the estimated uplifts.
Figure 6.5 Uplift Graph
Save Columns
Save Difference Saves the estimated difference in mean responses across levels of Treatment
for the observation’s node. This is the estimated uplift.
Save Difference Formula Saves the formula for the Difference, or uplift.
Chapter 7
Item Analysis
Analyze Test Results by Item and Subject
Item Response Theory (IRT) is a method of scoring tests. Although classical test theory
methods have been widely used for a century, IRT provides a better and more scientifically
based scoring procedure.
Its advantages include:
• Scoring tests at the item level, giving insight into the contributions of each item on the total
test score.
• Producing scores of both the test takers and the test items on the same scale.
• Fitting nonlinear logistic curves, more representative of actual test performance than
classical linear statistics.
Figure 7.1 Item Analysis Example
Contents
Item Analysis Platform Overview  147
Launch the Item Analysis Platform  150
The Item Analysis Report  152
Characteristic Curves  152
Information Curves  153
Dual Plots  153
Item Analysis Platform Options  155
Technical Details  156
Item Analysis Platform Overview
Psychological measurement is the process of assigning quantitative values as representations of
characteristics of individuals or objects, so-called psychological constructs. Measurement theories
consist of the rules by which those quantitative values are assigned. Item Response Theory
(IRT) is a measurement theory.
IRT uses a mathematical function to relate an individual’s probability of correctly responding
to an item to a trait of that individual. Frequently, this trait is not directly measurable and is
therefore called a latent trait.
To see how IRT relates traits to probabilities, first examine a test question that follows the
Guttman “perfect scale” as shown in Figure 7.2. The horizontal axis represents the amount of
the theoretical trait that the examinee has. The vertical axis represents the probability that the
examinee will get the item correct. (A missing value for a test question is treated as an
incorrect response.) The curve in Figure 7.2 is called an item characteristic curve (ICC).
Figure 7.2 Item Characteristic Curve of a Perfect Scale Item (vertical axis: P(Correct
Response); horizontal axis: Trait Level, with the step at b)
This figure shows that a person who has ability less than the value b has a 0% chance of
getting the item correct. A person with trait level higher than b has a 100% chance of getting
the item correct.
Of course, this is an unrealistic item, but it is illustrative in showing how a trait and a question
probability relate to each other. More typical is a curve that allows probabilities that vary from
zero to one. A typical curve found empirically is the S-shaped logistic function with a lower
asymptote at zero and upper asymptote at one. It is markedly nonlinear. An example curve is
shown in Figure 7.3.
Figure 7.3 Example Item Response Curve
The logistic model is the best choice to model this curve, because it has desirable asymptotic
properties, yet is easier to deal with computationally than other proposed models (such as the
cumulative normal distribution function). The model itself is
$$P(\theta) = c + \frac{1 - c}{1 + e^{-a(\theta - b)}}$$
In this model, referred to as a Three-Parameter Logistic (3PL) model, the variable a represents
the steepness of the curve at its inflection point. Curves with varying values of a are shown in
Figure 7.4. This parameter can be interpreted as a measure of the discrimination of an item,
that is, how sharply the probability of a correct response differs between people with high
levels of the trait and those with low levels of the trait. Very large values of a make the model
practically the step
function shown in Figure 7.2. It is generally assumed that an examinee will have a higher
probability of getting an item correct as their level of the trait increases. Therefore, a is
assumed to be positive and the ICC is monotonically increasing. Some use this
positive-increasing property of the curve as a test of the appropriateness of the item. Items
whose curves do not have this shape should be considered as candidates to be dropped from
the test.
Figure 7.4 Logistic Model for Several Values of a
Changing the value of b merely shifts the curve left or right, as shown in Figure 7.5. It
corresponds to the value of θ at the inflection point of the curve, where P(θ) = (1 + c)/2; when
c = 0, this is the point where P(θ) = 0.5. The parameter b can therefore be interpreted as item
difficulty: graphically, the more difficult items have their inflection points farther to the right
along the horizontal axis.
Figure 7.5 Logistic Curve for Several Values of b
Notice that
$$\lim_{\theta \to -\infty} P(\theta) = c$$
and therefore c represents the lower asymptote, which can be nonzero. ICCs for several values
of c are shown graphically in Figure 7.6. The c parameter is theoretically pleasing, because a
person with very little of the trait might still have a nonzero chance of getting an item right.
Therefore, c is sometimes called the pseudo-guessing parameter.
Figure 7.6 Logistic Model for Several Values of c
By varying these three parameters, a wide variety of probability curves are available for
modeling. A sample of three different ICCs is shown in Figure 7.7. Note that the lower
asymptote varies, but the upper asymptote does not. This is because of the assumption that
guessing can raise the lower asymptote above zero, whereas the probability of correctly
answering the item always approaches 100% as the trait level increases.
Figure 7.7 Three Item Characteristic Curves
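The three-parameter model described above can be evaluated directly. The following is a minimal sketch: the function name and the parameter values are hypothetical, chosen only to show the roles of a, b, and c.

```python
import math


def icc_3pl(theta, a, b, c=0.0):
    """Probability of a correct response under the 3PL model.

    a is the discrimination, b the difficulty, and c the lower asymptote.
    Setting c = 0 gives the 2PL model; setting c = 0 and a = 1 gives the 1PL model.
    """
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))


# Hypothetical item: moderately discriminating, slightly hard, four answer choices.
a, b, c = 1.2, 0.5, 0.25
for theta in (-3.0, 0.5, 3.0):
    print(theta, round(icc_3pl(theta, a, b, c), 3))
```

At very low trait levels the probability stays close to c, at θ = b it equals (1 + c)/2, and at high trait levels it approaches 1.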
Note, however, that the 3PL model might be unnecessarily complex for many situations. If, for
example, the c parameter is restricted to be zero (in practice, a reasonable restriction), there are
fewer parameters to estimate. This model, where only a and b parameters are estimated, is
called the 2PL model.
Another advantage of the 2PL model (aside from its greater stability than the 3PL) is that b can
be interpreted as the point where an examinee has a 50% chance of getting an item correct.
This interpretation is not true for 3PL models.
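As a worked check of this statement, substituting θ = b into the model gives

$$P(b) = c + \frac{1 - c}{1 + e^{0}} = c + \frac{1 - c}{2} = \frac{1 + c}{2}$$

so P(b) = 0.5 only when c = 0. With a hypothetical guessing parameter of c = 0.25, for instance, an examinee at θ = b has a 62.5% chance of answering correctly.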
A further restriction can be imposed on the general model when a researcher can assume that
test items have equal discriminating power. In these cases, the parameter a is set equal to 1,
leaving a single parameter to be estimated, the b parameter. This 1PL model is frequently
called the Rasch model, named after Danish mathematician Georg Rasch, the developer of the
model. The Rasch model is quite elegant, and is the least expensive to use computationally.
Caution: You must have a lot of data to produce stable parameter estimates using a 3PL
model. 2PL models are frequently sufficient even for tests that intuitively deserve a guessing
parameter. Therefore, the 2PL model is the default and recommended model.
Launch the Item Analysis Platform
This example uses the MathScienceTest.jmp sample data table. These data are a subset of the
data from the Third International Mathematics and Science Study (TIMSS) conducted in
1996.
To launch the Item Analysis platform, select Analyze > Consumer Research > Item Analysis.
This shows the dialog in Figure 7.8.
Figure 7.8 Item Analysis Launch Window
Y, Test Items Are the questions from the test instrument.
Freq Specifies a variable that gives the number of times each response pattern appears.
By Performs a separate analysis for each level of the specified variable.
Specify the desired model (1PL, 2PL, or 3PL) by selecting it from the Model drop-down menu.
For this example, specify all fourteen continuous questions (Q1, Q2,..., Q14) as Y, Test Items
and click OK. This accepts the default 2PL model.
Special Note on 3PL Models
If you select the 3PL model, a dialog pops up asking for a penalty for the c parameters
(thresholds). This is not asking for the threshold itself. The penalty that it requests is similar to
the type of penalty parameter that you would see in ridge regression, or in neural networks.
The penalty is on the sample variance of the estimated thresholds, so that large values of the
penalty force the estimated thresholds’ values to be closer together. This has the effect of
speeding up the computations and reducing the variability of the threshold estimates (at the
expense of some bias).
In cases where the items are questions on a multiple choice test where there are the same
number of possible responses for each question, there is often reason to believe (a priori) that
the threshold parameters would be similar across items. For example, if you are analyzing the
results of a 20-question multiple choice test where each question had four possible responses,
it is reasonable to believe that the guessing, or threshold, parameters would all be near 0.25.
So, in some cases, applying a penalty like this has some “physical intuition” to support it, in
addition to its computational advantages.
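The effect of a variance penalty can be sketched as follows. This is only an illustration under assumed names and an assumed additive form; the exact objective function that JMP optimizes is not given here.

```python
from statistics import variance


def penalized_objective(neg_log_likelihood, thresholds, penalty):
    """Add a penalty proportional to the sample variance of the thresholds.

    Hypothetical form: the larger the penalty, the more a spread-out set of
    threshold (c) estimates costs relative to a tightly clustered set.
    """
    return neg_log_likelihood + penalty * variance(thresholds)


# Two hypothetical sets of threshold estimates with the same fit to the data.
clustered = [0.25, 0.25, 0.25]
spread = [0.10, 0.25, 0.40]
print(penalized_objective(100.0, clustered, 10.0))  # no penalty: variance is 0
print(penalized_objective(100.0, spread, 10.0))     # penalized: variance is 0.0225
```

With identical thresholds the penalty vanishes, so for equally good fits the penalized criterion prefers estimates that lie close together, such as values near 0.25 for four-choice questions.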
The Item Analysis Report
The following plots appear in Item Analysis reports.
Characteristic Curves
Item characteristic curves for each question appear in the top section of the output. Initially, all
curves are shown stacked in a single column. They can be rearranged using the Number of
Plots Across command, found in the drop down menu of the report title bar. For Figure 7.9,
four plots across are displayed.
Figure 7.9 Component Curves
A vertical red line is drawn at the inflection point of each curve. In addition, dots are drawn at
the actual proportion correct for each ability level, providing a graphical method of judging
goodness-of-fit.
Gray information curves show the amount of information each question contributes to the
overall information of the test. The information curve is the slope of the ICC, which is
maximized at the inflection point.
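For a 2PL curve, the slope has the closed form a·P(θ)·(1 − P(θ)), so its maximum can be checked numerically. The sketch below uses hypothetical parameter values and function names. In the usual IRT definition, the 2PL item information is a²·P(θ)·(1 − P(θ)), which is a times this slope and therefore peaks at the same point.

```python
import math


def icc_2pl(theta, a, b):
    """Probability of a correct response under the 2PL model."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def icc_slope(theta, a, b):
    """Derivative of the 2PL curve with respect to theta."""
    p = icc_2pl(theta, a, b)
    return a * p * (1.0 - p)


# Hypothetical item parameters.
a, b = 1.5, 0.4

# Evaluate the slope on a grid of trait levels centered on b.
grid = [b + step / 10.0 for step in range(-30, 31)]
steepest = max(grid, key=lambda theta: icc_slope(theta, a, b))
print(steepest, icc_slope(steepest, a, b))
```

The steepest point is θ = b, where the slope equals a/4. A more discriminating item therefore has a taller, narrower curve centered on its difficulty.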
Figure 7.10 Elements of the ICC Display (labeled elements: ICC, information curve, ICC
inflection point)
Information Curves
Questions provide varying levels of information for different ability levels. The gray
information curves for each item show the amount of information that each question
contributes to the total information of the test. The total information of the test for the entire
range of abilities is shown in the Information Plot section of the report (Figure 7.11).
Figure 7.11 Information Plot
Dual Plots
The information gained from item difficulty parameters in IRT models can be used to
construct an increasing scale of questions, from easiest to hardest, on the same scale as the
examinees. This structure gives information about which items are associated with low levels
of the trait, and which are associated with high levels of the trait.
JMP shows this correspondence with a dual plot. The dual plot for this example is shown in
Figure 7.12.
Figure 7.12 Dual Plot
Questions are plotted to the left of the vertical dotted line, examinees on the right. In addition,
a histogram of ability levels is appended to the right side of the plot.
This example shows a wide range of abilities. Q10 is rated as difficult, with an examinee
needing to be around half a standard deviation above the mean in order to have a 50% chance
of correctly answering the question. Other questions are distributed at lower ability levels,
with Q11 and Q4 appearing as easier. There are some questions that are off the displayed scale
(Q7 and Q14).
The parameter estimates appear below the Dual Plot, as shown in Figure 7.13.
Figure 7.13 Parameter Estimates
Item Identifies the test item.
Difficulty Is the b parameter from the model. A histogram of the difficulty parameters is
shown beside the difficulty estimates.
Discrimination Is the a parameter from the model, shown only for 2PL and 3PL models. A
histogram of the discrimination parameters is shown beside the discrimination estimates.
Threshold Is the c parameter from the model, shown only for 3PL models.
Item Analysis Platform Options
The following three commands are available from the drop-down menu on the title bar of the
report.
Number of Plots Across Brings up a dialog to specify how many plots should be grouped
together on a single line. Initially, plots are stacked one-across. Figure 7.9 on page 152
shows four plots across.
Save Ability Formula Creates a new column in the data table containing a formula for
calculating ability levels. Because the ability levels are stored as a formula, you can add
rows to the data table and have them scored using the stored ability estimates. In addition,
you can run several models and store several estimates of ability in the same data table.
The ability is computed using the IRT Ability function. The function has the following
form
IRT Ability (Q1, Q2,...,Qn, [a1, a2,..., an, b1, b2,..., bn, c1, c2, ...,
cn]);
where Q1, Q2,...,Qn are columns from the data table containing items, a1, a2,..., an are the
corresponding discrimination parameters, b1, b2,..., bn are the corresponding difficulty
parameters for the items, and c1, c2, ..., cn are the corresponding threshold parameters.
Note that the parameters are entered as a matrix, enclosed in square brackets.
Script Contains options that are available to all platforms. See Using JMP.
Technical Details
Note that P(θ) does not necessarily represent the probability of a positive response from a
particular individual. It is certainly feasible that an examinee might definitely select an
incorrect answer, or that an examinee might know an answer for sure, based on the prior
experiences and knowledge of the examinee, apart from the trait level. It is more correct to
think of P(θ) as the probability of response for a set of individuals with ability level θ. Said
another way, if a large group of individuals with equal trait levels answered the item, P(θ)
predicts the proportion that would answer the item correctly. This implies that IRT models are
item-invariant; theoretically, they would have the same parameters regardless of the group
tested.
An assumption of these IRT models is that the underlying trait is unidimensional. That is to
say, there is a single underlying trait that the questions measure that can be theoretically
measured on a continuum. This continuum is the horizontal axis in the plots of the curves. If
there are several traits being measured, each of which has complex interactions with the
others, then these unidimensional models are not appropriate.
Chapter 8
Multiple Correspondence Analysis
Multiple Correspondence Analysis (MCA) takes multiple categorical variables and seeks to
identify associations between levels of those variables. MCA extends correspondence analysis
from two variables to many. It can be thought of as analogous to principal component analysis
for quantitative variables. Similar to other multivariate methods, it is a dimension reducing
method; it represents the data as points in 2- or 3-dimensional space.
Multiple correspondence analysis is frequently used in the social sciences, particularly in
France and Japan. It can be used in survey analysis to identify question agreement. It is also
used in consumer research to identify potential markets for products. Microarray studies in
genetics also use MCA to identify potential relationships between genes.
Figure 8.1 Multiple Correspondence Analysis
Contents
Example of Multiple Correspondence Analysis  159
Launch the Multiple Correspondence Analysis Platform  161
The Multiple Correspondence Analysis Report  162
Multiple Correspondence Analysis Platform Options  163
Correspondence Analysis Options  163
Cross Table  166
Cross Table of Supplementary Rows  167
Cross Table of Supplementary Columns  167
Additional Examples of the Multiple Correspondence Analysis Platform  167
Example Using a Supplementary Variable  167
Example Using a Supplementary ID  169
Statistical Details for the Multiple Correspondence Analysis Platform  171
Example of Multiple Correspondence Analysis
This example uses the Car Poll.jmp sample data table, which contains data collected from car
polls. The data include aspects about the individuals polled, such as sex, marital status, and
age. The data also include aspects about the car that they own, such as the country of origin,
the size, and the type. You want to explore relationships between sex, marital status, country
and size of car to identify consumer preferences.
1. Select Help > Sample Data Library and open Car Poll.jmp.
2. Select Analyze > Consumer Research > Multiple Correspondence Analysis.
3. Select sex, marital status, country, and size and click Y, Response.
In MCA, usually all columns are considered responses rather than some being responses
and others explanatory.
4. Click OK.
Figure 8.2 Completed Multiple Correspondence Analysis Launch Window
The Multiple Correspondence Analysis report is shown in Figure 8.3. Note that some of the
outlines are closed because of space considerations.
The Variable Summary report provides a concise view of the analysis completed.
The Correspondence Analysis report shows the cloud of categories of the four variables as
projected onto the two principal axes. From this cloud, you can see that American cars have a
strong association with the large car size, while Japanese cars are highly associated with the
small car size. Also, males are strongly associated with the small car size and females are
associated with the medium car size. This information could be used in market research to
identify target
audiences for advertisements.
Figure 8.3 Multiple Correspondence Analysis Report
Launch the Multiple Correspondence Analysis Platform
Launch the Multiple Correspondence Analysis platform by selecting Analyze > Consumer
Research > Multiple Correspondence Analysis. This example uses the Car Poll.jmp sample data
table.
Figure 8.4 Multiple Correspondence Analysis Launch Window
Y, Response Assigns the categorical columns to be analyzed. Usually, in MCA, you are
interested in the associations between variables, but there are not explicit “explanatory”
and “response” variables.
X, Factor Assigns the categorical columns to be used as factor, or explanatory, variables.
Z, Supplementary Variable Assigns the columns to be used as supplementary variables. These
are variables whose associations you want to identify but that are not included in the
calculations.
Supplementary ID Assigns the column that identifies rows to be used as supplementary. A
supplementary ID column usually has 1s and 0s. The rows associated with ID 0 are treated
as supplementary rows. The Supplementary ID option functions only if exactly one Y column
and one X column have been specified.
Freq Assigns a frequency variable to this role. This is useful if your data are summarized. In
this instance, you have one column for the Y values and another column for the frequency
of occurrence of the Y values.
By Produces a separate report for each level of the By variable. If more than one By variable
is assigned, a separate report is produced for each possible combination of the levels of the
By variables.
The Multiple Correspondence Analysis Report
The initial Multiple Correspondence Analysis report shows the variable summary,
correspondence analysis plot, and details of the dimensions of the data in order of importance.
From the plot of the cloud of categories or individuals, you can identify associations that exist
within the data. The details provide information about whether the two dimensions shown in
the plot are sufficient to understand the relationships within the table.
The Variable Summary shows the columns used in the analysis and the roles that you selected
in the launch window. If you select the Show Controls check box, a list of the columns in the
data table appears to the left. You can change the columns in the analysis either by selecting a
column and clicking Add Y, Add X, or Add Z, or by dragging the column to the header in the
variable summary table. This enables you to modify the analysis without returning to the
launch window.
Figure 8.5 Multiple Correspondence Analysis Report with Show Controls Selected
Multiple Correspondence Analysis Platform Options
The Multiple Correspondence Analysis red triangle menu options give you the ability to
customize reports according to your needs. The reports available are determined by the type
of analysis that you conduct.
Correspondence Analysis Provides correspondence analysis reports. These reports give the
plots, details, estimates of coordinates, and summary statistics. See “Additional Examples
of the Multiple Correspondence Analysis Platform” on page 167.
Cross Table Provides the Burt table or contingency table as appropriate for variable roles
selected. See “Cross Table” on page 166.
Cross Table of Supplementary Rows Provides a contingency table of the supplementary
variable(s) versus the response variable(s). This table appears by default only if a
supplementary variable has been specified in the launch window.
Cross Table of Supplementary Columns Provides a contingency table of the X, Factor
variable(s) versus the supplementary variable(s). This table appears by default only if a
factor variable and a supplementary variable have been specified in the launch window.
Mosaic Plot Displays a mosaic bar chart for each nominal or ordinal response variable. A
mosaic plot is a stacked bar chart where each segment is proportional to its group’s
frequency count. This option is available only when there is one Y variable and one X
variable.
Tests for Independence Provides tests of whether there is an association between the row
and column variables. There are two versions of this test, the Pearson form and the
Likelihood Ratio form, both with chi-square statistics. This option is available only when
there is one Y variable and one X variable.
Script Contains options that are available to all platforms. See the Using JMP book.
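The two statistics named under Tests for Independence can be illustrated with a short sketch. The counts below are hypothetical and are not taken from a JMP sample table, and the function name is an assumption made for this illustration. The sketch uses the textbook Pearson and likelihood ratio formulas for a two-way table; it is not a description of how JMP computes its report.

```python
import math

# Hypothetical 2 x 3 table of counts (rows: levels of X, columns: levels of Y).
# Illustrative values only; not taken from a JMP sample data table.
counts = [
    [20, 30, 50],
    [40, 30, 30],
]

def independence_statistics(table):
    """Return (pearson, likelihood_ratio, degrees_of_freedom) for a two-way table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)

    pearson = 0.0
    likelihood_ratio = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of rows and columns.
            expected = row_totals[i] * col_totals[j] / grand_total
            pearson += (observed - expected) ** 2 / expected
            if observed > 0:  # an empty cell contributes nothing to this sum
                likelihood_ratio += 2.0 * observed * math.log(observed / expected)

    dof = (len(table) - 1) * (len(col_totals) - 1)
    return pearson, likelihood_ratio, dof

pearson, likelihood_ratio, dof = independence_statistics(counts)
print(f"Pearson = {pearson:.3f}, Likelihood Ratio = {likelihood_ratio:.3f}, DF = {dof}")
```

Both statistics are referred to a chi-square distribution with (rows - 1) x (columns - 1) degrees of freedom. The p-value lookup is omitted here to keep the sketch free of external libraries.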
Correspondence Analysis Options
The reports available under Correspondence Analysis are determined by the type of analysis
that you conduct. Several of these reports are shown by default.
Show Plot Shows the two-dimensional cloud of categories in the plane described by the first
two principal axes. This plot appears by default.
Show Detail Provides the details of the analysis including the singular values, inertias,
ChiSquare statistics, percent, and cumulative percent. This report appears by default. See
“Show Detail” on page 164.
Show Adjusted Inertia Provides reports of the Benzécri and Greenacre adjusted inertia. See
Benzécri (1979) and Greenacre (1984). This option is available only for MCA, that is, when all
columns are Ys. See “Show Adjusted Inertia” on page 164.
Show Coordinates Provides a report of up to the first three principal coordinates for the
categories in the analysis, as appropriate. See “Show Coordinates” on page 165.
Show Summary Statistics Provides a report of the summary statistics, Quality, Mass, and
Inertia, for each category in the analysis. See “Show Summary Statistics” on page 165.
Show Partial Contributions to Inertia Provides a report of the contribution of each category to
the inertia for each of up to the first three dimensions. See “Show Partial Contributions to
Inertia” on page 166.
Show Squared Cosines Provides a report of the squared cosines of each category for each of
up to the first three dimensions. See “Show Squared Cosines” on page 166.
3D Correspondence Analysis Shows the three-dimensional cloud of categories of the Y, X,
and Z variables in the space described by the first three principal axes. This option is not
available if there are fewer than three dimensions.
Show Plot
The plot displays a projection of the cloud of categories or individuals onto the plane
described by the first two principal axes. The distance scale is the same in all directions. You
can toggle the dimensions shown in the plot using the Select Dimension controls below the
plot. The first control defines the horizontal axis of the plot, and the second control defines the
vertical axis of the plot. Click the arrow button to cycle through the dimensions shown in the
plot.
Show Detail
Singular Value Shows the singular value decomposition of the contingency table or Burt
table. For the formula, see “Statistical Details for the Details Report” on page 171.
Inertia Lists the square of the singular values, reflecting the relative variation accounted for
in the canonical dimensions.
ChiSquare Lists the portion of the overall Chi-square for the Burt or contingency table
represented by the dimension.
Percent Portion of inertia with respect to the total inertia.
Cumulative Percent Shows the cumulative portion of inertia. If the first two singular values
capture the bulk of the inertia, then the 2-D correspondence analysis plot is sufficient to
show the relationships in the table.
Show Adjusted Inertia
The principal inertias of a Burt table in MCA are the eigenvalues. The problem with these
inertias is that they provide a pessimistic indication of fit. Benzécri proposed an inertia
adjustment. Greenacre argued that the Benzécri adjustment overestimates the quality of fit
and proposed an alternate adjustment. Both adjustments are calculated for your reference. See
“Statistical Details for Adjusted Inertia” on page 171.
Inertia Lists the square of the singular values, reflecting the relative variation accounted for
in the canonical dimensions.
Adjusted Inertia Lists the adjusted inertia according to either the Benzécri or Greenacre
adjustment.
Percent Portion of adjusted inertia with respect to the total inertia.
Cumulative Percent Shows the cumulative portion of adjusted inertia. If the first two singular
values capture the bulk of the inertia, then the 2-D correspondence analysis plot is
sufficient to show the relationships in the table.
Show Coordinates
Y Lists the columns specified as Y, Response variables.
Category Lists the levels of the Y variables.
Dimension 1, Dimension 2, Dimension 3 Lists the coordinate for the category along the
respective principal axis. By default, the table shows the first three dimension columns.
Additional dimension columns are hidden. To reveal these optional columns, right-click on
the table and select the dimension columns from the Columns submenu.
If there are columns specified as X, Factor variables, the Coordinates report displays tables of
both X and Y with the same report headings. If a Z, Supplementary Variable is specified, the
coordinates are listed below the X and Y coordinates as applicable.
Show Summary Statistics
Y Lists the columns specified as Y, Response variables.
Category Lists the levels of the Y variables.
Quality Lists the quality of the representation of the level by the solution.
Mass Lists the row percentage from the Burt or contingency table.
Inertia Lists the row marginal percentage of the total inertia accounted for by the respective
point.
If there are columns specified as X, Factor variables, the Summary Statistics report displays
tables of both X and Y with the same report headings. See “Statistical Details for Summary
Statistics” on page 172.
Show Partial Contributions to Inertia
Y Lists the columns specified as Y, Response variables.
Category Lists the levels of the Y variables.
Dimension 1, Dimension 2, Dimension 3 Lists the contribution of the level to the inertia of the
respective dimension. By default, the table shows the first three dimension columns.
Additional dimension columns are hidden. To reveal these optional columns, right-click on
the table and select the dimension columns from the Columns submenu.
Each category contributes to the inertia of each dimension. The partial contributions within
each dimension sum to 1. If there are columns specified as X, Factor variables, the Partial
Contributions to Inertia report displays tables of both X and Y with the same report headings.
See “Statistical Details for Partial Contributions to Inertia” on page 172.
Show Squared Cosines
Y Lists the columns specified as Y, Response variables.
Category Lists the levels of the Y variables.
Dimension 1, Dimension 2, Dimension 3 Lists the quality of the representation of the level by
the respective dimension. By default, the table shows the first three dimension columns.
Additional dimension columns are hidden. To reveal these optional columns, right-click on
the table and select the dimension columns from the Columns submenu.
The values indicate the quality of each point for the dimension listed. The squared cosine can
be interpreted as the correlation of the point with the dimension. The sum of the squared
cosines of the first two dimensions is equal to the quality indicated in the summary statistics
report. The term “squared cosine” refers to the fact that the value is also the squared cosine
of the angle that the point makes with the dimension.
If there are columns specified as X, Factor variables, the Squared Cosines report displays
tables of both X and Y with the same report headings.
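The relationship between squared cosines and quality can be sketched for a simple (one Y, one X) analysis. The table below is hypothetical, and the variable names are assumptions made for this illustration. The sketch uses the usual textbook definition (squared principal coordinate divided by the squared distance of the point from the centroid) and is not a statement of JMP’s internal computation.

```python
import numpy as np

# Hypothetical 4 x 4 table of counts (illustrative only, not JMP sample data).
counts = np.array([[20, 30, 50, 10],
                   [40, 30, 30, 20],
                   [10, 25, 15, 30],
                   [15, 10, 20, 45]], dtype=float)

P = counts / counts.sum()                            # correspondence matrix
r = P.sum(axis=1)                                    # row masses
c = P.sum(axis=0)                                    # column masses
S = (P - np.outer(r, c)) / np.sqrt(np.outer(r, c))   # standardized residuals

U, sv, Vt = np.linalg.svd(S, full_matrices=False)
k = min(counts.shape) - 1                            # the last singular value is trivially zero
row_coords = (U[:, :k] * sv[:k]) / np.sqrt(r)[:, None]   # row principal coordinates

# Squared cosine: share of a point's squared distance from the centroid
# that lies along each dimension.
squared_distance = (row_coords ** 2).sum(axis=1)
squared_cosines = row_coords ** 2 / squared_distance[:, None]

# Quality in the two-dimensional plot: sum of the first two squared cosines.
quality_2d = squared_cosines[:, :2].sum(axis=1)

print(np.round(squared_cosines, 3))
print(np.round(quality_2d, 3))
```

Across all dimensions the squared cosines of a point sum to 1, so a quality near 1 means that the point is well represented in the two-dimensional plot. The same computation applies to the column categories by using the column coordinates.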
Cross Table
The Burt table is the basis of the multiple correspondence analysis. It is a partitioned
symmetric table of all pairs of categorical variables. The diagonal partitions are diagonal
matrices (a cross-table of a variable with itself). The off-diagonal partitions are ordinary
contingency tables. When you select multiple Y, Response columns with no X, Factor columns,
the Burt table is created. If you select any X, Factor columns, a traditional contingency table is
created instead of a Burt table.
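The structure of the Burt table can be sketched by building it from an indicator (0/1) matrix Z with one column per category, so that the table is Z'Z. The rows below are hypothetical responses and not the contents of Car Poll.jmp, and the function name is an assumption made for this illustration.

```python
# Hypothetical responses for two categorical variables (illustrative only).
rows = [
    {"country": "American", "size": "Large"},
    {"country": "American", "size": "Medium"},
    {"country": "Japanese", "size": "Small"},
    {"country": "Japanese", "size": "Small"},
    {"country": "European", "size": "Medium"},
]
variables = ["country", "size"]

def burt_table(rows, variables):
    """Return (labels, table), where table is Z'Z for the indicator matrix Z."""
    # One label per (variable, level) pair, grouped by variable.
    labels = [(v, level) for v in variables for level in sorted({r[v] for r in rows})]
    # Indicator matrix: one row per observation, one 0/1 column per label.
    z = [[1 if r[v] == level else 0 for (v, level) in labels] for r in rows]
    size = len(labels)
    table = [[sum(zr[a] * zr[b] for zr in z) for b in range(size)] for a in range(size)]
    return labels, table

labels, table = burt_table(rows, variables)
for (variable, level), line in zip(labels, table):
    print(f"{variable}={level:<9}", line)
```

In the printed table, the country-by-country and size-by-size blocks are diagonal, and the country-by-size blocks are the ordinary two-way contingency tables, as described above.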
The red triangle menu for the Burt or contingency table contains options of statistics to display
in the table.
Count Cell frequency, margin total frequencies, and grand total (total sample size). This
appears by default.
Total % Percent of cell counts and margin totals to the grand total. This appears by default.
Cell Chi Square Chi-square values computed for each cell as (O - E)^2 / E.
Col % Percent of each cell count to its column total.
Row % Percent of each cell count to its row total.
Expected Expected frequency (E) of each cell under the assumption of independence.
Computed as the product of the corresponding row total and column total divided by the
grand total.
Deviation Observed cell frequency (O) minus the expected cell frequency (E).
Col Cum Cumulative column total.
Col Cum % Cumulative column percentage.
Row Cum Cumulative row total.
Row Cum % Cumulative row percentage.
Make Into Data Table Creates one data table for each statistic shown in the table.
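The cell statistics listed above follow directly from the counts and the margin totals. The sketch below computes several of them for a hypothetical table. The counts and the function name are assumptions made for this illustration and are not taken from a JMP sample table.

```python
# Hypothetical 2 x 3 table of counts (illustrative only).
counts = [
    [20, 30, 50],
    [40, 30, 30],
]

def cell_statistics(table):
    """Return a dict that maps each statistic name to a table of the same shape."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)

    names = ["Total %", "Row %", "Col %", "Expected", "Deviation", "Cell Chi Square"]
    stats = {name: [] for name in names}
    for i, row in enumerate(table):
        line = {name: [] for name in names}
        for j, observed in enumerate(row):
            # E: row total times column total divided by the grand total.
            expected = row_totals[i] * col_totals[j] / grand_total
            line["Total %"].append(100 * observed / grand_total)
            line["Row %"].append(100 * observed / row_totals[i])
            line["Col %"].append(100 * observed / col_totals[j])
            line["Expected"].append(expected)
            line["Deviation"].append(observed - expected)                      # O - E
            line["Cell Chi Square"].append((observed - expected) ** 2 / expected)
        for name in names:
            stats[name].append(line[name])
    return stats

stats = cell_statistics(counts)
for name, values in stats.items():
    print(name, [[round(v, 2) for v in row] for row in values])
```

Summing the Cell Chi Square values over all cells gives the Pearson chi-square statistic for the table.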
Cross Table of Supplementary Rows
When a Z, Supplementary column is selected, a contingency table with the supplementary
column levels as the rows and the response column levels as the columns is created. The red
triangle menu contains the same options as the Burt Table.
Cross Table of Supplementary Columns
When an X, Factor column and a Z, Supplementary column are selected, a contingency table
with the X, Factor levels as rows and the Supplementary levels as columns is created. The red
triangle menu contains the same options as the Burt Table.
Additional Examples of the Multiple Correspondence Analysis Platform
Example Using a Supplementary Variable
This example uses the Car Poll.jmp sample data table, which contains data collected from car
polls. The data include aspects about the individuals polled, such as sex, marital status, and
age. The data also include aspects about the car that they own, such as the country of origin,
the size, and the type. You want to explore relationships between sex, country, and size of car
to identify consumer preferences.
1. Select Help > Sample Data Library and open Car Poll.jmp.
2. Select Analyze > Consumer Research > Multiple Correspondence Analysis.
3. Select country and size and click Y, Response.
4. Select marital status and click Z, Supplementary Variable.
5. Click OK.
Unlike in the first example, this analysis does not use marital status in the calculations. Marital
status is plotted after the calculations are complete.
The plot shows strong relationships between Japanese and Small cars and between
American and Large cars. The two marital statuses are plotted in a different color. Single
people seem to prefer smaller cars a bit more than married people.
Figure 8.6 MCA with Supplementary Variable Report
Example Using a Supplementary ID
The United States census allows for examining population growth over the last century. The
US Regional Population.jmp sample data table contains populations of the 50 US states
grouped into regions for each of the census years from 1920 to 2010. Alaska and Hawaii are
treated as supplementary regions because they were not states during the entire time, and
they are not part of the contiguous United States. You are interested in whether the population
growth in these two states differs from the rest of the US.
1. Select Help > Sample Data Library and open US Regional Population.jmp.
2. Select Analyze > Consumer Research > Multiple Correspondence Analysis.
3. Select Year and click Y, Response.
4. Select Region and click X, Factor.
5. Select ID and click Supplementary ID.
6. Select Population and click Freq.
7. Click OK.
The Details report shows that the association between years and regions is almost entirely
explained by the first dimension. The plot shows that years are in the correct order on the first
dimension. This ordering occurs naturally through the correspondence analysis; there is no
information about the order provided to the analysis.
Notice that the ordering of the regions reflects the population shift from the Midwest to the
Northeast to the South and finally to the Mountain and West.
Alaska and Hawaii were not used in the computation of the analysis but are plotted based on
the results. Their growth pattern is most similar to that of the Pacific states. Alaska’s growth
is even more extreme than that of the Pacific region.
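One common way to place supplementary rows on an existing map is the transition formula: the row’s profile is multiplied by the standard coordinates of the active columns. The sketch below illustrates that idea with hypothetical counts, which are not the values in US Regional Population.jmp. The names are assumptions made for this illustration, and the manual does not state that JMP uses exactly this computation.

```python
import numpy as np

# Hypothetical counts (illustrative only). Rows are active regions, columns are years.
active = np.array([[30, 40, 50],
                   [20, 35, 70],
                   [50, 55, 60]], dtype=float)
supplementary = np.array([[2, 5, 14]], dtype=float)   # not used to fit the axes

# Fit the correspondence analysis on the active rows only.
P = active / active.sum()
r = P.sum(axis=1)
c = P.sum(axis=0)
S = (P - np.outer(r, c)) / np.sqrt(np.outer(r, c))
U, sv, Vt = np.linalg.svd(S, full_matrices=False)
k = min(active.shape) - 1                             # nontrivial dimensions

row_coords = (U[:, :k] * sv[:k]) / np.sqrt(r)[:, None]   # active rows, principal coordinates
col_standard = Vt[:k].T / np.sqrt(c)[:, None]            # columns, standard coordinates

def project_rows(counts):
    """Project row profiles onto the axes fitted from the active rows."""
    profiles = counts / counts.sum(axis=1, keepdims=True)
    return profiles @ col_standard

sup_coords = project_rows(supplementary)
print(np.round(row_coords, 3))
print(np.round(sup_coords, 3))
```

Applying the same projection to the active rows reproduces their principal coordinates, which is why supplementary points can be read on the same plot without having influenced the axes.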
Figure 8.7 MCA with Supplementary ID Report
Statistical Details for the Multiple Correspondence Analysis Platform
Statistical Details for the Details Report
When a simple Correspondence Analysis is performed, the report lists the singular values of
the following equation:
\[ D_r^{-0.5} \, (P - r c') \, D_c^{-0.5} = U D_u V' \]
where:
• P is the matrix of counts divided by the total frequency
• r and c are row and column sums of P
• the Ds are diagonal matrices of the values of r and c
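The decomposition above, and the quantities that the Details report derives from it, can be sketched with NumPy. The counts are hypothetical and not taken from a JMP sample table, and the variable names are assumptions made for this illustration. The ChiSquare line assumes that each dimension’s portion is the total frequency times its inertia, which follows from the definition of inertia as the Pearson chi-square divided by the total frequency.

```python
import numpy as np

# Hypothetical 3 x 3 table of counts (illustrative only).
counts = np.array([[20, 30, 50],
                   [40, 30, 30],
                   [10, 25, 15]], dtype=float)

n = counts.sum()
P = counts / n            # matrix of counts divided by the total frequency
r = P.sum(axis=1)         # row sums of P
c = P.sum(axis=0)         # column sums of P

# D_r^{-0.5} (P - r c') D_c^{-0.5}, written elementwise.
S = (P - np.outer(r, c)) / np.sqrt(np.outer(r, c))

# S = U D_u V'; singular_values holds the diagonal of D_u.
U, singular_values, Vt = np.linalg.svd(S, full_matrices=False)

inertia = singular_values ** 2
chi_square = n * inertia                      # assumed portion of the overall chi-square
percent = 100 * inertia / inertia.sum()
cumulative_percent = np.cumsum(percent)

for k, values in enumerate(zip(singular_values, inertia, chi_square,
                               percent, cumulative_percent), start=1):
    print(k, " ".join(f"{v:10.5f}" for v in values))
```

The last singular value is zero up to rounding, because the centered matrix has rank at most min(rows, columns) - 1.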
When Multiple Correspondence Analysis is performed, the singular value decomposition
extends to:
\[ D^{-0.5} \left( \frac{C}{Q^2 n} - D \mathbf{1} \mathbf{1}' D \right) D^{-0.5} = U D_u V' \]
where:
\[ D = \frac{1}{m} \operatorname{diag}(D_1, D_2, \ldots, D_Q) \]
• C is the Burt table
• Q is the number of categorical variables
• n is the number of observations
• 1 is a column vector of ones
Statistical Details for Adjusted Inertia
The usual principal inertias of a Burt table constructed from m categorical variables in MCA
are the eigenvalues $u_k$ from $D_u^2$. These inertias provide a pessimistic indication of fit.
Benzécri (1979) proposed the following inertia adjustment; it is also described by Greenacre
(1984, p. 145):
\[ \left( \frac{m}{m-1} \right)^2 \times \left( u_k - \frac{1}{m} \right)^2 \quad \text{for } u_k > \frac{1}{m} \]
This adjustment computes the percent of adjusted inertia relative to the sum of the adjusted
inertias for all inertias greater than $1/m$.
Greenacre (1994, p. 156) argues that the Benzécri adjustment overestimates the quality of fit.
Greenacre proposes instead to compute the percentage of adjusted inertia relative to:
\[ \frac{m}{m-1} \left( \operatorname{trace}(D_u^4) - \frac{n_c - m}{m^2} \right) \]
for all inertias greater than $1/m$, where $\operatorname{trace}(D_u^4)$ is the sum of squared
inertias and $n_c$ is the total number of categories across the m variables.
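The two adjustments can be compared with a short sketch that applies the formulas above literally. The input values are hypothetical, and the function name is an assumption made for this illustration. The sketch takes the values u_k as given rather than computing them from a Burt table, and it assumes that the trace term is summed over all dimensions, not only those above 1/m.

```python
def adjusted_inertia_report(u, m, n_c):
    """u: values u_k as defined in the text; m: number of variables; n_c: total categories."""
    threshold = 1.0 / m
    kept = [u_k for u_k in u if u_k > threshold]

    # Benzecri adjustment, applied only to values greater than 1/m.
    adjusted = [(m / (m - 1)) ** 2 * (u_k - threshold) ** 2 for u_k in kept]

    # Benzecri percentages are relative to the sum of the adjusted inertias.
    benzecri_total = sum(adjusted)
    # Greenacre percentages are relative to this total instead.
    # Assumption: trace(D_u^4) is the sum of u_k squared over all dimensions.
    greenacre_total = (m / (m - 1)) * (sum(u_k ** 2 for u_k in u) - (n_c - m) / m ** 2)

    return {
        "adjusted": adjusted,
        "benzecri_percent": [100 * a / benzecri_total for a in adjusted],
        "greenacre_percent": [100 * a / greenacre_total for a in adjusted],
    }

# Hypothetical values for m = 3 variables with n_c = 9 categories in total.
u = [0.6, 0.5, 0.35, 0.25, 0.2, 0.1]
report = adjusted_inertia_report(u, m=3, n_c=9)
for name, values in report.items():
    print(name, [round(v, 4) for v in values])
```

With these inputs the Benzécri percentages sum to 100, while the Greenacre percentages sum to less than 100, which reflects the more conservative denominator.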
Statistical Details for Summary Statistics
Quality is the ratio of the sum of the squared cosines in the chosen number of dimensions and
the sum of the squared cosines in the maximum number of dimensions. Quality indicates how
well the point is represented in the reduced dimension space.
Inertia is the total Pearson Chi-square for a two-way frequency table divided by the sum of all
observations in the table. In the summary statistics table, the relative inertia is listed.
Relative inertia is the proportion of the contribution of the point to the overall inertia.
Statistical Details for Partial Contributions to Inertia
The contribution of a row or column to the inertia of a dimension is calculated as:
\[ \text{contribution} = \frac{\text{mass} \times \text{coordinate}^2}{\text{dimension inertia}} \]
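The contribution formula can be checked numerically for a simple (one Y, one X) analysis. The counts below are hypothetical, and the variable names are assumptions made for this illustration. The coordinates are the principal coordinates obtained from the singular value decomposition described in “Statistical Details for the Details Report”.

```python
import numpy as np

# Hypothetical 3 x 3 table of counts (illustrative only).
counts = np.array([[20, 30, 50],
                   [40, 30, 30],
                   [10, 25, 15]], dtype=float)

P = counts / counts.sum()
r = P.sum(axis=1)                                    # row masses
c = P.sum(axis=0)                                    # column masses
S = (P - np.outer(r, c)) / np.sqrt(np.outer(r, c))

U, sv, Vt = np.linalg.svd(S, full_matrices=False)
k = min(counts.shape) - 1                            # drop the trivial zero dimension
U, sv, Vt = U[:, :k], sv[:k], Vt[:k]

row_coords = (U * sv) / np.sqrt(r)[:, None]          # row principal coordinates
col_coords = (Vt.T * sv) / np.sqrt(c)[:, None]       # column principal coordinates
inertia = sv ** 2                                    # inertia of each dimension

# contribution = mass x coordinate^2 / dimension inertia
row_contrib = r[:, None] * row_coords ** 2 / inertia
col_contrib = c[:, None] * col_coords ** 2 / inertia

print(np.round(row_contrib, 4))
print(np.round(col_contrib, 4))
```

Within each dimension, the row contributions sum to 1, and so do the column contributions.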
Appendix A
References
Akaike, H. (1974), “A New Look at the Statistical Model Identification,” IEEE Transactions on
Automatic Control, 19, 716–723.
Akaike, H. (1987), “Factor Analysis and AIC,” Psychometrika, 52, 317–332.
Bartlett, M.S. (1954), “A Note on the Multiplying Factors for Various Chi Square
Approximations,” Journal of the Royal Statistical Society, 16 (Series B), 296-298.
Benzécri, J. P. (1979), “Sur le calcul des taux d’inertie dans l’analyse d’un questionnaire,
addendum et erratum à [BIN. MULT.],” Cahiers de l’Analyse des Données, 4, 377–378.
Dwass, M. (1955), “A Note on Simultaneous Confidence Intervals,” Annals of Mathematical
Statistics 26: 146–147.
Farebrother, R.W. (1981), “Mechanical Representations of the L1 and L2 Estimation
Problems,” Statistical Data Analysis, 2nd Edition, Amsterdam, North Holland: edited by Y.
Dodge.
Fieller, E.C. (1954), “Some Problems in Interval Estimation,” Journal of the Royal Statistical
Society, Series B, 16, 175-185.
Firth, D. (1993), “Bias Reduction of Maximum Likelihood Estimates,” Biometrika 80:1, 27–38.
Goodnight, J.H. (1978), “Tests of Hypotheses in Fixed Effects Linear Models,” SAS Technical
Report R–101, Cary: SAS Institute Inc, also in Communications in Statistics (1980), A9
167–180.
Goodnight, J.H. and W.R. Harvey (1978), “Least Square Means in the Fixed Effect General
Linear Model,” SAS Technical Report R–103, Cary NC: SAS Institute Inc.
Greenacre, M. J. (1984), Theory and Applications of Correspondence Analysis, London: Academic
Press.
Heinze, G. and Schemper, M. (2002), “A Solution to the Problem of Separation in Logistic
Regression,” Statistics in Medicine 21:16, 2409–2419.
Hocking, R.R. (1985), The Analysis of Linear Models, Monterey: Brooks–Cole.
Hosmer, D.W. and Lemeshow, S. (2000), Applied Logistic Regression, Second Edition, New York:
John Wiley and Sons.
Kaiser, H.F. (1958), “The Varimax Criterion for Analytic Rotation in Factor Analysis,”
Psychometrika, 23, 187–200.
McFadden, D. (1974), “Conditional Logit Analysis of Qualitative Choice Behavior,” in P.
Zarembka, ed., Frontiers in Econometrics, pp. 105–142.
Radcliffe, N. J., and Surry, P. D. (2011), “Real-World Uplift Modelling with Significance-Based
Uplift Trees,” Stochastic Solutions White Paper, Portrait Technical Report TR-2011-1.
Reichheld, F. F. (2003) “The One Number You Need to Grow,” Harvard Business Review, Vol. 81
No. 12, 46-54.
Wright, S.P. and R.G. O’Brien (1988), “Power Analysis in an Enhanced GLM Procedure: What
it Might Look Like,” SUGI 1988, Proceedings of the Thirteenth Annual Conference, 1097–1102,
Cary NC: SAS Institute Inc.