Repeat question analysis example
This example introduces the analysis of repeat questions in market research data using QPSMR Companion.
For this introduction the questionnaire is very short to make it easy to check the tables produced. The questionnaire relates to shopping habits and interviewers administer it to shoppers at the end of the day. Interviewers ask shoppers about their first two purchases and then about the first two drinks they consumed.
________________________________________________________________________
NAME. Name
    -------------------------------------------------------------
________________________________________________________________________
Q1A1. First purchase - Type of outlet visited
    Department          1
    Specialist store    2
________________________________________________________________________
Q1B1. First purchase - Type of goods purchased
    Clothes             1
    Other products      2
________________________________________________________________________
Q1A2. Second purchase - Type of outlet visited
    Department          1
    Specialist store    2
________________________________________________________________________
Q1B2. Second purchase - Type of goods purchased
    Clothes             1
    Other products      2
________________________________________________________________________
SEX. Gender
    Male                1
    Female              2
________________________________________________________________________
Q2A1. First drink - Type of drink consumed
    Tea/coffee          1
    Other drink         2
________________________________________________________________________
Q2B1. First drink - Number of drinks consumed
    |__|
________________________________________________________________________
Q2A2. Second drink - Type of drink consumed
    Tea/coffee          1
    Other drink         2
________________________________________________________________________
Q2B2. Second drink - Number of drinks consumed
    |__|
This questionnaire contains two sets of repeats, Q1 and Q2. The gender question sits between Q1 and Q2 to make it clear that they are separate sets of questions and not related to each other.
Interviewers leave the second-repeat questions blank if the shopper did not make a second purchase or have a second drink.
In order to keep the data simple and to facilitate the checking of tables against the original questionnaires, we are only going to have 4 data records, 2 men and 2 women. We are assuming that these 4 people are representative of all shoppers.
Serial number 1
NAME    Rose
Q1A1    1    Department
Q1B1    1    Clothes
Q1A2    1    Department
Q1B2    2    Other products
SEX     2    Female
Q2A1    1    Tea/coffee
Q2B1    1

Serial number 2
NAME    Julia
Q1A1    2    Specialist store
Q1B1    1    Clothes
Q1A2    1    Department
Q1B2    2    Other products
SEX     2    Female
Q2A1    1    Tea/coffee
Q2B1    1
Q2A2    2    Other drink
Q2B2    2

Serial number 3
NAME    Andy
Q1A1    2    Specialist store
Q1B1    2    Other products
SEX     1    Male
Q2A1    2    Other drink
Q2B1    5

Serial number 4
NAME    Frank
Q1A1    2    Specialist store
Q1B1    1    Clothes
Q1A2    2    Specialist store
Q1B2    1    Clothes
SEX     1    Male
Q2A1    1    Tea/coffee
Q2B1    2
Rose is a busy mother who visited department stores to buy some clothes and some other products and had a quick cup of tea. Julia bought designer clothes from a specialist store and some electrical goods from a department store. She also had time for a morning coffee and later 2 other drinks. Andy is a student and he bought a book from a specialist store and spent the rest of the day in the pub. Frank visited specialist stores and bought two lots of clothes. He stopped twice for a coffee.
From these 4 records we can see that there were:
- 3 purchases from department stores and 4 from specialist stores.
- 4 purchases of clothes and 3 purchases of other products.
- 4 tea/coffee and 7 other drinks consumed.
Analysis of Q1A questions
We are now going to look at the analysis of Q1A, Type of outlet visited. There are two repeats of this question: Q1A1 and Q1A2.
There are two ways to analyse this question – we can base tables on individuals or on purchases.
A table based on individuals with gender as the breakdown (banner):
Table 1
Q1A combined
Base: All respondents

                       Total    Male  Female
Total                      4       2       2
Department                 2       -       2
                         50%      -%    100%
Specialist store           3       2       1
                         75%    100%     50%
This table has four respondents, two of each sex. In our data men only bought from specialist stores. Both women bought from department stores and one also bought from a specialist store.
The rows (Q1A combined) are made by creating a multi-coded variable (called VQ1A) and using [Block insert] to [Or together] Q1A1 and Q1A2.
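The effect of [Or together] can be pictured outside the program as a set union of the repeats for each respondent. The following Python sketch is not QPSMR Companion syntax; the record layout and the helper names `or_together` and `table_by_respondent` are assumptions made purely for illustration. It reproduces the counts in table 1 from the four records.

```python
# Illustrative only: plain Python, not QPSMR Companion syntax.
# The four records from the example; None marks a blank second repeat.
RECORDS = [
    {"name": "Rose",  "sex": "Female", "Q1A1": 1, "Q1A2": 1},
    {"name": "Julia", "sex": "Female", "Q1A1": 2, "Q1A2": 1},
    {"name": "Andy",  "sex": "Male",   "Q1A1": 2, "Q1A2": None},
    {"name": "Frank", "sex": "Male",   "Q1A1": 2, "Q1A2": 2},
]
OUTLETS = {1: "Department", 2: "Specialist store"}


def or_together(record, questions):
    """Return the set of codes given on any repeat (a multi-coded value)."""
    return {record[q] for q in questions if record[q] is not None}


def table_by_respondent(records):
    """Count respondents per outlet, split by sex (base: all respondents)."""
    counts = {label: {"Total": 0, "Male": 0, "Female": 0}
              for label in OUTLETS.values()}
    for record in records:
        # A respondent is counted once per outlet type, however many
        # purchases they made there.
        for code in or_together(record, ["Q1A1", "Q1A2"]):
            counts[OUTLETS[code]]["Total"] += 1
            counts[OUTLETS[code]][record["sex"]] += 1
    return counts


table1 = table_by_respondent(RECORDS)
for label, row in table1.items():
    print(label, row)
```

Because the union is a set, Rose's two department store purchases count only once, which is why the base stays at 4 respondents.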
The same table, based on purchases:
Table 2
Type of outlet visited
Base: All purchases

                       Total    Male  Female
Total                      7       3       4
Department                 3       -       3
                         43%      -%     75%
Specialist store           4       3       1
                         57%    100%     25%
This table has the same rows and columns as table 1 but is now based on the 7 purchases. In our data all three purchases by men were in specialist stores. Women made 3 purchases in department stores and 1 purchase in a specialist store.
This table is in fact two tables added together. The first table has Q1A1 as the rows and the second has Q1A2 as the rows, and the second table is overlaid onto (added to) the first. On the first table the row title text has been changed so that it no longer refers to the first purchase only, and a filter has been applied to give the correct base text.
Because some people may not have a second repeat, we should also filter the tables on the question not being blank. Without these filters the base would be 8.
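To see why the filter matters, the overlay can be sketched in Python as one pass over the records per repeat, with the cells added together. This is an illustration only, not how the program is implemented; `overlay_table` and its `filter_blanks` parameter are hypothetical names.

```python
# Illustrative only: plain Python, not QPSMR Companion syntax.
# None marks a blank second repeat (no second purchase).
RECORDS = [
    {"name": "Rose",  "sex": "Female", "Q1A1": 1, "Q1A2": 1},
    {"name": "Julia", "sex": "Female", "Q1A1": 2, "Q1A2": 1},
    {"name": "Andy",  "sex": "Male",   "Q1A1": 2, "Q1A2": None},
    {"name": "Frank", "sex": "Male",   "Q1A1": 2, "Q1A2": 2},
]
OUTLETS = {1: "Department", 2: "Specialist store"}


def overlay_table(records, repeats, filter_blanks=True):
    """Add one table per repeat into a single set of cells."""
    rows = ["Total"] + list(OUTLETS.values())
    cells = {row: {"Total": 0, "Male": 0, "Female": 0} for row in rows}
    for question in repeats:                 # one overlaid table per repeat
        for record in records:
            code = record[question]
            if filter_blanks and code is None:
                continue                     # filter: this repeat has no data
            hit_rows = ["Total"]
            if code is not None:
                hit_rows.append(OUTLETS[code])
            for row in hit_rows:
                cells[row]["Total"] += 1
                cells[row][record["sex"]] += 1
    return cells


filtered = overlay_table(RECORDS, ["Q1A1", "Q1A2"])
unfiltered = overlay_table(RECORDS, ["Q1A1", "Q1A2"], filter_blanks=False)
print(filtered["Total"])      # base of 7 purchases
print(unfiltered["Total"])    # base of 8: Andy's blank repeat is counted
```

With the filter the cells match table 2. Without it, the body of the table is unchanged but the base rises to 8, so every percentage would be understated.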
So which is the correct table – table 1 or table 2? The answer is that they are both correct. It all depends whether you are more interested in people or purchases.
Analysis of Q1A & Q1B
We are now going to produce a table of Q1A by Q1B. These are two related repeats – we will compare the answer to the Q1B repeat to the answer of the relevant Q1A repeat.
We can now produce four tables:
Table 3
Q1A combined by Q1B combined
Base: All respondents

                       Total   Clothes   Other products
Total                      4         3                3
Department                 2         2                2
                         50%       67%              67%
Specialist store           3         2                2
                         75%       67%              67%

Table 4
Type of outlet visited by Type of goods purchased
Base: All purchases

                       Total   Clothes   Other products
Total                      7         4                3
Department                 3         1                2
                         43%       25%              67%
Specialist store           4         3                1
                         57%       75%              33%

Table 5
Type of outlet visited by Q1B combined
Base: All purchases

                       Total   Clothes   Other products
Total                      7         6                5
Department                 3         3                3
                         43%       50%              60%
Specialist store           4         3                2
                         57%       50%              40%

Table 6
Q1A combined by Type of goods purchased
Base: All purchases

                       Total   Clothes   Other products
Total                      7         4                3
Department                 4         2                2
                         57%       50%              67%
Specialist store           5         3                2
                         71%       75%              67%
These four tables (3, 4, 5 and 6) have the same rows and columns as each other. The first is based on people and the other 3 are based on purchases. The figures are different in all four tables, so which one is correct? Again, they are all valid, although some are more useful than others.
Table 3 is a valid table and shows shopper profiles. It is based on people: for shoppers who bought clothes and for shoppers who bought other products, it shows where those shoppers purchased things (not necessarily the clothes or other products).
It must be clearly understood that this table using combined (“Or together”) variables does not show where clothes and other products were purchased.
Normally the program would produce table 4. This is based on purchases and does show where clothes and other products were purchased.
This table is Q1A1 by Q1B1 with an overlay of Q1A2 by Q1B2. Where two related repeated questions are tabulated, it is normally correct to overlay each repeat on top of each other and base the table on the number of actual repeats.
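As a rough illustration of overlaying related repeats, the sketch below pairs each outlet question with the goods question from the same repeat and adds the resulting cells together. It is plain Python rather than QPSMR Companion syntax, and `overlay_crosstab` and the record layout are hypothetical.

```python
# Illustrative only: plain Python, not QPSMR Companion syntax.
from collections import Counter

# None marks a blank second repeat (no second purchase).
RECORDS = [
    {"name": "Rose",  "Q1A1": 1, "Q1B1": 1, "Q1A2": 1,    "Q1B2": 2},
    {"name": "Julia", "Q1A1": 2, "Q1B1": 1, "Q1A2": 1,    "Q1B2": 2},
    {"name": "Andy",  "Q1A1": 2, "Q1B1": 2, "Q1A2": None, "Q1B2": None},
    {"name": "Frank", "Q1A1": 2, "Q1B1": 1, "Q1A2": 2,    "Q1B2": 1},
]
OUTLETS = {1: "Department", 2: "Specialist store"}
GOODS = {1: "Clothes", 2: "Other products"}
# Each overlay uses the row and column questions of the SAME repeat.
REPEATS = [("Q1A1", "Q1B1"), ("Q1A2", "Q1B2")]


def overlay_crosstab(records, repeats):
    """Count purchases per (outlet, goods) cell across all repeats."""
    cells = Counter()
    for outlet_q, goods_q in repeats:
        for record in records:
            outlet, goods = record[outlet_q], record[goods_q]
            if outlet is None or goods is None:
                continue                  # filter out blank repeats
            cells[(OUTLETS[outlet], GOODS[goods])] += 1
    return cells


table4 = overlay_crosstab(RECORDS, REPEATS)
for cell, count in sorted(table4.items()):
    print(cell, count)
```

The cell counts agree with table 4, and they sum to the 7 actual purchases.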
Table 5 is overlaid but uses the combined variable as the columns. It shows the number of purchases made in each store type by shoppers who purchased clothes and by shoppers who purchased other products (not necessarily in the store types shown).
Table 6 is overlaid but uses the combined variable as the rows. It shows the number of purchases of clothes and of other products, against the types of store in which the shopper bought goods (but not necessarily the clothes or other products).
You should avoid tables such as 5 and 6.
To clarify the meanings of these tables we will look at the first cell in the body of each table (Clothes column, Department row):
Table 3: This cell shows that of the 3 shoppers who bought clothes (Rose, Julia, Frank), 2 of them bought something in a department store (Rose, Julia). Note that Julia did not buy any clothes in the department store; she bought other products there.
Table 4: This cell shows that of the 4 clothes purchases, only 1 was made in a department store.
Table 5: This cell shows that of the 6 purchases made by shoppers who bought clothes (Rose 2, Julia 2, Frank 2), 3 of those purchases were in department stores.
Table 6: This cell shows that of the 4 clothes purchases (Rose 1, Julia 1, Frank 2), 2 of these were by shoppers who bought something in a department store (Rose 1, Julia 1).
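The four readings of the Clothes/Department cell in tables 3 to 6 differ only in whether each axis is judged per respondent (combined) or per purchase (overlaid). A minimal Python sketch of that distinction follows; the data layout and the function name `clothes_department_cells` are assumptions for illustration.

```python
# Illustrative only: plain Python, not QPSMR Companion syntax.
DEPARTMENT, CLOTHES = 1, 1

# One (outlet code, goods code) tuple per actual purchase.
SHOPPERS = {
    "Rose":  [(1, 1), (1, 2)],
    "Julia": [(2, 1), (1, 2)],
    "Andy":  [(2, 2)],
    "Frank": [(2, 1), (2, 1)],
}


def clothes_department_cells(shoppers):
    """Return {table number: (cell count, Clothes column base)}."""
    t3 = t3_base = t4 = t4_base = t5 = t5_base = t6 = 0
    for purchases in shoppers.values():
        # Respondent-level ("Or together") flags.
        bought_clothes = any(goods == CLOTHES for _, goods in purchases)
        used_department = any(outlet == DEPARTMENT for outlet, _ in purchases)
        # Table 3: respondent level on both axes.
        t3_base += bought_clothes
        t3 += bought_clothes and used_department
        for outlet, goods in purchases:
            # Table 4: purchase level on both axes.
            t4_base += goods == CLOTHES
            t4 += goods == CLOTHES and outlet == DEPARTMENT
            # Table 5: purchase-level rows, respondent-level columns.
            t5_base += bought_clothes
            t5 += bought_clothes and outlet == DEPARTMENT
            # Table 6: respondent-level rows, purchase-level columns.
            t6 += goods == CLOTHES and used_department
    return {3: (t3, t3_base), 4: (t4, t4_base),
            5: (t5, t5_base), 6: (t6, t4_base)}


cells = clothes_department_cells(SHOPPERS)
for table, (count, base) in cells.items():
    print(f"Table {table}: {count} of {base}")
```

Only the table 4 calculation tests the outlet and the goods of the same purchase, which is why it is the one that shows where clothes were actually bought.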
Summary so far
If a repeated question is by itself we can choose whether to:
- Make a summary of the repeats and produce one table from it.
- Overlay the repeats on top of each other as tables 2 and 4 above.
For repeat questions that involve more than one related repeat question, it is nearly always correct to use overlay tables. The base is then the number of repeats, not the number of questionnaires. Always remember to filter tables on valid repeats only and to change the base description to match.
When you produce overlay tables on repeat questions, the golden rule is to ensure that each table's rows, columns, filters and weights are correct for the repeat. For example, on the table for the second repeat, every variable definition and filter used must refer to the second repeat.
Analysis of Q2
We are now going to look at the analysis of Q2. We can combine Q2A1 and Q2A2 as before, by creating a multi-coded variable with [Block insert] [Or together]:
Table 7
Q2A combined
Base: All respondents

                       Total    Male  Female
Total                      4       2       2
Tea/coffee                 3       1       2
                         75%     50%    100%
Other drink                2       1       1
                         50%     50%     50%
Alternatively, we can overlay the tables:
Table 8
Q2A1. First drink - Type of drink consumed
Base: All drinking occasions

                       Total    Male  Female
Total                      5       2       3
Tea/coffee                 3       1       2
                         60%     50%     67%
Other drink                2       1       1
                         40%     50%     33%
Table 8 is an overlay. Note the base description on table 8.
We could combine Q2B by adding together the drinks for each shopper. To do this we create an integer variable with an arithmetic definition that adds the repeats together ($Q2B1+$Q2B2):
Table 9
Q2B combined
Base: All respondents

                       Total    Male  Female
Total                      4       2       2
5 (5.0)                    1       1       -
                         25%     50%      -%
3 (3.0)                    1       -       1
                         25%      -%     50%
2 (2.0)                    1       1       -
                         25%     50%      -%
1 (1.0)                    1       -       1
                         25%      -%     50%
Mean score               2.8     3.5     2.0
Table 9 is a “list all rows” from the combined variable. In this case, we can see that all four people drank a different number of drinks in total.
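The arithmetic behind table 9 can be checked with a few lines of Python. This sketch assumes that a blank second repeat contributes zero drinks to the sum; how the program itself treats blanks in an arithmetic definition is not stated here, so treat that as an assumption. The helper names are hypothetical.

```python
# Illustrative only: plain Python, not QPSMR Companion syntax.
# None marks a blank second repeat (no second drinking occasion).
RECORDS = [
    {"name": "Rose",  "sex": "Female", "Q2B1": 1, "Q2B2": None},
    {"name": "Julia", "sex": "Female", "Q2B1": 1, "Q2B2": 2},
    {"name": "Andy",  "sex": "Male",   "Q2B1": 5, "Q2B2": None},
    {"name": "Frank", "sex": "Male",   "Q2B1": 2, "Q2B2": None},
]


def combined_drinks(record):
    """Add the repeats together, counting a blank repeat as 0 (assumption)."""
    return sum(record[q] or 0 for q in ("Q2B1", "Q2B2"))


def mean_drinks(records, sex=None):
    """Mean combined drinks per respondent, optionally for one sex."""
    selected = [r for r in records if sex is None or r["sex"] == sex]
    return sum(combined_drinks(r) for r in selected) / len(selected)


totals = {r["name"]: combined_drinks(r) for r in RECORDS}
print(totals)
print(mean_drinks(RECORDS), mean_drinks(RECORDS, "Male"),
      mean_drinks(RECORDS, "Female"))
```

The overall mean of 2.75 appears in the table rounded to 2.8.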
Another way to present this table is to use the combined number of drinks as a quantity weight on the table:
Table 10
Base: All drinking occasions

                       Total    Male  Female
Total                     11       7       4

Table 10 has no rows but is quantity weighted, showing that men drank 7 drinks and women drank 4. This makes 11 drinks in total.
If we want to know which of these 11 drinks were tea/coffee or other drinks we need to tabulate Q2A by Q2B. We could run a table using Q2A as the rows quantity weighted by Q2B:
Table 11
Q2A combined
Base: All drinking occasions

                       Total    Male  Female
Total                     11       7       4
Tea/coffee                 6       2       4
                         55%     29%    100%
Other drink                8       5       3
                         73%     71%     75%
Read naively, the total column says that of the 11 drinks, 6 were tea/coffee and 8 were other drinks, which adds up to more than 11. This table is valid, but confusing: it actually shows that 6 drinks were consumed by shoppers who drank tea/coffee (Rose, Julia, Frank) and 8 by shoppers who drank other drinks (Julia, Andy). Julia is in both rows because she drank both types of drink, and she counts as 3 drinks in each row because she drank 3 drinks in total.
The problem with table 11 is that we are combining related repeats Q2A and Q2B in the table and should have used overlays. Here is the overlaid equivalent, using Q2A1 weighted by Q2B1 overlaid with Q2A2 weighted by Q2B2:
Table 12
Q2A1. First drink - Type of drink consumed
Base: All drinking occasions

                       Total    Male  Female
Total                     11       7       4
Tea/coffee                 4       2       2
                         36%     29%     50%
Other drink                7       5       2
                         64%     71%     50%
This table looks much better and shows the drinks as expected. It would be possible to construct other tables with the same rows and columns using combined variables on overlay tables as we did earlier – these would be valid but not useful.
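The difference between tables 11 and 12 comes down to which quantity is used as the weight. The Python sketch below contrasts the two approaches; it is an illustration only, and the function names `combined_weighted` and `overlay_weighted` are hypothetical.

```python
# Illustrative only: plain Python, not QPSMR Companion syntax.
from collections import Counter

DRINKS = {1: "Tea/coffee", 2: "Other drink"}

# (name, sex, [(drink type code, number of drinks), ...]),
# with one tuple per drinking occasion actually recorded.
RECORDS = [
    ("Rose",  "Female", [(1, 1)]),
    ("Julia", "Female", [(1, 1), (2, 2)]),
    ("Andy",  "Male",   [(2, 5)]),
    ("Frank", "Male",   [(1, 2)]),
]


def combined_weighted(records):
    """Table 11 style: combined rows weighted by the respondent's total drinks."""
    rows = Counter()
    for _name, _sex, occasions in records:
        total = sum(number for _type, number in occasions)
        for code in {drink_type for drink_type, _number in occasions}:
            rows[DRINKS[code]] += total   # ALL drinks land in every row hit
    return rows


def overlay_weighted(records):
    """Table 12 style: each occasion's type weighted by its own quantity."""
    rows = Counter()
    for _name, _sex, occasions in records:
        for code, number in occasions:
            rows[DRINKS[code]] += number
    return rows


table11 = combined_weighted(RECORDS)
table12 = overlay_weighted(RECORDS)
print(dict(table11))   # rows sum to more than the 11 drinks
print(dict(table12))   # rows sum to exactly 11
```

In the combined version Julia's 3 drinks are added to both rows, so the rows overlap. In the overlaid version each drink is counted once, against its own type.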
A final warning
It should be clear that it is easy to produce valid, but misleading, tables from repeated questions and careful thought is needed when producing them. Problems usually arise because users produce tables without consideration of the base and when it is safe to use combined variables.
An easy mistake in our simple survey would be to produce tables combining drinks with purchases without due care. For example, if we wanted to produce table 12 with purchase types (Clothes, Other) as the break instead of Gender, then we would have to use the combined Q1B as the columns. We must not use Q1B1 on the first overlay and Q1B2 on the second overlay because Q1 and Q2 are not related.
It is very important to state clearly on the tables what the base is. The easiest way to achieve this is to define a filter with the correct text and apply it to the first overlay table. It may be possible to attach the correct base text to a filter applied to the first repeat; in this case the program will automatically use it on overlaid tables.
The end users of tables do not always understand why the base is not all questionnaires on overlaid tables and ask for the base to be “put right”.
Sometimes it may be necessary to suppress the total column in overlaid tables (using format option NPTC) to avoid confusion. It would never be correct to suppress the total row if the table contains vertical percentages (as in all the tables above). This is because quality standards state that we must show the base for percentages.
Sometimes the Total base is wrong because the overlaid tables for each repeat have not been filtered correctly. Each overlay table must only include valid repeats where there is data for the repeat in question.
If you cannot persuade clients to accept tables with repeats as the base, it may be necessary to do accumulations separately to use on tables. For example, in our survey it would be possible (but lengthy) to accumulate the tea/coffee drinks into one variable for each respondent and in a separate variable the number of other drinks. These variables can then be used on other unrelated questions to get tables based on particular types of drink. If there were many different types of drink this could be very lengthy indeed.
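A minimal sketch of such an accumulation, in Python rather than QPSMR Companion syntax, might look like this. The function name `accumulate` and the record layout are assumptions made for illustration.

```python
# Illustrative only: plain Python, not QPSMR Companion syntax.
# None marks a blank second repeat (no second drinking occasion).
RECORDS = [
    {"name": "Rose",  "Q2A1": 1, "Q2B1": 1, "Q2A2": None, "Q2B2": None},
    {"name": "Julia", "Q2A1": 1, "Q2B1": 1, "Q2A2": 2,    "Q2B2": 2},
    {"name": "Andy",  "Q2A1": 2, "Q2B1": 5, "Q2A2": None, "Q2B2": None},
    {"name": "Frank", "Q2A1": 1, "Q2B1": 2, "Q2A2": None, "Q2B2": None},
]
REPEATS = [("Q2A1", "Q2B1"), ("Q2A2", "Q2B2")]
TEA_COFFEE, OTHER_DRINK = 1, 2


def accumulate(record):
    """Return (tea/coffee drinks, other drinks) summed over all repeats."""
    totals = {TEA_COFFEE: 0, OTHER_DRINK: 0}
    for type_q, number_q in REPEATS:
        drink_type, number = record[type_q], record[number_q]
        if drink_type is not None and number is not None:
            totals[drink_type] += number
    return totals[TEA_COFFEE], totals[OTHER_DRINK]


# One pair of respondent-level variables per questionnaire.
accumulated = {r["name"]: accumulate(r) for r in RECORDS}
for name, (tea_coffee, other) in accumulated.items():
    print(name, tea_coffee, other)
```

Each respondent now carries one value per drink type, so these values can be used on tables whose base is respondents. With many drink types the number of accumulated variables grows accordingly.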
If a set of questions appears 4 times on the questionnaire, then every table for these questions will normally consist of four overlaid tables (1 table followed by 3 overlaid tables). So, here is our second golden rule:
The number of overlays (including the first) is always the same as the number of repeats in the project.
Confusion sometimes occurs because each respondent only uses a limited number of the repeats. For example, in a product test there may be four repeated sets of questions, but each respondent answers only two of them. By the second golden rule above, it is right to use 4 overlays for all tables, even though 2 of them will be empty for any one respondent.
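The following Python sketch illustrates the second golden rule with an invented product test. The question names R1 to R4 and all the answer values are hypothetical, made up only to show the mechanics, and `overlay_base` is not a QPSMR Companion function.

```python
# Illustrative only: plain Python, not QPSMR Companion syntax.
# Hypothetical product test: 4 repeated rating questions (R1..R4), and each
# respondent answers only 2 of them. All values are invented.
RESPONSES = [
    {"R1": 5,    "R2": 3,    "R3": None, "R4": None},
    {"R1": None, "R2": 4,    "R3": 2,    "R4": None},
    {"R1": None, "R2": None, "R3": 1,    "R4": 5},
]
REPEATS = ["R1", "R2", "R3", "R4"]   # one overlay per repeat in the project


def overlay_base(responses, repeats):
    """Return the base of each overlay and of the overlaid table as a whole."""
    per_overlay = {
        question: sum(1 for r in responses if r[question] is not None)
        for question in repeats
    }
    return per_overlay, sum(per_overlay.values())


per_overlay, base = overlay_base(RESPONSES, REPEATS)
print(per_overlay)
print(base)
```

All 4 overlays are run even though each respondent contributes to only 2 of them. The non-blank filter on each overlay is what keeps the base at the number of repeats actually answered.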
Another common source of confusion is where respondent details and repeat details are mixed together in a breakdown. In this case you need a new breakdown variable for each repeat. The respondent detail columns are the same in all the breakdown variables. The columns from the repeated information will vary depending on the repeat being processed. Finally, to repeat earlier advice:
When producing overlay tables on repeated questions the golden rule is to ensure that the rows, columns, filters and weights are correct for the repeat being done.