Rachel called Pius, an old secondary-school classmate of hers. Pius worked in the Census and Statistics Department, so she knew that he was the perfect person to answer her questions.
Pius and Rachel met the next day at a cafe, where Pius introduced her to some of his favourite teas. Rachel then lamented what she had come across: the information missing from her data collection.
"Oh, I see. This must be a problem caused by missing data," Pius hypothesized. "Missing data are unknown or absent values in a data collection. They can greatly affect your report if you do not treat them correctly."
"Oh right..." She rolled her eyes. "So what exactly causes missing data? My colleagues only gave vague reasons like 'the participants do not want to reveal their weight'."
"Well, that is precisely why missing data occur!" Pius replied. "Missing data are often caused by the respondents themselves. For example, if one person was too busy to come to the centre for the whole two weeks, and another refused to reveal his weight, no data would be stored for them in the collection. What would you expect their results to be, Rachel?"
"I have no idea about those who did not turn up at the fitness centre, but for those who did not answer, I guess their heights or weights are usually the more embarrassing ones," said Rachel.
After some discussion, they came up with several more possible reasons for missing data:
Possibilities of Missing Data: Missing not at random (NMAR)
NMAR means that the data are missing because of the very quantity being collected (i.e. the dependent variable itself). In this case, that is the heights and weights of the participants. Since NMAR arises from participants' preferences about which data to reveal, the remaining data are usually biased.
What if they are really obese and do not wish to reveal their weight, so as to avoid being teased?
Or is it that they do not want to give up their sedentary lifestyle, so they deliberately left ‘weight’ blank?
Could it be that they are so short that they are afraid to show other people their real height?
All of the above speculations attribute the blanks to the participants' own preferences. Therefore, the data are missing not at random.
Possibilities of Missing Data: Missing completely at random (MCAR)
This means that the data are missing for reasons unrelated to both the independent and the dependent variables.
Some participants did not turn up at the fitness centre, and there is no explanation why. This leads to missing data, since they were expected to turn up but did not.
Another possibility is that Rachel’s colleagues forgot to take the heights and weights of some participants. As a result, the row for heights and weights was left blank after the fitness programme.
In all these cases the data were missing not because of any bias among the participants, but because of the setting and external conditions.
MCAR data are usually less likely to create a bias, so the incomplete records are often simply omitted.
Source: Missing Data and How to Deal: An overview of missing data by M. Humphries
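The difference between the two cases can be illustrated with a small simulation. The following is only a minimal Python sketch and not part of Rachel's study: the weights, the 80 kg threshold and the missing rates are all invented for illustration, and `drop_mcar` and `drop_nmar` are hypothetical helper names.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is repeatable

# Made-up "true" weights (kg) of 10,000 imaginary participants.
true_weights = [random.gauss(70, 12) for _ in range(10_000)]


def drop_mcar(values, rate=0.3):
    """MCAR: every value goes missing with the same probability,
    regardless of what the value is."""
    return [v for v in values if random.random() >= rate]


def drop_nmar(values, threshold=80, rate_heavy=0.7, rate_other=0.1):
    """NMAR: the chance of a blank depends on the weight itself.
    Here, heavier participants are assumed to hide their weight more often."""
    kept = []
    for v in values:
        rate = rate_heavy if v > threshold else rate_other
        if random.random() >= rate:
            kept.append(v)
    return kept


true_mean = statistics.mean(true_weights)
mcar_mean = statistics.mean(drop_mcar(true_weights))
nmar_mean = statistics.mean(drop_nmar(true_weights))

print(f"true mean:            {true_mean:.1f} kg")
print(f"mean after MCAR loss: {mcar_mean:.1f} kg")
print(f"mean after NMAR loss: {nmar_mean:.1f} kg")
```

Under these assumptions, the average of the MCAR sample should stay close to the true average, while the average of the NMAR sample should come out noticeably lower, because the heavier participants are under-represented among the values that remain.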
Rachel then talked about the missing data in the questionnaire.
Handing him the questionnaire she had designed, she explained, "I also asked guiding questions to find out how people develop obesity. Nevertheless, most respondents simply left them blank!"
"Now I understand." With a cheeky smile, he responded, "Not that I have perfect hindsight, but you should have told them that the aim of the investigation is to provide a tailor-made course, or even offered them discounts. That way, they would surely accept the questionnaire. Also, instead of having your colleagues ask the clients for their height and weight, you can simply measure them directly. Not only can all the data be collected (if the person shows up, that is), you can also reduce the errors arising from differences between individual scales or rulers. Besides, you can add an 'Others' option to list questions instead of giving only rigid closed answers or open-ended answers. This way, they will have no excuse to say that they have no choice in the answers."
6. How much time do you usually spend in sports per week?
≥4 hours   3-4 hours   2-3 hours   1-2 hours   ≤1 hour
7. What kind of sports do you usually take? (please list):
8. How many times do you usually take in snacks per day?
More than three times   three times   twice   once   never
9. What kind of food would you take in as snacks?
Sweets   Seaweed   Potato Chips   Biscuits
After a slight pause to make sure Rachel understood, he proceeded: "Open-ended questions frequently receive blank answers, right? This is another reason why missing data occur. You cannot sort blank answers into categories, but they are still counted. Moreover, for question 9, if the respondents failed to find a suitable option, or if they had answered 'never' in question 8, the answer would be blank. The latter is built into the design of the questionnaire, more commonly known as 'skip patterns', and it is sometimes inevitable because the question applies only to some people."
9. What kind of food would you take in as snacks?
Sweets Seaweed Potato Chips Biscuits
Others (please specify: __________________)
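When the answers are tabulated, a blank in question 9 that comes from a skip pattern can be kept apart from a blank that is genuinely missing. The following is a minimal Python sketch under assumed conventions: the sample responses are invented, `None` stands for a blank answer, and `classify_q9` is a hypothetical helper name.

```python
# Invented sample responses; None means the respondent left the question blank.
responses = [
    {"q8": "twice", "q9": "Biscuits"},
    {"q8": "never", "q9": None},  # blank because question 9 does not apply
    {"q8": "once", "q9": None},   # blank although question 9 applies
    {"q8": None, "q9": None},     # question 8 itself was left blank
]


def classify_q9(response):
    """Label the question 9 answer as answered, not applicable, or missing."""
    if response["q9"] is not None:
        return "answered"
    if response["q8"] == "never":
        # Skip pattern: someone who never snacks has nothing to list.
        return "not applicable"
    return "missing"


labels = [classify_q9(r) for r in responses]
for response, label in zip(responses, labels):
    print(response, "->", label)
```

Labelling the blanks this way means that only the "missing" answers need to be treated as missing data, while the "not applicable" ones are simply an expected result of the questionnaire design.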