Experimental/Null Hypotheses

The problem that we are researching is whether or not it is dangerous to talk on a cell phone while driving.  Worded differently, we want to know if cell phone use has a significant effect on accident rates.

Experimental hypothesis: Cell phone use has a significant impact on the likelihood of an accident occurring.

Null hypothesis: Cell phone use does not have a significant impact on the likelihood of an accident occurring.


Summary post-Assignment 3

Generating Random Numbers

To obtain random numbers, I used Python IDLE, a program development tool, to write a program (Zelle 2004) that simulates a chaotic function: given an input between 0 and 1, it produces an apparently random sequence of decimal numbers. I treated decimals ending in an odd digit as males and decimals ending in an even digit as females. Using this program I generated 100 random numbers. In these data, three odd decimals, or “boys”, occurred in succession eleven times. Based on this sample of 100, there is an estimated 11% probability of selecting a mother who had three boys in succession. The proportion of boys to girls was approximately 60% to 40%. This deviation from the 50% “norm” could be attributed to the computer program I was using. Because a chaotic function generates these numbers, they are not truly random; no set of numbers generated by a calculator or computer program is. Along the same lines, the proportion I obtained could be influenced by the specific number that I chose as the input for this program. Additionally, the sample I generated was not very large; with a larger sample, the proportion of odd to even decimals would move closer to 50/50. If I generated 10,000 numbers, the proportion of “boys” would probably approach 50%. This is because, according to the Law of Large Numbers, the closer the sample size is to “infinity”, the closer the proportion of desired outcomes to total outcomes will be to the statistical probability of the desired outcome. (Stark 2005)
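The procedure described above can be sketched in Python. The post does not give the chaotic function itself, so this sketch assumes the logistic map with a multiplier of 3.9, a common choice in introductory examples. The seed of 0.25, the rule of reading the last of ten printed decimal places, and the choice to count overlapping runs are also assumptions, so the output will not necessarily match the counts reported here.

```python
def chaotic_sequence(seed, count, rate=3.9):
    """Return `count` values of the logistic map x -> rate * x * (1 - x)."""
    values = []
    x = seed
    for _ in range(count):
        x = rate * x * (1 - x)
        values.append(x)
    return values


def is_boy(value, digits=10):
    """Assumed rule: a value is a "boy" when its last printed digit is odd."""
    last_digit = int(f"{value:.{digits}f}"[-1])
    return last_digit % 2 == 1


def count_three_boy_runs(flags):
    """Count overlapping windows of three consecutive boys."""
    return sum(1 for i in range(len(flags) - 2) if all(flags[i:i + 3]))


# Hypothetical seed; a different seed gives a different sequence.
flags = [is_boy(v) for v in chaotic_sequence(seed=0.25, count=100)]
print("proportion of boys:", sum(flags) / len(flags))
print("runs of three boys:", count_three_boy_runs(flags))
```

Counting by program instead of by highlighter removes one source of human error, although the result still depends on how a "run" is defined.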

Strengths and Weaknesses: One strength of this method of data collection is that it is relatively free of human error.  That is, providing the program is used correctly, it is not easy for a user to obtain data that is “wrong” or not truly representative.  However, human error is still a possibility.  I determined the number of times three “boys” occurred by printing out the numbers I generated and marking the number of times this happened with a highlighter.  Although I was very careful to be attentive to what numbers I was highlighting, and to be observant about which numbers were “boys” and which were “girls”, it is possible that I either mistakenly highlighted some numbers, or mistakenly omitted some occurrences of three “boys” in a row.

Relating Assignment to the Topic from Class

In class, we discussed using the normal curve to obtain the probabilities of certain events happening.  The area under the normal curve is 100%.  We can calculate areas under the normal curve using Z-scores and a chart showing which Z-score values correspond to different areas under the normal curve (Mac Ewen 2008).  In experiments, these areas can function as probabilities.  In the “oil change problem” I used this idea to help me calculate the percentage of the population that waited longer to get their oil changed than I did by calculating the appropriate area under the normal curve. 

Law of Large Numbers Personal Example: Why we make mistakes
Out of the seven people I know from my high school who attend Virginia Tech, five are engineering majors and the other two are majoring in the biological sciences. This led me to say to one of my friends, “Oh, so the most popular major at your school is engineering, followed by biology.” In my mind, I decided that engineering must be over 50% while biology had to be close to 20%. However, according to statistics from College Board (2008), 21% of students are in Business/Marketing, 20% in Engineering, 9% in Family and Consumer Sciences, and 8% in Biology, followed by a few other majors. My experience led me into serious errors: I completely ignored the possibility of business majors because I had no friends attending Virginia Tech who were business majors, and I greatly overestimated the prevalence of both engineering and biology majors. If I had collected more data (by polling a larger number of students from different areas attending Virginia Tech), I would have been able to make a more accurate prediction. The Law of Large Numbers states that as the sample grows, the sample proportion varies less and less around the true proportion, so the data come to resemble the actual distribution. With a smaller sample, systematic effects have a more drastic effect. A larger sample also gives an estimate with a smaller standard error, because a few extreme data points carry less weight among many points near the mean. For example, I met two of the engineering majors through another of the engineering majors. Although at times we befriend individuals totally different from us, friends often share similar interests. So the fact that I met those friends through that one friend had a systematic effect on my data (turning up more engineering majors). With more data that would not be a problem, unless I met all of my data points through that one friend.
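To see how much a sample of seven can mislead, here is a minimal sketch of the standard error of a sample proportion, sqrt(p(1 - p)/n). It uses the 20% engineering figure quoted above and assumes a simple random sample, which a group of friends is not; the sample sizes 70 and 700 are hypothetical.

```python
import math


def standard_error(p, n):
    """Standard error of a sample proportion under simple random sampling."""
    return math.sqrt(p * (1 - p) / n)


p_engineering = 0.20  # College Board (2008) figure quoted in the post

# 7 is the number of friends in the example; 70 and 700 are hypothetical.
for n in (7, 70, 700):
    print(n, round(standard_error(p_engineering, n), 3))
```

The standard error shrinks with the square root of the sample size, so a larger sample reduces random error. It does not remove a systematic effect such as meeting friends through other friends.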

Proportion of Males in our Class vs. Population

There are 6 males and 19 females in our lab section, so the proportion of males in our class is 0.24. According to the National Center for Education Statistics (2006), of the 85,614 bachelor’s degrees in psychology conferred in 2004-05, 19,000 were awarded to men, a proportion of 0.2219. The difference between the two proportions isn’t large at all. However, with a larger sample than just the lab section, we might see a proportion even closer to the national figure due to the Law of Large Numbers. (Then again, since we attend Mary Washington, where the female population is much higher than the male population, a larger sample might instead be skewed toward women and give a proportion quite different from the national figure.)

Added 2/4/08: According to the head count received today from Dr. Mac Ewen, the proportion of males for the class lecture is 0.1739 (8 out of 46). Thus, my guess that a larger sample might show a proportion skewed by the higher female population at Mary Washington was supported by this evidence.
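The three proportions above can be checked with a few lines of Python. The counts come from the text; the function and variable names are only illustrative.

```python
def proportion(part, total):
    """Return part / total as a decimal proportion."""
    return part / total


lab_section = proportion(6, 25)         # males in the lab section
national = proportion(19_000, 85_614)   # psychology degrees awarded to men, 2004-05
lecture = proportion(8, 46)             # males in the class lecture

for label, value in [("lab", lab_section), ("national", national), ("lecture", lecture)]:
    print(f"{label}: {value:.4f}")
```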

The following two pictures illustrate the higher female population at Mary Washington:

Picture of the ratio of males versus females

Oil Change: Did I wait too long?

The Z-score for the number of miles I waited is 0.93, less than 1 standard deviation from the mean of 3,258 miles. Additionally, according to the z-score table, 17.6% of the population has waited longer to get their oil changed than I have. (2008) That’s nearly 20%, which is a fairly substantial percentage. Many people waited longer than I did to get their oil changed, so I really did not wait all that long.
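The table lookup can be reproduced without a printed chart. This sketch computes the area under the standard normal curve above a given Z-score using the error function from Python's math module; the function names are illustrative, not part of the assignment.

```python
import math


def normal_cdf(z):
    """Area under the standard normal curve to the left of z."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))


def upper_tail(z):
    """Area to the right of z: the share of the population with a higher value."""
    return 1 - normal_cdf(z)


# Percentage of drivers who waited longer, for a Z-score of 0.93.
print(round(upper_tail(0.93) * 100, 1))  # about 17.6
```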

Sources

Mac Ewen, B. (January 30, 2008). Psychology 261. Class lecture. University of Mary Washington.

College Board. (2008). Virginia Polytechnic Institute and State University. Retrieved January 31, 2008, from http://collegesearch.collegeboard.com/search/CollegeDetail.jsp?collegeId=89&profileId=7

Math2.org. Z-distribution. Retrieved January 31, 2008, from http://math2.org/math/stat/distributions/z-dist.htm

National Center for Education Statistics. (2006). Digest of Education Statistics. Retrieved February 3, 2008, from http://nces.ed.gov/programs/digest/d06/tables/dt06_258.asp

Stark, P. B. (2005). The Law of Large Numbers. Retrieved January 30, 2008, from http://www.stat.berkeley.edu/~stark/Java/Html/lln.htm

Van Rossum, G. IDLE: Python’s Integrated Development Environment, v. 2.5.1. 2007. http://www.python.org/idle/

Zelle, J. (2004). Python Programming: An Introduction to Computer Science. Wilsonville, Oregon: Franklin, Beedle and Associates.


Probability of selecting Ms. Williams-

To obtain random numbers, I used a computer program that simulates a chaotic function: given an input between 0 and 1, it produces an apparently random sequence of decimal numbers. I treated decimals ending in an odd digit as males and decimals ending in an even digit as females. Using this program I generated 100 random numbers. In these data, three odd decimals, or “boys”, occurred in succession eleven times. Based on this sample of 100, there is an estimated 11% probability of selecting a mother who had three boys in succession. The proportion of boys to girls was approximately 60% to 40%. This deviation from the 50% “norm” could be attributed to the computer program I was using. Because a chaotic function generates these numbers, they are not truly random; no set of numbers generated by a calculator or computer program is. Along the same lines, the proportion I obtained could be influenced by the specific number that I chose as the input for this program. Additionally, the sample I generated was not very large; with a larger sample, the proportion of odd to even decimals would move closer to 50/50. If I generated 10,000 numbers, the proportion of “boys” would probably approach 50%. This is because, according to the Law of Large Numbers, the closer the sample size is to “infinity”, the closer the proportion of desired outcomes to total outcomes will be to the statistical probability of the desired outcome. (Stark 2005)

Stark, P. B. (2005). The Law of Large Numbers. Retrieved January 30, 2008, from http://www.stat.berkeley.edu/~stark/Java/Html/lln.htm


Summary post-Assignment 2

1. The mean is the measure of central tendency that is most influenced by outliers. If a data set includes a number that is either very large or very small, it will make the mean either much higher or lower than a more “normal” number would. For example, the mean of 3, 4, 5 and 6 is 4.5. The mean of 3, 4, 5 and 30, however, is 10.5. The outlier, 30, makes the mean disproportionately higher than it ought to be. Accordingly, unusually high or low temperature values will affect the mean more than they “should”. Errors in measurement, as well as systematic events such as taking a cold drink or a hot shower, probably caused these extreme values. We cannot rely on these extreme data values because they are “exceptional” cases and, as such, are not representative of normal temperature values. The correct average body temperature is 98.2. According to the article, the original value of 98.6 is incorrect because, for example, the thermometers used to obtain this value were unreliable, and this value was obtained 100 years ago.
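The outlier example above is easy to verify. This short sketch uses Python's statistics module on the two four-number sets from the text, and adds the median for comparison to show that it does not move.

```python
import statistics

typical = [3, 4, 5, 6]
with_outlier = [3, 4, 5, 30]

# The mean jumps when 6 is replaced by the outlier 30 ...
print(statistics.mean(typical), statistics.mean(with_outlier))      # 4.5 10.5
# ... while the median stays the same.
print(statistics.median(typical), statistics.median(with_outlier))  # 4.5 4.5
```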

Emily’s arithmetic average temperature is 98.01 degrees F. The difference between her temperature and the correct average temperature is 0.19 degrees F. The standard deviation given in the article is 0.73. Her body temperature is well within 0.73 degrees of the correct average temperature and, as such, is not particularly low because it is much smaller than the “average difference” between a random temperature value and the mean.
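One way to make "well within 0.73 degrees" concrete is to express the difference in standard-deviation units. This is a minimal sketch using the figures from the text; the function name is an illustrative choice.

```python
def sd_units(value, mean, sd):
    """How many standard deviations `value` lies from `mean`."""
    return (value - mean) / sd


CORRECT_MEAN = 98.2  # average body temperature from the article
ARTICLE_SD = 0.73    # standard deviation given in the article

emily = sd_units(98.01, CORRECT_MEAN, ARTICLE_SD)
print(round(emily, 2))  # about -0.26, i.e. roughly a quarter of a standard deviation below
```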

Sadie’s mean was 98.20588, while the correct average body temperature is 98.2, so her average was only 0.00588 degrees higher. This shows that she does not have an unusual average body temperature. Her median and mode were also 98.2. Thus her data show not only that her average was close to the correct mean (the standard deviation is 0.73, so Sadie’s average was well within one standard deviation of the mean) but also that 98.2 was both her most frequent reading and the exact middle of her data. This was a surprising discovery, because she didn’t feel her temperature was very stable throughout the week of data collection. Not only did she have a severe cold, but she spent most nights at work, where she often took her temperature while she was working (regardless of the strange looks she received from customers). So the fact that her temperature was not unusual in the slightest is very interesting. Part of the explanation could be the fever-reducing medication she took throughout the week, which might have leveled out her temperature and kept it a little more stable.

Sadie’s data are not very representative of what her actual temperatures are. Throughout the data collection she was sick with a very severe cold and so not only was she taking medication (several different types of cold medicine although obviously not at the same time) but also her sleeping and eating patterns were thrown off. However, the interesting discovery is that her mean temperature (as well as the mode and median) was very close to the average correct temperature. Also, her standard deviation was half that of what the standard deviation for the correct temperature was. This means that her data on average deviated from the mean much less than normal, which is surprising as one might expect her temperature to deviate more wildly since this cold disrupted the pattern of her life so severely. However, one might take into account the fact that she was on a fever reducer, which may have leveled out her temperature. The mean of her data could be affected by the one outlier she had that raised the mean. Also, the medicine Sadie took may have regulated her body processes enough so that she received more temperatures closer to her mean. If she continued to take measurements throughout the year, she would have a more representative data collection. However, the data would have a higher standard deviation due to the fact that her temperature would fluctuate more and also, it would have a lower mean due to the fact that she usually has a lower temperature than that of 98.2. It’s usually around 97.8 or so when she takes it normally.

Emily’s temperature data are likely not very representative. Emily only took her temperature over the course of five days, often immediately after being exposed to extreme cold. Other systematic events such as eating very hot or cold food immediately before obtaining a temperature reading also had a likely impact on her data. If a substantial amount of the data are slightly “off”, the measures of central tendency, and standard deviation, will be “off” as well. Her data would probably be more accurate if she continued to take measurements throughout the semester, because the weather would become warmer and largely cancel out any temperature readings that were lower than normal. Additionally, the greater the number of readings, the more representative the mean will be. A larger sample size tends to mitigate the effects of outliers.

Sadie’s arithmetic average is 36.78104 degrees Celsius. Emily’s arithmetic average is 36.67708 degrees Celsius. We found these temperatures using the equation: Tc= (5/9)*(Tf – 32). To convert from Fahrenheit to Celsius, you subtract 32 from the Fahrenheit temperature and multiply by the fraction 5/9.
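The conversion equation above translates directly into code. This sketch applies Tc = (5/9)*(Tf - 32) to the two Fahrenheit means reported in the post; the function name is illustrative.

```python
def fahrenheit_to_celsius(temp_f):
    """Convert degrees Fahrenheit to degrees Celsius."""
    return (5 / 9) * (temp_f - 32)


sadie_c = fahrenheit_to_celsius(98.20588)  # Sadie's mean in degrees F
emily_c = fahrenheit_to_celsius(98.01875)  # Emily's mean in degrees F
print(round(sadie_c, 5), round(emily_c, 5))
```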

2. This relates to the statistical topic from class because as we see when we graph the data using a frequency distribution, the data follow a rough normal distribution. By finding the mode, median, mean, standard deviation, and variance we can show how well our data follow the average “correct” temperature data. This allows us to apply order to the chaos of our random data, which allows us to better predict where the next temperature will fall. By finding the mode, median, and mean we create an understanding of what the average temperature would be (and also how any outliers will affect the data especially by comparing the median with the mean). By finding the standard deviation and variance, we see how on average a temperature will deviate from the mean.
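The measures named above can all be computed with Python's statistics module. The readings in this sketch are hypothetical, not our actual data. It also shows the two versions of the standard deviation and variance: dividing by n - 1 (sample) and dividing by n (population).

```python
import statistics

# Hypothetical temperature readings in degrees F, for illustration only.
readings = [97.9, 98.2, 98.2, 98.4, 98.2, 98.0, 98.5]

summary = {
    "mean": statistics.mean(readings),
    "median": statistics.median(readings),
    "mode": statistics.mode(readings),
    "sd_n_minus_1": statistics.stdev(readings),          # divides by n - 1
    "sd_n": statistics.pstdev(readings),                 # divides by n
    "variance_n_minus_1": statistics.variance(readings),
    "variance_n": statistics.pvariance(readings),
}

for name, value in summary.items():
    print(f"{name}: {value:.4f}")
```

The n - 1 version is always slightly larger than the n version, which is the pattern in the calculator results listed under item 3.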

3. The following represent the mean, median, mode and standard deviation of Emily’s temperature values. Both the median and the mode are 98.3 degrees F. Interestingly, this value is even closer to the temperature that, according to the article, is the “true” average temperature (98.2) than her own average temperature. Had there not been an outlier in her data set (95.6), her average temperature would likely be very close to 98.3 degrees, given the frequency with which it appears in the data.

Calculated using a graphing calculator:

Mean: 98.01875

Median: 98.3

Mode: 98.3 (The calculator does not calculate mode automatically, but I was able to obtain it from the ordered list I made on the calculator.)

Standard deviation (n-1): 0.729

Standard deviation (n): 0.717

Using SPSS:

Mean: 98.019

Median: 98.3

Mode: 98.3

Standard Deviation: 0.7293

Variance: 0.532

A histogram of Emily’s data: Emily’s histogram

The following are Sadie’s measures, calculated using a hand calculator and SPSS. As you can see, her data do not deviate very much from the mean (which is almost the same as the median and mode) except for the one outlier in her data. Her frequency distribution from SPSS is also attached.

Using a hand calculator:

Mean: 98.20588

Median: 98.2

Mode: 98.2

Standard deviation (using n-1): 0.37331

Variance (using n-1): 0.13936

Standard deviation (using n): 0.36778

Variance (using n): 0.13526

Using SPSS:

Mean: 98.2059

Median: 98.2

Mode: 98.2

Standard Deviation: 0.36778

Variance: 0.135

And this is what her frequency distribution showed: Sadie’s frequency distribution

As you can see her data were well grouped around the mean. Also it roughly shows a normal distribution (or at least the normal distribution model since it can’t ever be a true normal distribution since your temperature could never go on into infinity or negative infinity).

4. 24 January 2008. Standard deviation-Wikipedia, the free encyclopedia. Retrieved 24 January 2008 from http://en.wikipedia.org/wiki/Standard_deviation

Source: (for information about Fahrenheit to Celsius)

Converting between Fahrenheit and Celsius temperature scales. (2008). Retrieved January 28, 2008 from http://www.usatoday.com/weather/wtempcf.htm

Shoemaker, Allen L. (1996). What’s Normal? — Temperature, Gender and Heart Rate. Journal of Statistics Education, 4, 2. Retrieved 25 January 2008, from http://www.amstat.org/publications/jse/v4n2/datasets.shoemaker.html

5. The data have some limitations. The presence of an outlier in Emily’s data (95.6) made her average temperature much lower than it would have been if the temperature had not been an outlier. Therefore, the mean that Emily calculated probably does not represent her true “average” temperature; it may, in fact, be significantly higher than the one obtained in this experiment. One strength of the data is that Emily obtained enough temperature values to provide a reasonable estimate of what her average temperature could be. While this is still not perfect, obtaining 32 temperature values provides a much more accurate approximation of her “true” average temperature than, say, a collection of 5 temperature values. The mean of a larger data set will be less affected by outliers than the mean of a smaller data set.
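The claim that a larger data set dilutes an outlier can be illustrated with a simplified sketch. It assumes every reading except the outlier equals 98.3, which is not Emily's real data, and compares how far the single 95.6 reading pulls the mean in a set of 5 values versus a set of 32.

```python
import statistics

OUTLIER = 95.6  # the low reading reported in the post
TYPICAL = 98.3  # assumed stand-in for every other reading


def mean_with_outlier(n):
    """Mean of n readings: n - 1 typical values plus the one outlier."""
    readings = [TYPICAL] * (n - 1) + [OUTLIER]
    return statistics.mean(readings)


for n in (5, 32):
    shift = TYPICAL - mean_with_outlier(n)
    print(f"n={n}: mean pulled down by {shift:.3f} degrees")
```

Under this assumption the outlier lowers the mean by (98.3 - 95.6)/n, so the pull shrinks as n grows.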

Sadie also had an outlier in her data (99.4). It slightly affected her mean, raising it to 98.20588, while her median and mode showed that 98.2 would have been a slightly more accurate representation of her data.

 Done by Sadie Tyree and Emily Vorek


Emily’s Data Values: SPSS and Calculator

Calculated using a graphing calculator:

 Mean: 98.01875

Median: 98.3

Mode: 98.3 (The calculator does not calculate mode automatically, but I was able to obtain it from the ordered list I made on the calculator.)

Standard deviation (n-1): 0.729

Standard deviation (n): 0.717

 http://en.wikipedia.org/wiki/Standard_deviation

Using SPSS:

Mean: 98.019

Median: 98.3

Mode: 98.3

Standard Deviation:  0.7293

Variance: 0.532


Second Lab Assignment: Values for Sadie’s Data

Here are the three measures of central tendency and the measures of variance for Sadie’s data on temperatures using a hand calculator:

 

Mean: 98.20588

Median: 98.2

Mode: 98.2

Standard Deviation (using n-1): 0.37331

Variance (using n-1): 0.13936

Standard Deviation (just n): 0.36778

Variance (just n): 0.13526

 

Sadie’s arithmetic average temperature versus the correct average body temperature

Sadie’s mean was 98.20588, while the correct average body temperature is 98.2, so her average was only 0.00588 degrees higher. This shows that she does not have an unusual average body temperature. Her median and mode were also 98.2. Thus her data show not only that her average was close to the correct mean but also that 98.2 was both her most frequent reading and the exact middle of her data. This was a surprising discovery, because she didn’t feel her temperature was very stable throughout the week of data collection. Not only did she have a severe cold, but she spent most nights at work, where she often took her temperature while she was working (regardless of the strange looks she received from customers). So the fact that her temperature was not unusual in the slightest is very interesting. Part of the explanation could be the fever-reducing medication she took throughout the week, which might have leveled out her temperature and kept it a little more stable.


Summary Post

A “random event” is an event that is impossible to predict because too many unpredictable factors influence its occurrence. The degree to which it will impact another event is usually unknown. A “systematic event” is a “non-random” event that has a predictable impact on another event. (D. Mac Ewen, personal communication, January 17, 2008) We cannot be sure how much a random event will impact another event, or if it will even impact it at all, whereas systematic events have a consistent, predictable impact on another event.

It is never possible to predict exactly what the next temperature value will be, because obtaining a particular temperature reading is an example of a random event. A nearly infinite number of factors exist which have an impact on body temperature, such as stress levels, ambient temperature, clothing worn, activity level, and so on. Because we cannot know if these factors will impact body temperature, and if so, to what extent, it is impossible to predict body temperature exactly. For similar reasons, it is impossible to predict any future event exactly, because too many factors, known and unknown, could exert an influence on its occurrence.

Some systematic events are present in the data. One of the systematic effects in our data can be found in Sadie’s data; during the experiment, Sadie was taking a series of cold medications. After Sadie took a dose of medication (one that included a fever reducer) her temperature would stabilize around one temperature, thus making it easier to predict. Another systematic effect impacted Emily’s data. Twice, she took her temperature immediately after eating; in one case, she had just eaten hot soup. Accordingly, her temperature was slightly more elevated than her usual “baseline” temperature.
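The difference between random variation and a systematic effect can be sketched with a small simulation. Every number here is hypothetical: a baseline of 98.2, random noise with a standard deviation of 0.3, and a systematic shift of +0.5 degrees standing in for something like eating hot soup just before a reading.

```python
import random
import statistics


def simulate_readings(n, baseline=98.2, noise_sd=0.3, systematic_shift=0.0, seed=0):
    """Return n readings: baseline + a fixed systematic shift + random noise."""
    rng = random.Random(seed)
    return [baseline + systematic_shift + rng.gauss(0, noise_sd) for _ in range(n)]


usual = simulate_readings(1000)
after_soup = simulate_readings(1000, systematic_shift=0.5, seed=1)

# Random noise averages out over many readings; the systematic shift does not.
print(round(statistics.mean(usual), 2), round(statistics.mean(after_soup), 2))
```

The random part makes any single reading unpredictable, while the systematic part moves every affected reading in the same direction by a consistent amount.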

Some sources of random variation that affect our lives include the weather, our exact fluid intake, how much sleep we have gotten, traffic, the amount of food we have eaten, fire drills, and chance encounters with friends on our way to or from an appointment. The weather, our fluid intake, our food intake, and the amount of sleep we have gotten affect our feelings of well-being to different degrees, depending on the circumstance. Our feeling of well-being, in turn, has an impact on our daily activities, our alertness, and our mood. Fire drills, traffic, and chance encounters with friends influence whether or not we will arrive at a class or other appointment on time. This, in turn, has far-reaching implications for other aspects of our lives.

Our project relates to the concept of randomness because receiving a particular temperature reading is a random event. It is influenced by many factors which are beyond our knowledge and control, making it unpredictable and, therefore, random.

 

Sadie’s chart of daily temperatures

This graph illustrates Sadie’s temperature readings for the five-day period.

chart1.gif

This graph illustrates Emily’s temperature readings for the five-day period. The temperature readings tend to be relatively stable, without consistent variation that is correlated with the times that the temperatures were taken. That is, temperature readings were not consistently higher or lower depending on the time of day. This could be attributed to random variations such as the variability of Emily’s sleeping and eating schedule. This could also be caused by systematic variations, such as coming in from the cold and immediately obtaining a temperature reading, and eating hot food just before obtaining a temperature reading.

Sources: Dr. D. Mac Ewen, personal communication, January 17, 2008.

Hirsch, Larissa. (2007) Retrieved January 20, 2008, from Kid’s Health for Parents. Web site: http://www.kidshealth.org/parent/general/body/fever.html

 

Experimental error exists in this experiment. One potential source of this experimental error is related to the accuracy of Emily’s measuring instrument (the thermometer) and of her method of using the thermometer. This thermometer is not an expensive, high-end thermometer and, as such, may not be as accurate as some. Additionally, Emily used the axillary (underarm) method of taking her temperature. Depending on the positioning of the thermometer, temperature fluctuations are likely to have occurred since, when using this method, it is impossible to position the thermometer the same way every single time. Finally, Emily began taking ibuprofen, a fever reducer, for muscle pain on Friday, which probably had an impact on the temperature readouts she received. A strength of her data collection is that her thermometer is a reasonably good one, and can be assumed to be relatively accurate. Furthermore, Emily tried to position the thermometer the same way each time that she took her temperature, making these erroneous temperature fluctuations less likely.

One of the weaknesses of Sadie’s data collection was that on the day she started taking her temperature, she began showing the symptoms of a severe cold. Her eating and sleeping habits were disrupted by coughing and discomfort during sleep and eating. Also, she started taking Sudafed PE, Alka-Seltzer Plus, and Tylenol Cold Multi-Symptom, all of which contained a fever reducer. This created a series of disruptions in her data, sometimes causing her temperature to fluctuate and then to level out when the fever reducer began working.

I hereby declare on my word of honor that I have neither given nor received unauthorized help on this work.


First Lab Assignment: First edition of Sadie’s Chart

This is a chart with the first three days of my data collection. I will post a new chart with all the data points on it when I have finished collecting. I’m just posting this one to make sure I know how to do it and to get some feedback.

(Image: Sadie’s Temperature Chart)


First blog post: Experiment questions

1. (1) A “random event” is an event that is impossible to predict because too many unpredictable factors influence its occurrence.

(2) It is never possible to predict exactly what the next temperature value will be, because obtaining a particular temperature reading is an example of a random event. A nearly infinite number of factors exist which have an impact on body temperature, such as stress levels, ambient temperature, clothing worn, activity level, and so on. Because we cannot know if these factors will impact body temperature, and if so, to what extent, it is impossible to predict body temperature exactly. For similar reasons, it is impossible to predict any future event exactly, because too many factors, known and unknown, could exert an influence on its occurrence.

Some sources of random variation that affect my life include the weather, my exact fluid intake, how much sleep I have gotten, traffic, the amount of food I have eaten, fire drills and chance encounters with friends as I am on my way to or from an appointment. The weather, my fluid intake, my food intake, and the amount of sleep I have gotten affect my feeling of well-being to different degrees, depending on the circumstance. My feeling of well-being has an impact on my daily activities, especially whether or not I choose to exercise, as well as my alertness and mood. Fire drills, traffic and chance encounters with friends influence whether or not I will arrive at a class or other appointment on time. This, in turn, has far-reaching implications for the rest of my life.

2. Our project relates to the concept of randomness because receiving a particular temperature reading is a random event. It is influenced by many factors which are beyond our knowledge and control, making it unpredictable.

5. Experimental error exists in this experiment. It is related primarily to the accuracy of our measuring instrument (the thermometer) and of our method of using the thermometer. This thermometer is not an expensive, high-end thermometer and, as such, may not be as accurate as some. Additionally, I used the axillary (underarm) method of taking my temperature. Depending on the positioning of the thermometer, temperature fluctuations are likely to have occurred since, when using this method, it is impossible to position the thermometer the same way every single time. A strength of our data collection is that my thermometer is a reasonably good one, and can be assumed to be accurate, to a certain degree. Furthermore, I tried to the best of my ability to position the thermometer the same way each time that I took my temperature, making such inaccurate temperature fluctuations less likely.

Sadie–we still need to work on:

(1) How does a random event differ from a systematic event?

(3) Are there any systematic effects in our data and if so, what?
