I have been given a data set of 240 items containing information relating to driving tests.
My aim is to investigate which factors influence a successful outcome, using the following fields in the data sheet:
* Gender of the driver
* Number of 1-hour lessons
* Number of minor mistakes
* The driving instructor

Most of the fields will be investigated to see whether there is any pattern connected with successful drivers, apart from:
* The day and time the test was taken

Initial Analysis: The initial analysis of the entire data set shows that there are:
* 116 male drivers
* 124 female drivers
* 60 learners for Instructor A
* 100 learners for Instructor B
* 40 learners for Instructor C
* 40 learners for Instructor D

The mean number of:
* Minor mistakes is 16.78
* 1-hour lessons is 23.03

Hypothesis 1: In this data set, men on average made fewer minor mistakes than women in their driving test.

Planning: In order to investigate this hypothesis I will take a random sample of 30 male drivers and 30 female drivers and compare the number of minor mistakes they made in their driving test. To make this comparison I will construct box and whisker diagrams.

Sampling: To get random samples from the data set, and so keep the box and whisker comparison reliable, I shall use Microsoft Excel. First I will number the males and females in the data set. Then I will assign each driver a random number between 0 and 1 and select the first 30 drivers from each gender (60 drivers in total).

Statistics box

Data Statistics      Male sample    Female sample
Lower Quartile       11.75          13
Median               17.5           17.5
Upper Quartile       26             26.5
Mean                 18.35          19.13
Range                35             32
Interquartile Range  14.25          13.5

Analysis of Hypothesis 1: Using Autograph, I found that the diagrams agreed with my first hypothesis, although the results were very similar for both genders.
From my statistics box we can see just how close the results were, which leads me on to my next hypothesis to help me determine why this is.

Hypothesis 2: In this data set, male drivers on average take more 1-hour lessons than female drivers before their driving test.

Planning: For this hypothesis the samples obtained for Hypothesis 1 shall be kept the same, to make this a reliable and fair investigation. I will also use the same type of diagram (box and whisker diagrams) to represent how many 1-hour lessons each gender took before their test.

Statistics box

Data Statistics      Male sample    Female sample
Lower Quartile       17.75          16.75
Median               25             21.5
Upper Quartile       29.75          29
Mean                 24.78          23.36
Range                34             29
Interquartile Range  12             12.25

Analysis of Hypothesis 2: In this sample, men on average did take more 1-hour lessons than women. This could explain why the male sample in Hypothesis 1 had a higher success rate than the female sample. Having looked at these results and those of the first hypothesis, I would like to investigate the relationship between the two.

Hypothesis 3: In this data set, the more 1-hour lessons a driver takes before their test, the fewer minor mistakes they make on average in the driving test.

Planning: For this hypothesis I shall take a sample of 30 drivers from each gender and plot scatter graphs of the number of 1-hour lessons against the number of minor mistakes, to see whether there is a pattern or correlation that gives me a result for my hypothesis.

Statistics box

Data Statistics           Male sample       Female sample
Mean of x                 22.73             23.37
Mean of y                 15.7              19.13
Standard Deviation of x   9.678             8.573
Standard Deviation of y   9.253             8.469
Correlation Coefficient   -0.5931           -0.3166
Centroid                  (22.73, 15.7)     (23.37, 19.13)

Analysis of Hypothesis 3: From my scatter graphs I can see that there is a negative correlation in each graph. The correlation coefficient (negative in both graphs) is weaker in the female sample, which suggests that males who take more 1-hour lessons benefit more than females who take more 1-hour lessons.
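As a minimal sketch of how the correlation coefficient and centroid in a statistics box like the one above could be reproduced outside Autograph, the Python below uses the standard product-moment formula. The function names and the five data points are made up for illustration and are not taken from the data set.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Product-moment correlation coefficient of paired values."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of products and squares of the deviations from the means
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def centroid(xs, ys):
    """The point (mean of x, mean of y) that the line of best fit passes through."""
    return (sum(xs) / len(xs), sum(ys) / len(ys))


# Hypothetical values for illustration only (NOT from the driving test data)
lessons = [10, 15, 20, 25, 30]    # x: number of 1-hour lessons
mistakes = [30, 22, 18, 12, 8]    # y: number of minor mistakes

r = pearson_r(lessons, mistakes)
centre = centroid(lessons, mistakes)
print(round(r, 4), centre)
```

A value of r close to -1 indicates a strong negative correlation, and a value close to 0 indicates little or no linear relationship.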
Now that the relationships between the influences from Hypotheses 1 and 2 have been investigated, I would like to find out how the remaining factors contribute to the success rate.

Hypothesis 4: Do certain driving instructors produce more successful drivers than others do?

Planning: For this hypothesis I will use stratified samples of learners from every instructor and put each in box and whisker diagrams, to compare them and find which instructor has the greatest number of successful learners. I will also use these samples to make scatter graphs, to see the results more easily.

Stratified Samples:
* Instructor A: 60 learners, (60/240) × 120 = 30
* Instructor B: 100 learners, (100/240) × 120 = 50
* Instructor C: 40 learners, (40/240) × 120 = 20
* Instructor D: 40 learners, (40/240) × 120 = 20

Statistics box

Data Statistics      Instructor A   Instructor B   Instructor C   Instructor D
Lower Quartile       8.75           9              13.25          21
Median               11             15             20.8           25
Upper Quartile       14             22             29.25          29.5
Mean                 10.8           15.6           20.8           23.5
Range                18             32             32             32
Interquartile Range  5.25           13             16             8.5

Statistics box

Data Statistics           Instructor A    Instructor B    Instructor C     Instructor D
Mean of x                 16.7            24.2            25.68            25.55
Mean of y                 12.5            17.41           17.18            26.77
Standard Deviation of x   6.1             8.25            8.62             8.71
Standard Deviation of y   6.65            8.54            9.98             5.63
Correlation Coefficient   -0.7397         -0.4236         -0.9472          -0.961
Centroid                  (16.7, 12.5)    (24.2, 17.41)   (25.68, 17.18)   (25.55, 26.77)

Analysis of Hypothesis 4: From comparing the box and whisker diagrams and their statistics, I can see that Instructor A has the most successful drivers, whilst Instructor D seems to have the worst success rate. However, as box and whisker diagrams can produce less accurate results, I decided to take this hypothesis further by using scatter graphs. Surprisingly, the results from my scatter graphs differed from the box and whisker diagrams, showing that Instructor C had the highest success rate and Instructor B the lowest. I know this because the gradient for Instructor C was much steeper and its correlation coefficient (-0.9472) was strongly negative, whereas Instructor B had the weakest correlation (-0.4236).
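The stratified sample sizes listed in the planning for Hypothesis 4 can be checked with a short calculation. The sketch below is only an illustration in Python: the function name is my own, and the group sizes and the total sample of 120 are the figures quoted in the planning.

```python
def stratified_sizes(group_sizes, sample_total):
    """Share a total sample between groups in proportion to each group's size."""
    population = sum(group_sizes.values())
    # Multiply before dividing so that exact results stay exact.
    # round() is only needed when a share is not a whole number.
    return {
        name: round(size * sample_total / population)
        for name, size in group_sizes.items()
    }


# Number of learners per instructor, as given in the initial analysis
learners = {"A": 60, "B": 100, "C": 40, "D": 40}

sizes = stratified_sizes(learners, 120)
print(sizes)
```

Because each instructor's share of the sample matches their share of the 240 learners, the four sample sizes add up to the full sample of 120.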
The box and whisker diagrams may have given different results from my scatter graphs because the samples were of different sizes. Instructor A had 30 drivers and Instructor B had 50, whilst Instructors C and D had only 20 each, meaning that either of these instructors had less chance of having failing drivers in the sample than Instructors A and B.

Hypothesis 5: In this data set, some instructors do better or worse with one gender than with the other.

Planning: This shall be my final hypothesis, so I intend to make it a reliable investigation by taking a sample of 15 males and 15 females from each instructor, using the “minor mistakes” field, and creating box and whisker diagrams from these samples. Once this has been done I will use the samples again to create scatter graphs for each instructor, examining the number of lessons each driver took against the number of mistakes they made, and writing an analysis of each below it.

Statistics box: Female Samples

Data Statistics      Instructor A   Instructor B   Instructor C   Instructor D
Lower Quartile       5              10             5              13
Median               11             19             11             25
Upper Quartile       14             23             14             28
Mean                 9.86           17             9.86           21.86
Range                18             30             18             31
Interquartile Range  9              13             9              15

Statistics box: Male Samples

Data Statistics      Instructor A   Instructor B   Instructor C   Instructor D
Lower Quartile       9              11             5              13
Median               11             15             19             25
Upper Quartile       14             22             27             28
Mean                 11.46          16.26          19             21.86
Range                14             29             32             31
Interquartile Range  5              11             22             15

Female Statistics box

Data Statistics           Instructor A     Instructor B   Instructor C     Instructor D
Mean of x                 16.47            21.13          23.27            25.24
Mean of y                 9.867            17             19.13            26.71
Standard Deviation of x   5.596            8.732          7.81             8.447
Standard Deviation of y   4.897            8.359          8.74             5.592
Correlation Coefficient   -0.09017         -0.1891        -0.9548          -0.96
Centroid                  (16.47, 9.867)   (21.13, 17)    (23.27, 19.13)   (25.24, 26.71)

Female Samples from each Instructor

Instructor A: The correlation coefficient shows a very weak negative correlation. The scatter graph shows barely any relationship between the number of lessons and the number of mistakes.

Instructor B: The correlation coefficient shows a weak negative correlation.
This scatter graph suggests that there is not much of a pattern between the results.

Instructor C: The correlation coefficient shows a strong negative correlation between the results. From this scatter graph I can see that the more lessons a driver takes, the fewer minor mistakes they make during their test.

Instructor D: In this scatter graph the correlation coefficient shows a strong negative correlation, showing us that the more lessons a driver takes before their test, the better the chance of success.

Male Statistics box

Data Statistics           Instructor A     Instructor B   Instructor C     Instructor D
Mean of x                 16.38            23             25.73            28
Mean of y                 11.56            15.31          17.13            22.5
Standard Deviation of x   4.554            9.165          11.02            7.26
Standard Deviation of y   3.297            8.327          11.79            8.508
Correlation Coefficient   -0.1473          -0.7534        -0.9613          -0.6498
Centroid                  (16.38, 11.56)   (23, 15.31)    (25.73, 17.13)   (28, 22.5)

Male Samples from each Instructor

Instructor A: This scatter graph has a very weak negative correlation coefficient. This means I won’t be able to say whether the number of lessons affects the number of minor mistakes made. The same is true for the female drivers.

Instructor B: This scatter graph shows a strong negative correlation between the number of lessons and the number of mistakes, suggesting that the more lessons taken, the higher the success rate. These results differ from those of the female drivers, who had a weak negative correlation.

Instructor C: This scatter graph has a strong negative correlation, which suggests that this instructor is good at teaching both female and male drivers, as both had a strong negative correlation.

Instructor D: This is my final scatter graph and it is from the male samples.
Here we can see a moderate negative correlation, which is in line with the female sample of Instructor D, although that sample had a strong negative correlation.

Analysis of Hypothesis 5: The box and whisker diagrams for the female samples showed that Instructors A and C had better results, with a smaller range, while Instructors B and D came out as the worst instructors, with a much wider range of results. For the male samples, Instructor A quite clearly had the best results, with a small range, whilst Instructors C and D had the worst. Instructor C had a very wide range of results, which shows me that you would have a 50/50 chance of success with this instructor.

Reviewing all the instructors together across both genders, I have decided that Instructor A has the best success rate with both genders, showing that whether you are male or female you are more likely to pass with this instructor than with any other. From my review of Instructor C, he or she seems to work better with females than with males. Instructor D seems to have a high rate of failing drivers, which suggests you won’t have the best chance of succeeding with this instructor. Finally, Instructor B seems to be quite average with both genders, and gender appears not to affect whether or not you would succeed.

From my analysis of both the scatter graphs and the box and whisker diagrams, I have come to the conclusion that each shows a different side of the story and represents the results in a different manner, influencing what I think of each instructor.
This has led me to decide that keeping the analyses of the box and whisker diagrams and the scatter graphs separate is the safest option, so that my conclusion about the instructors is not lost altogether. That is why, in the analysis section, my analysis of the box and whisker diagrams appears above and my analysis of the scatter graphs below.

Probability: In this section I will count the number of males and females that passed their test, and then see how many passed or failed their test with each instructor. Once I have calculated how many males and females passed, I can use that number together with the total number who took the test to work out the probability of males and females passing overall, and then the probability that males and females will pass with each instructor. Here, then, are the probabilities of drivers of both genders passing their test with each instructor.
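The calculation described in this section, dividing the number who passed by the number who took the test, can be sketched in Python as follows. This is only an illustration of the method: the function name and the counts are hypothetical placeholders and are not results from the data set.

```python
from fractions import Fraction


def pass_probability(passed, total):
    """Probability of passing = number who passed / number who took the test."""
    if total == 0:
        raise ValueError("no drivers in this group")
    return Fraction(passed, total)


# Hypothetical counts for ONE instructor, for illustration only:
# (number who passed, number who took the test)
counts = {
    "male": (9, 12),
    "female": (6, 12),
}

# Probability of passing for each gender with this instructor
by_gender = {g: pass_probability(p, t) for g, (p, t) in counts.items()}

# Probability of passing with this instructor as a whole
total_passed = sum(p for p, _ in counts.values())
total_taken = sum(t for _, t in counts.values())
overall = pass_probability(total_passed, total_taken)

print(by_gender, overall)
```

Using fractions keeps each probability exact, and it can be converted to a decimal with float() if needed.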