Thursday, December 12, 2019

Descriptive Statistics for Nonparametric Models

Question: Discuss the Descriptive Statistics for Nonparametric Models.

Answer:

Introduction: The dataset in this assignment contains data about a coffee shop, which has recorded the present status of its business. The dataset contains variables such as the id of each staff member and the time they require to prepare latte, hot chocolate, Panini and other items. Information about the customers is also given. The aim of the assignment is to apply statistical tools, namely descriptive statistics and inferential techniques such as hypothesis testing, to the given dataset. A regression analysis is also performed on this dataset. The detailed analysis of the data is discussed in this report.

Discussion: The main variables in the dataset are the id of the staff, the time taken by them to prepare latte, hot chocolate and Panini, the loyalty card number of the customers, the time spent by them in the coffee shop, the beverages purchased by them, the number of sugar sachets purchased by them, and the confectionery and soup or sandwich purchased by them. The variables can be of two types, continuous and discrete. Continuous variables take values on a continuous scale. Discrete variables take values only at some fixed or discrete points (Weiss and Weiss 2012). The following table gives the type of each variable:

Variable | Type
Staff id | Discrete
Time to prepare latte | Continuous
Time to prepare hot chocolate | Continuous
Time to prepare Panini | Continuous
Customer loyalty card number | Discrete
Time spent by the customer | Continuous
Beverage purchased | Discrete
Number of sugar sachets purchased | Discrete
Confectionery purchased | Discrete
Soup or sandwich purchased | Discrete

The mean and the median have been calculated as the summary measures of central tendency. The variance (the square of the standard deviation) and the range have been calculated as the measures of variation for the variables.
The measures have been calculated for the continuous variables.

The mean time to prepare latte is 1.8779 minutes while the median is 1.89. The mean and the median agree closely, indicating that the typical time to prepare latte is about 1.88 minutes. The variance is 0.122513 and the range is 1.38 minutes.

The mean time to prepare hot chocolate is 1.601224 minutes while the median is 1.605. The variance is also small, at 0.0792267, and the range is 1.43. Therefore, the average time required to prepare hot chocolate is about 1.6 minutes.

The mean time to prepare Panini is 2.356122 minutes while the median is 2.28. The variance is rather high, at 1.182659, and the range is also high (5.3). These results indicate that there is large variation in the cooking time of Panini: some staff members require much more time to prepare Panini than others. The average time required is around 2.30 minutes.

The mean time spent by the customers in the coffee shop is 20.06433 minutes and the median is 20.16. The range of the observations is 19.45 and the variance is 29.56883. Therefore, the time spent by the customers in the coffee shop varies widely around its mean of 20.06433 minutes.

The above table gives the measures of descriptive statistics. The mean is calculated by the following formula:

$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$

The median is the middlemost observation of the set of observations when they are arranged in order, either ascending or descending. The mean and the median are measures of central tendency: the observations tend to cluster around these values. The measures of dispersion are the variance and the range. The variance is given by the following formula (Bickel and Lehmann 2012):

$\mathrm{Var}(x) = \frac{1}{n}\sum_{i=1}^{n} (x_i - \bar{x})^2$

The range is the difference between the largest and the smallest observation.
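The summary measures defined above can be sketched in a few lines of Python. This is only an illustration using the standard library: the function name `describe` and the sample values are hypothetical and are not taken from the coffee shop dataset. `pvariance` is used because it divides by n, matching the variance formula given in the text.

```python
from statistics import mean, median, pvariance


def describe(values):
    """Return the mean, median, variance (1/n form) and range of a list of numbers."""
    return {
        "mean": mean(values),
        "median": median(values),
        "variance": pvariance(values),  # divides by n, as in the formula in the text
        "range": max(values) - min(values),
    }


# Hypothetical preparation times in minutes (NOT the coffee shop data)
example_times = [1.5, 1.8, 1.9, 2.0, 2.3]
summary = describe(example_times)
print(summary)
```

When the mean and median returned by such a function are close, as they are for the latte and hot chocolate times, the distribution is roughly symmetric and either value is a fair summary of the typical time.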
The range shows how widely the values of a variable are scattered. The standard deviation measures the dispersion of the values of a variable around their mean (Kock 2013).

The discrete random variable of interest in this dataset is the number of sugar sachets purchased. This count can in principle take any non-negative integer value 0, 1, 2, and so on without an upper limit. Therefore, the number of sugar sachets purchased is modelled with a Poisson distribution. The probability mass function (p.m.f.) of the random variable is:

$P(X = x) = \frac{e^{-\lambda}\lambda^{x}}{x!}, \quad x = 0, 1, 2, \ldots$

where $\lambda$ is the mean number of sachets purchased. In the given dataset, the variable x, that is the number of sugar sachets purchased, has taken the values 0, 1, 2 and 3. However, x can take any number of values, as a customer can purchase any number of sugar sachets. Therefore, the distribution is taken to be Poisson.

[Figure: frequency plot of the number of sugar sachets purchased]

The plot depicts the frequencies of the number of sugar sachets purchased. There is a very heavy weight on the zero value, as indicated by the figure. Therefore, a more suitable distribution is the zero-inflated Poisson distribution. This distribution treats the zero value separately and gives it a higher weight than the ordinary Poisson distribution does. The data can then be modelled for analysis and decision making. It is clear from the values that most of the customers do not purchase any sugar sachets.

The time to prepare hot chocolate is a continuous variable, and the time to prepare latte is also a continuous variable. A two-sample t-test has been conducted to compare them. The null hypothesis of the test is H0: $\mu_1 = \mu_2$ and the alternative hypothesis is H1: $\mu_1 \neq \mu_2$, where $\mu_1$ and $\mu_2$ are the mean preparation times of the two products. The t-test has been conducted assuming equal variances, since the variances of the two samples are similar. The p-value obtained is 5.86 * 10^-9, which is less than the chosen level of significance (0.05). Therefore, the null hypothesis is rejected. The rejection of the null hypothesis implies that the mean preparation times of the two products are different.

The time to prepare Panini is also tested.
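The equal-variance (pooled) two-sample t-test described above can be sketched as follows. This is a minimal illustration, not the computation used in the report: the function name `pooled_t_statistic` and the two small samples are hypothetical, and the code returns only the t statistic and the degrees of freedom, not the p-value.

```python
import math


def pooled_t_statistic(x, y):
    """Return (t, degrees of freedom) for the equal-variance two-sample t-test."""
    n1, n2 = len(x), len(y)
    m1, m2 = sum(x) / n1, sum(y) / n2
    # Unbiased sample variances (divide by n - 1)
    v1 = sum((a - m1) ** 2 for a in x) / (n1 - 1)
    v2 = sum((b - m2) ** 2 for b in y) / (n2 - 1)
    # Pooled variance: weighted average of the two sample variances
    pooled = ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)
    t = (m1 - m2) / math.sqrt(pooled * (1 / n1 + 1 / n2))
    return t, n1 + n2 - 2


# Hypothetical samples (NOT the coffee shop data)
sample_a = [1.0, 2.0, 3.0]
sample_b = [2.0, 3.0, 4.0]
t_value, dof = pooled_t_statistic(sample_a, sample_b)
print(t_value, dof)
```

The p-value is then read from the t distribution with the returned degrees of freedom. If SciPy is available, `scipy.stats.ttest_ind(x, y, equal_var=True)` performs the same test and reports the p-value directly.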
A test that can be conducted for the time to prepare Panini is whether its mean value is equal to 2 minutes or not. The null hypothesis of the test is H0: $\mu = 2$ against H1: $\mu \neq 2$. The test statistic is given below (Larson and Farber 2012):

$t = \frac{\bar{x} - \mu_0}{\sqrt{s^2 / n}}$

where $\bar{x}$ is the sample mean, $\mu_0$ is the hypothesised mean, $s^2$ is the sample variance and n is the sample size. The value obtained from the sample data is 3.241773. The tabulated value of the t-statistic is approximately 1.99 at 97 degrees of freedom. Since 3.241773 exceeds 1.99, the null hypothesis is rejected: the mean time to prepare Panini is not equal to 2. As the test is two-sided, it only establishes that the mean differs from 2; the sample mean of 2.356122 suggests that it is above 2.

The next continuous variable is the time spent by the customer in the coffee shop. The test conducted here is whether the mean time spent by a customer in the coffee shop is equal to 20 minutes or not. The null hypothesis of the test is H0: $\mu = 20$ against H1: $\mu \neq 20$. The test statistic is the same as before:

$t = \frac{\bar{x} - \mu_0}{\sqrt{s^2 / n}}$

The value reported from the sample is -0.0648007. (With a sample mean of 20.06433 the statistic as defined above is positive, about 0.065, but only its magnitude matters for a two-sided test.) The tabulated value of the test statistic is 2.045 at 29 degrees of freedom. Since the magnitude of the calculated value is far below 2.045, the null hypothesis cannot be rejected. Therefore, the data are consistent with a mean time of 20 minutes.

The regression analysis has been done by taking the time to prepare Panini as the dependent variable and the times to prepare hot chocolate and latte as the independent variables (refer to appendix 1). The adjusted R squared for the model is -0.00507. The adjusted R squared indicates the goodness of fit of the model: a value closer to 1 indicates that the model fits well and predicts the values accurately. Regression is done mainly to predict the values of one variable from the others, and it also reflects the correlation between the variables. Since the adjusted R squared here is slightly negative, the model is not a good fit and explains essentially none of the variation in the time to prepare Panini.
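A negative adjusted R squared, as reported above, arises when the penalty for the number of predictors outweighs a very small R squared. The sketch below shows the standard adjustment formula. The input values are assumptions chosen for illustration: the report does not state the unadjusted R squared, and the sample size of 98 is inferred from the 97 degrees of freedom quoted for the Panini test.

```python
def adjusted_r_squared(r_squared, n_obs, n_predictors):
    """Adjust R squared for the number of predictors in a linear regression."""
    return 1 - (1 - r_squared) * (n_obs - 1) / (n_obs - n_predictors - 1)


# Hypothetical inputs: R squared of 0.015, 98 observations, 2 predictors
example_adj = adjusted_r_squared(0.015, 98, 2)
print(example_adj)  # slightly below zero
```

With two predictors and an R squared this close to zero, the adjustment pushes the value just below zero, which is the pattern seen in the fitted model.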
The regression equation shows that both the time to prepare hot chocolate and the time to prepare latte have a positive association with the time to prepare Panini. The regression coefficient for the time to prepare hot chocolate is much smaller. Therefore, this variable does not have much influence on the time to prepare Panini. Latte and Panini are prepared by the same staff. The regression results suggest that those staff members who take longer to prepare Panini also take longer to prepare latte, although the low adjusted R squared means that this association is weak.

Conclusion: In this report, the dataset contains different variables. At first, the variables are identified as discrete or continuous. The discrete variables are those that take only integer values. The descriptive statistics have been obtained for the continuous variables. Inferential procedures such as hypothesis tests have also been conducted on the continuous variables. An independent two-sample t-test has been conducted to test whether the mean value differs between the time to prepare latte and the time to prepare hot chocolate. One-sample t-tests have also been conducted for the other continuous variables to check their mean values. A regression analysis has also been conducted: a linear regression model has been fitted taking the time to prepare Panini as the dependent variable and the time to prepare latte and the time to prepare hot chocolate as the independent variables. The time to prepare Panini is found to be positively associated with the time to prepare latte.

The recommendations that follow from the study are as follows. The results of the report suggest that, in the coffee shop, those workers who take a longer time to prepare Panini also take a longer time to prepare latte. Therefore, it is recommended that the workers who lag behind in their jobs should be given proper training so that they can do well in their job. This would help to increase the efficiency of the workers, which in turn will affect the sales.
The number of sugar sachets purchased contains many zero values. The distribution of the number of sugar sachets purchased is modelled as a Poisson distribution. However, there is a very high weight on the zero values. Therefore, the distribution of the variable can be assumed to be zero-inflated Poisson instead of an ordinary Poisson distribution.

References:

Bickel, P.J. and Lehmann, E.L., 2012. Descriptive statistics for nonparametric models IV. Spread. In Selected Works of E.L. Lehmann (pp. 519-526). Springer US.

Bickel, P.J. and Lehmann, E.L., 2012. Descriptive statistics for nonparametric models I. Introduction. In Selected Works of E.L. Lehmann (pp. 465-471). Springer US.

Bickel, P.J. and Lehmann, E.L., 2012. Descriptive statistics for nonparametric models III. Dispersion. In Selected Works of E.L. Lehmann (pp. 499-518). Springer US.

Boos, D.D. and Osborne, J.A., 2015. Assessing Variability of Complex Descriptive Statistics in Monte Carlo Studies Using Resampling Methods. International Statistical Review, 83(2), pp. 228-238.

Kock, N., 2013. Using WarpPLS in E-Collaboration Studies: Descriptive Statistics, Settings. Interdisciplinary Applications of Electronic Collaboration Approaches and Technologies, 62.

Larson, R. and Farber, E., 2012. Elementary statistics. Pearson Prentice Hall.

Samuels, M.L., Witmer, J.A. and Schaffner, A., 2012. Statistics for the life sciences. Pearson Education.

Weiss, N.A. and Weiss, C.A., 2012. Introductory statistics. London: Pearson Education.
