Browsing by Author "Ding, Shanshan"
Now showing 1 - 2 of 2
Item
Comparison of Statistical Learning and Predictive Models on Breast Cancer Data and King County Housing Data
(Department of Applied Economics and Statistics, University of Delaware, Newark, DE, 2017-09)
Cai, Yunjiao; Fu, Zhuolun; Zhao, Yuzhe; Hu, Yilin; Ding, Shanshan
In this study, we evaluate the predictive performance of popular statistical learning methods, such as discriminant analysis, random forests, support vector machines, and neural networks, via real data analysis. Two datasets, Breast Cancer Diagnosis in Wisconsin and House Sales in King County, are each analyzed to obtain the best models for prediction. Linear and Quadratic Discriminant Analysis are applied to the Wisconsin Diagnostic Breast Cancer (WDBC) data set. Linear Regression and Elastic Net are applied to the King County (KC) house data set. Random Forest, Gradient Boosting, Support Vector Machines, and Neural Networks are applied to both datasets. Individual models and stacked models are trained using accuracy or R-squared from repeated cross-validation on the training sets as the selection criterion. The final models are evaluated on the test sets.

Item
Latent Dirichlet Allocation and Predatory Pricing Online Data
(Department of Applied Economics and Statistics, University of Delaware, Newark, DE, 2021-02-24)
Xu, Xiaotian; Ding, Shanshan
In this paper, we study Latent Dirichlet Allocation (LDA; Blei et al., 2012) for topic modeling of Amazon unfair pricing data during Covid-19. A topic model is designed to capture the topics underlying the words in a text document or corpus. LDA is a generative probabilistic model that helps extract topics from collections of discrete data such as text corpora. It is also known as a three-level hierarchical Bayesian model, in which each item of the collection is modeled as a finite mixture over an underlying set of topics. Each topic is, in turn, modeled as an infinite mixture over an underlying set of topic probabilities. We use LDA to analyze data on unfair pricing by Amazon sellers during the Covid-19 period. Specifically, we perform topic modeling on Amazon product descriptions and generate topics from them. Our goal is to identify which kinds of surgical masks and related products are subject to unfair pricing during Covid-19. We conclude from the topic modeling that N95 is the most unfairly priced product. By generating graphical illustrations with the Python pyLDAvis package, we are able to summarize the results and provide more detailed information based on the Predatory Pricing Online model.
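
The three-level structure described in the LDA abstract above can be illustrated with a minimal sketch of the generative process. This is not the authors' code or data. The vocabulary, the number of topics, and the Dirichlet parameters are hypothetical values chosen for illustration, and the topic-word distributions are drawn from a Dirichlet prior, which assumes the smoothed variant of LDA. The sketch uses only the Python standard library.

```python
import random


def sample_dirichlet(alpha, rng):
    """Draw one Dirichlet sample by normalizing independent Gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]


def sample_categorical(probs, rng):
    """Return an index drawn according to the given probabilities."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


def generate_document(topic_word, alpha, vocab, n_words, rng):
    """Generate one document: topic mixture, then a topic and a word per position."""
    theta = sample_dirichlet(alpha, rng)  # document level: mixture over topics
    words = []
    for _ in range(n_words):
        z = sample_categorical(theta, rng)          # word level: topic assignment
        w = sample_categorical(topic_word[z], rng)  # word drawn from that topic
        words.append(vocab[w])
    return theta, words


# Hypothetical toy vocabulary and settings, for illustration only.
VOCAB = ["mask", "n95", "surgical", "price", "shipping", "seller"]
N_TOPICS = 2
rng = random.Random(0)

# Corpus level: one word distribution per topic (assumed Dirichlet prior).
TOPIC_WORD = [sample_dirichlet([0.5] * len(VOCAB), rng) for _ in range(N_TOPICS)]

theta, doc = generate_document(TOPIC_WORD, [0.8] * N_TOPICS, VOCAB, 8, rng)
print("topic mixture:", [round(p, 3) for p in theta])
print("document:", " ".join(doc))
```

Fitting LDA reverses this process: given only the observed words, it estimates the per-document topic mixtures and the per-topic word distributions.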