In this project, a supervised machine learning model was trained on weather and soil data to predict corn yield for more than 100 counties in the Corn Belt. More than 30 gigabytes of weather and soil data covering 1980 to 2022 were scraped from several websites for model training. Instead of a complex time-series model, the approach builds 12 separate random forests, one per month, to give a monthly update of the predicted corn yield for each county. When the full weather and soil data (April to the next October) are used, the test mean absolute percentage error reaches 15%, averaged over the predictions for all counties.
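As a rough illustration of the evaluation described above, the following sketch computes a mean absolute percentage error (MAPE) per county and then averages it across counties. It also shows one possible way the monthly models could be fed, under the assumption (not stated in the project description) that the model for a given month sees all data up to and including that month. The county names, yield values, and the helper `cumulative_feature_sets` are hypothetical and exist only for this example.

```python
from statistics import mean


def mape(actual, predicted):
    """Mean absolute percentage error, in percent."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    return 100.0 * mean(abs(a - p) / abs(a) for a, p in zip(actual, predicted))


def cumulative_feature_sets(months):
    """Map each month to the months whose data its model may use.

    Assumption for this sketch: the model for month k uses data from the
    first month up to and including month k.
    """
    return {month: months[: i + 1] for i, month in enumerate(months)}


# Made-up (actual, predicted) yields per county, for illustration only.
yields_by_county = {
    "county_a": ([180.0, 200.0], [171.0, 210.0]),
    "county_b": ([150.0, 160.0], [165.0, 152.0]),
}

# Score each county separately, then average the per-county errors.
per_county = {
    name: mape(actual, predicted)
    for name, (actual, predicted) in yields_by_county.items()
}
overall = mean(per_county.values())

print(per_county)
print(overall)
print(cumulative_feature_sets(["Apr", "May", "Jun"]))
```

Averaging per-county errors gives every county equal weight regardless of how many test years it contributes, which is one reasonable reading of "averaged over the predictions for all counties".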
In this project, several classification models were trained on a public credit card fraud dataset published on Kaggle. The models include linear models (Logistic Regression (LR), Linear Discriminant Analysis (LDA), and Naive Bayes (NB)) and non-linear models (Support Vector Machine (SVM) and Quadratic Discriminant Analysis (QDA)). Among these, an SVM with an "rbf" kernel achieved 100% prediction accuracy, which indicates that this model is able to capture all the patterns needed to separate the two classes, fraud and not fraud. The NB model performed worst, with an accuracy of only 92%, which shows that a linear model with an independence assumption is insufficient to explain all the variance present in this dataset.
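To make the independence assumption mentioned above concrete, here is a minimal from-scratch sketch of a Gaussian Naive Bayes classifier together with an accuracy calculation. It is not the project's code: the Gaussian variant, the function names, the `var_floor` parameter, and the toy data are all assumptions chosen for illustration.

```python
import math
from collections import defaultdict


def fit_gaussian_nb(X, y, var_floor=1e-9):
    """Estimate a log prior and per-feature mean/variance for each class."""
    rows_by_class = defaultdict(list)
    for row, label in zip(X, y):
        rows_by_class[label].append(row)

    model = {}
    for label, rows in rows_by_class.items():
        columns = list(zip(*rows))
        means = [sum(col) / len(col) for col in columns]
        # var_floor keeps the variance strictly positive for constant features.
        variances = [
            sum((v - m) ** 2 for v in col) / len(col) + var_floor
            for col, m in zip(columns, means)
        ]
        model[label] = (math.log(len(rows) / len(X)), means, variances)
    return model


def predict_gaussian_nb(model, row):
    """Return the class with the highest posterior log score."""
    def score(params):
        log_prior, means, variances = params
        # Independence assumption: the joint log-likelihood is just the
        # sum of one-dimensional Gaussian terms, one per feature.
        log_likelihood = sum(
            -0.5 * math.log(2 * math.pi * var) - (x - m) ** 2 / (2 * var)
            for x, m, var in zip(row, means, variances)
        )
        return log_prior + log_likelihood

    return max(model, key=lambda label: score(model[label]))


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Made-up, well-separated toy data (0 = not fraud, 1 = fraud).
X = [[1.0, 2.0], [1.2, 1.8], [0.8, 2.2], [5.0, 6.0], [5.2, 5.8], [4.8, 6.2]]
y = [0, 0, 0, 1, 1, 1]

model = fit_gaussian_nb(X, y)
preds = [predict_gaussian_nb(model, row) for row in X]
acc = accuracy(y, preds)
print(preds, acc)
```

Because each feature contributes its own term to the score, any correlation between features is ignored, which is the limitation the paragraph attributes to the NB model.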