
“Information is the oil of the 21st century, and analytics is the combustion engine.” Peter Sondergaard

Experiential Learning (Badge)


Predicting Returns for PUMA

Tools: Python & Tableau  |  Sponsor: PUMA

A capstone project that I completed in graduate school.

To predict whether a transaction would be returned (a classification problem), we merged internal and external datasets, handled missing values and outliers, balanced the training dataset, applied one-hot, label, and target encoding, trained tree-based algorithms, tuned hyperparameters, performed feature selection, evaluated model performance, built a dashboard and an interactive interface, and presented final conclusions and recommendations to the sponsor.

AI-Driven Finance Planning Platform


Tools: KNIME & Qualtrics & Wix.com  |  Sponsor: Dorval & Chorne Financial Advisors

An integrated experiential learning project that I completed in graduate school.

To develop an AI-driven financial planning platform, we preprocessed text data, performed NLP, conducted LDA topic modeling, and used the results to build predictive models. We found that the historical data was messy and time-consuming to work with, and that the models were not precise. To avoid "garbage in, garbage out" and to help the sponsor reach the project's final goal in the future, we proposed a short-term goal, designed a process for collecting quality data, and built a website mockup.

Supervised ML



Predicting Bank Marketing Campaign

A classification project. 
EDA | Dummy & One-hot encoding | Oversampling | Feature scaling | 5 models (LR, KNN, DT, RF, GB) | Hyperparameter tuning | Classification evaluation metrics | Future scope


Predicting Genre of Spotify Songs

A multiclass classification project. 
EDA | Feature scaling | Oversampling & Undersampling | 5 models (Multiclass LR, KNN, DT, RF, XGB) | Hyperparameter tuning | Classification evaluation metrics | Future scope


Predicting Handwritten Digits

A multiclass logistic regression (softmax regression) project. 
Image pixel dataset | 10 Logistic regression models (one per digit) | Softmax function | Classification evaluation metrics
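The softmax step listed above can be sketched in a few lines of plain Python. This is a minimal illustration, not the project's code: the ten scores are made-up stand-ins for the outputs of the ten per-digit models, and the helper names `softmax` and `predict_digit` are hypothetical.

```python
import math


def softmax(scores):
    """Convert raw per-class scores into probabilities that sum to 1."""
    # Subtracting the maximum keeps exp() from overflowing and does not
    # change the resulting probabilities.
    highest = max(scores)
    exps = [math.exp(s - highest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def predict_digit(scores):
    """Return the index (0-9) of the class with the largest probability."""
    probs = softmax(scores)
    return max(range(len(probs)), key=probs.__getitem__)


# Made-up scores for one image, one score per digit 0..9.
example_scores = [0.1, 2.3, -1.0, 0.5, 0.0, 4.2, 1.1, -0.3, 0.9, 0.2]
probs = softmax(example_scores)
digit = predict_digit(example_scores)
```

Because softmax preserves the ordering of the scores, the predicted digit is simply the one with the highest raw score; the probabilities matter when a confidence estimate is needed.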


Predicting Boston House Prices

A regularized regression project. 

Linear regression assumptions | Multiple linear regression (OLS) | Multicollinearity | Regularization | Ridge & Lasso regression | Regression evaluation metrics
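To show how Ridge and Lasso differ, here is a deliberately simplified sketch with a single feature and no intercept, where both penalties have closed-form solutions. It is an illustration under those assumptions, not the project's model; the data, the penalty values, and the function names are made up.

```python
import math


def ols_slope(x, y):
    """Least-squares slope for y ~ b * x (no intercept)."""
    sxy = sum(a * b for a, b in zip(x, y))
    sxx = sum(a * a for a in x)
    return sxy / sxx


def ridge_slope(x, y, lam):
    """Minimizes 0.5 * sum((y - b*x)^2) + 0.5 * lam * b^2."""
    sxy = sum(a * b for a, b in zip(x, y))
    sxx = sum(a * a for a in x)
    # The L2 penalty shrinks the slope but never makes it exactly zero.
    return sxy / (sxx + lam)


def lasso_slope(x, y, lam):
    """Minimizes 0.5 * sum((y - b*x)^2) + lam * |b|."""
    sxy = sum(a * b for a, b in zip(x, y))
    sxx = sum(a * a for a in x)
    # Soft-thresholding: the L1 penalty can set the slope exactly to zero.
    shrunk = max(abs(sxy) - lam, 0.0)
    return math.copysign(shrunk, sxy) / sxx


# Made-up data where the true slope is 2.
x = [1.0, 2.0, 3.0]
y = [2.0, 4.0, 6.0]

b_ols = ols_slope(x, y)
b_ridge = ridge_slope(x, y, lam=14.0)
b_lasso = lasso_slope(x, y, lam=14.0)
b_lasso_strong = lasso_slope(x, y, lam=30.0)
```

With a strong enough penalty the Lasso slope drops to zero, which is why Lasso doubles as a feature-selection tool, while Ridge only shrinks coefficients toward zero.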

The GitHub repository will be published soon. Stay tuned!


Predicting Search Volume in Excel

A K-Nearest Neighbors algorithm implemented in Excel, accessible even to nonprogrammers.
Two distance metrics (Euclidean & Manhattan) | Dimensionality of the model (n) | Number of nearest neighbors (k) | Regression evaluation metric (RMSE)
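The same building blocks can be sketched outside Excel. The Python below is a minimal illustration of K-Nearest Neighbors regression with both distance metrics and RMSE; it is not a translation of the workbook, and the data points and function names are made up.

```python
import math


def euclidean(a, b):
    """Straight-line distance between two points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan(a, b):
    """Sum of absolute coordinate differences."""
    return sum(abs(x - y) for x, y in zip(a, b))


def knn_predict(train_X, train_y, query, k, distance=euclidean):
    """Average the targets of the k training points closest to the query."""
    ranked = sorted(zip(train_X, train_y),
                    key=lambda pair: distance(pair[0], query))
    nearest = ranked[:k]
    return sum(target for _, target in nearest) / k


def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


# Made-up training data: two features per row and one numeric target.
train_X = [(1.0, 1.0), (2.0, 2.0), (3.0, 3.0), (10.0, 10.0)]
train_y = [10.0, 20.0, 30.0, 100.0]

pred_euclidean = knn_predict(train_X, train_y, (2.0, 2.0), k=3)
pred_manhattan = knn_predict(train_X, train_y, (2.0, 2.0), k=3,
                             distance=manhattan)
```

Trying several values of k and both metrics, then keeping the combination with the lowest RMSE on held-out rows, is one common way to choose these settings.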


Predicting Customer Churn in Telecom

My first classification project. 
EDA | Feature selection | Information values | Multicollinearity problem | Logistic regressions & Decision trees | Optimal probability cutoff | Classification evaluation metrics | Feature importance | Data-driven recommendations
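One way to pick an optimal probability cutoff is to scan candidate thresholds and keep the one that maximizes Youden's J (sensitivity + specificity - 1). The criterion is an assumption here, since the project summary does not say which one was used; the labels, probabilities, and function names below are made up for illustration.

```python
def confusion_counts(y_true, y_prob, cutoff):
    """Count TP, FP, TN, FN when predicting 1 for prob >= cutoff."""
    tp = fp = tn = fn = 0
    for actual, prob in zip(y_true, y_prob):
        predicted = 1 if prob >= cutoff else 0
        if predicted == 1 and actual == 1:
            tp += 1
        elif predicted == 1 and actual == 0:
            fp += 1
        elif predicted == 0 and actual == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def youden_j(y_true, y_prob, cutoff):
    """Sensitivity + specificity - 1 at the given cutoff."""
    tp, fp, tn, fn = confusion_counts(y_true, y_prob, cutoff)
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return sensitivity + specificity - 1


def best_cutoff(y_true, y_prob, candidates):
    """Return the candidate cutoff with the highest Youden's J."""
    return max(candidates, key=lambda c: youden_j(y_true, y_prob, c))


# Made-up churn labels (1 = churned) and predicted probabilities.
y_true = [0, 0, 0, 1, 0, 1, 1, 1]
y_prob = [0.1, 0.2, 0.3, 0.35, 0.45, 0.55, 0.8, 0.9]
candidates = [i / 10 for i in range(1, 10)]

cutoff = best_cutoff(y_true, y_prob, candidates)
```

If missing a churner costs more than a false alarm, a cost-weighted criterion would be a better fit than Youden's J, which treats both error types equally.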

Unsupervised ML



Segmenting Mall Customers

A clustering project. 
EDA | Data standardization | K-means & DBSCAN | Optimal clusters | Optimal radius | Clusters & 3D scatter plots | Data-driven targeted business strategies
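For readers new to K-means, the assign-then-update loop can be sketched in plain Python. This is a minimal illustration with made-up 2D points and fixed starting centroids, not the project's code; a real analysis would standardize the features first and typically use a library implementation.

```python
def assign(points, centroids):
    """Label each point with the index of its nearest centroid."""
    labels = []
    for p in points:
        distances = [sum((a - b) ** 2 for a, b in zip(p, c))
                     for c in centroids]
        labels.append(distances.index(min(distances)))
    return labels


def update(points, labels, centroids):
    """Move each centroid to the mean of the points assigned to it."""
    updated = []
    for idx, old in enumerate(centroids):
        members = [p for p, label in zip(points, labels) if label == idx]
        if not members:
            updated.append(old)  # keep an empty cluster's centroid in place
            continue
        updated.append(tuple(sum(dim) / len(members)
                             for dim in zip(*members)))
    return updated


def kmeans(points, centroids, max_iter=100):
    """Alternate assignment and update until the centroids stop moving."""
    for _ in range(max_iter):
        labels = assign(points, centroids)
        updated = update(points, labels, centroids)
        if updated == centroids:
            break
        centroids = updated
    return labels, centroids


# Made-up customers described by two already-scaled features.
points = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0),
          (8.0, 8.0), (9.0, 8.0), (8.0, 9.0)]
labels, centroids = kmeans(points, [(0.0, 0.0), (10.0, 10.0)])
```

Running this loop for several values of k and comparing the within-cluster distances is the idea behind the elbow method for choosing the number of clusters.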


Visualizing Dimensionality Reduction

A t-SNE (a non-linear dimensionality reduction algorithm) project. 
Image pixel dataset | Random samples | 784 dimensions to 2 dimensions | Scatter plot

Text Mining



The Big Bang Theory Episode Plot Descriptions

An NLP project. 
Corpus & Dictionary | Convert letters to lower case | Remove white space, punctuation & stop words | Customized Stemmer & Stem Completion | Bag of Words | Correlation of frequent terms | Association of a given word | 2-gram analysis | Hierarchical clustering
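The cleaning, Bag of Words, and 2-gram steps can be sketched with the Python standard library. This is an illustration only: the sentence and the tiny stop-word list are made up, stemming is left out, and the function names are hypothetical rather than taken from the project.

```python
import string
from collections import Counter

# A tiny illustrative stop-word list; real projects use a much longer one.
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "in", "is"}


def preprocess(text):
    """Lower-case, strip punctuation, split on white space, drop stop words."""
    text = text.lower()
    text = text.translate(str.maketrans("", "", string.punctuation))
    return [tok for tok in text.split() if tok not in STOP_WORDS]


def bag_of_words(tokens):
    """Count how often each token occurs."""
    return Counter(tokens)


def bigrams(tokens):
    """Pair each token with the token that follows it."""
    return list(zip(tokens, tokens[1:]))


# A made-up sentence standing in for an episode plot description.
text = "The cat chased the mouse, and the mouse chased the cheese."
tokens = preprocess(text)
bow = bag_of_words(tokens)
bigram_counts = Counter(bigrams(tokens))
```

Note that the bigrams here are built after stop-word removal, so a pair can span a removed word; building them before removal is an equally valid choice that gives different pairs.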


NLP - Text Detection and Correction System

Tools: MS SQL & C#

A capstone project that I completed during my undergraduate studies.

To diagnose text errors, like a woodpecker pecking out insects, we used an existing corpus obtained from a government website, processed new input data, conducted n-gram analysis, and built an interactive text detection and correction system.

Statistical Inference



Hypothesis Testing and Confidence Intervals

A review of statistical knowledge for data science. 
Hypothesis testing & Confidence intervals & their relationship | Other important statistical concepts (central limit theorem, sampling distribution, significance level, p-value)


Obesity in the USA

A hypothesis testing project. 

One-sample t-test | Two-proportion z-test | F-test | Two-sample t-test | Assumption checking | Null hypothesis & Alternative hypothesis | Critical value & Test statistic | Significance level & P-value | Hypothesis test graphs
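As an example of how a test statistic and p-value fit together, here is a minimal sketch of a two-sided two-proportion z-test using only the standard library. The counts are made up and are not results from the obesity project; the function name is hypothetical.

```python
import math


def two_proportion_z_test(x1, n1, x2, n2):
    """Two-sided z-test for H0: p1 == p2, using the pooled proportion."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    standard_error = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / standard_error
    # Standard normal CDF via the error function.
    cdf = 0.5 * (1 + math.erf(abs(z) / math.sqrt(2)))
    p_value = 2 * (1 - cdf)
    return z, p_value


# Made-up counts: 60 of 200 in group 1 versus 45 of 200 in group 2.
z, p_value = two_proportion_z_test(x1=60, n1=200, x2=45, n2=200)
reject_at_5_percent = p_value < 0.05
```

The decision rule compares the p-value with the chosen significance level, which is equivalent to comparing the absolute z statistic with the critical value (about 1.96 for a two-sided test at the 5% level).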

Data Warehousing



Data Warehousing and Business Intelligence

An introduction to data warehouse and business intelligence concepts, with examples from my past work experience.

Entity-Relationship Diagram | Key terminologies | Normalization | SQL statements | Window functions | Joins | Star & Snowflake schema | Dimensions & Fact tables | Slowly changing dimensions | Referential integrity actions | OLTP & OLAP

The post will be published soon, stay tuned!

Other Analytics



Enterprise Analytics

An introduction to advanced Excel skills.
Data Analysis (Descriptive Statistics & Regression) | What-If Analysis (Data Table & Goal Seek) | Solver (Linear programming & Non-linear programming) | IF & SUMPRODUCT | Business scenario problems | Data-driven decisions | Different industries


Leadership in Analytics & Risk Management Analytics

CRISP-DM | Qualitative risk assessment | Quantitative risk assessment | Risk treatment and response plan | Key risk indicators (KRIs)

The post will be published soon, stay tuned!


© 2021 By Kuan-Pei (Yuki) Lai
