Cluster Validation In Unsupervised Machine Learning

In the previous post I showed several methods for determining the optimal number of clusters in your data, which often has to be specified before the clustering algorithm can run. Once it has run, however, there’s no guarantee that the resulting clusters are stable and reliable. [Read More]

Determining Optimal Number Of Clusters In Your Data

Recently, I worked a bit with cluster analysis: a common method in unsupervised learning that draws inferences from datasets without labeled responses. I wanted to put my notes together and write it all down before I forget it, hence this blog post. To start, I’ll tackle several approaches to determining the number of clusters in your... [Read More]

Scraping Table With R Datasets

This is a very quick post on how to get a list of the datasets available from within R, along with a basic description of each (which package it can be found in, and its number of observations and variables). It always takes me some time to find the right dataset to showcase whatever process or method I’m working with, so this was really to make my... [Read More]

Harari Sentiment Analysis

So! Following my previous blog post, where I scraped Amazon reviews of Yuval Harari’s Sapiens to create a wordcloud based on them, here I will compare the results of sentiment analysis performed on two of Harari’s books: Sapiens and Homo Deus. [Read More]

Amazon Reviews Wordcloud

In the last month I discovered two things that changed my life: audiobooks and Yuval Harari. The former completely transformed my daily commute, the latter changed the way I think about the surrounding world (with more appreciation for history and politics, to say the least). The cocktail of the two made my brain cells sing. [Read More]