Automated and Unmysterious Machine Learning in Cancer Detection

I get bored doing two things: i) spot-checking and optimising the parameters of my predictive models and ii) reading about how much of a ‘black box’ machine learning (particularly deep learning) models are and how little we can do to understand how they learn (or fail to learn, for example when they mistake a panda bear for a vulture!). In this post I’ll... [Read More]

Friendships among top R-Twitterers

Have you ever wondered whether the most active/popular R-twitterers are virtual friends? :) By friends here I simply mean mutual followers on Twitter. In this post, I score and pick the top 30 #rstats Twitter users and analyse their Twitter friends’ network. You’ll see a lot of applications of the rtweet and ggraph packages, as well as a very useful twist... [Read More]

Animated Plots As Part Of Exploratory Data Analysis

The internet seems to be booming with blog posts on animated graphs, whether for serious purposes or not. I used to think of them as nothing more than a gimmick or a cool way of spicing up a conference talk. However, I’m a total convert now and in this post I want to show a real value... [Read More]

Cluster Validation In Unsupervised Machine Learning

In the previous post I showed several methods for determining the optimal number of clusters in your data; this number often has to be specified before the actual clustering algorithm can run. Once it has run, however, there’s no guarantee that the resulting clusters are stable and reliable. [Read More]

Determining Optimal Number Of Clusters In Your Data

Recently, I worked a bit with cluster analysis: a common method in unsupervised learning that draws inferences from datasets without labeled responses. I wanted to put my notes together and write it all down before I forget it, hence this blog post. To start, I’ll tackle several approaches to determining the number of clusters in your... [Read More]