-
Getting started with Random Forests in R
This is a step-by-step intro to Random Forests in R, using RStudio in particular. RStudio is a powerful, open-source integrated development environment (IDE) for R, available on Windows, Mac and Linux. It supports direct code execution via a console, as well as an editor with code completion and other tools for plotting and…
-
Logistic Regression
Despite being called “regression”, logistic regression is used for classification. It is useful when you are interested in predicting binary outcomes from a set of continuous predictor variables or features, and when the goal is to understand the role of the input variables in explaining the outcome. Keeping our probabilistic view of…
-
Discriminant analysis from a Bayesian point of view
In LDA and QDA it is assumed that the data in each class are a sample from a multivariate Gaussian distribution, which makes the class-conditional density (the likelihood) a normal distribution
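As a rough sketch of what this assumption means, the Gaussian class-conditional density and the resulting posterior can be written out as below. The notation is an assumption for illustration and does not come from the post itself: $x$ is a $d$-dimensional feature vector, $C_k$ is class $k$, $\mu_k$ and $\Sigma_k$ are that class's mean and covariance, and $\pi_k$ is its prior probability.

```latex
% Class-conditional density (likelihood) under the Gaussian assumption
p(x \mid C_k) = \frac{1}{(2\pi)^{d/2}\,|\Sigma_k|^{1/2}}
  \exp\!\left( -\tfrac{1}{2} (x - \mu_k)^{\top} \Sigma_k^{-1} (x - \mu_k) \right)

% Posterior class probability via Bayes' theorem
p(C_k \mid x) = \frac{\pi_k \, p(x \mid C_k)}{\sum_{j} \pi_j \, p(x \mid C_j)}
```

In LDA all classes share one covariance matrix ($\Sigma_k = \Sigma$), which gives linear decision boundaries; in QDA each class keeps its own $\Sigma_k$, which gives quadratic ones.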
-
Linear Models for classification
There are two different approaches to determining the conditional probabilities. One technique is to model them directly, for example by representing them as parametric models and then optimizing the parameters using a training set. Alternatively, we can adopt a Bayesian approach in which we model the class-conditional densities, together with the…
-
Hello blog!
So this blog was inspired by John Rauser’s keynote talk at the Strata + Hadoop World conference this year. John’s premise is that doing statistics is actually quite simple, and that anyone with access to a computer and a programming language can craft statistical tests and ask all the meaningful questions that constitute the core of…