My goal is to analyse how frequency of names found among Disney female characters changed over time in the US. Specifically, I want to see if the movie release had any impact on their popularity. For this purpose, I will use babynames dataset that is available on CRAN.

The idea for the exercise was inspired by Sean Kross’ blog post

short description of the dataset

from CRAN package description

The SSA baby names data comes from social security number (SSN) applications. SSA cards were first issued in 1936, but were only needed for people with an income. In 1986, the law changed effectively requiring all children to get an SSN at birth.

The dataset is quite simple, covering US baby name records from late 1800’s until 2014. It specifies whether a name is male or female, number of respective names in a given year and what proportion they constituted.

library(babynames)
baby <- babynames
baby$sex=as.factor(baby$sex)
summary(baby)
##       year      sex             name                 n          
##  Min.   :1880   F:1081683   Length:1825433     Min.   :    5.0  
##  1st Qu.:1949   M: 743750   Class :character   1st Qu.:    7.0  
##  Median :1982               Mode  :character   Median :   12.0  
##  Mean   :1973                                  Mean   :  184.7  
##  3rd Qu.:2001                                  3rd Qu.:   32.0  
##  Max.   :2014                                  Max.   :99680.0  
##       prop          
##  Min.   :2.260e-06  
##  1st Qu.:3.910e-06  
##  Median :7.390e-06  
##  Mean   :1.407e-04  
##  3rd Qu.:2.346e-05  
##  Max.   :8.155e-02

installing packages

library(dplyr)
library(ggplot2)

quick data pre-prep

I assign each name to a separate dataframe.

ariel <- baby %>%
  filter(name == "Ariel", sex == "F")

belle <- baby %>%
  filter(name == "Belle", sex == "F")

jasmine <- baby %>%
  filter(name == "Jasmine", sex == "F")

tiana <- baby %>%
  filter(name == "Tiana", sex == "F")

merida <- baby %>%
  filter(name == "Merida", sex == "F")

elsa <- baby %>%
  filter(name == "Elsa", sex == "F")

Next, I create variables specifying the release date of a movie with character’s name.

# The Little Mermaid
ariel_release = 1989

# Beauty and the Beast
belle_release = 1991

# Alladin
jasmine_release = 1992

# The Princess and the Frog
tiana_release = 2009

# Brave
merida_release = 2012

# Frozen
elsa_release = 2013

plots

Finally, I plot the number of names for a given year. The arrows indicate when the movie was released, so that it’s easier to compare before and after trend. Additionally, I show the number of names and their proportion for a year proceeding and following the movie release. The numbers (and graphs!) say it all :-)

Namely, the movie release indeed seems to have a positive impact on name popularity, at least short- , but often long-term. For example, Ariel, Belle showed much higher popularity that in pre-release days even a decade after the movie has been published.

Ariel

  ggplot(ariel, aes(x=year, y=n)) +
  geom_line(col="red") +
  xlab("") +
  ylab("") +
  ggtitle("Ariel") +
  geom_segment(aes(x = ariel_release, y = 0, xend = ariel_release, yend = 500), arrow = arrow(length = unit(0.1, "cm")))

Ariel

ariel %>%
  filter(year %in% c(ariel_release - 1,  ariel_release + 1))  %>%
  mutate(when = ifelse(year == ariel_release - 1, "1 yr before",
                      "1 yr after"))
## # A tibble: 2 × 6
##    year    sex  name     n        prop        when
##   <dbl> <fctr> <chr> <int>       <dbl>       <chr>
## 1  1988      F Ariel   910 0.000473441 1 yr before
## 2  1990      F Ariel  3604 0.001754975  1 yr after

Belle

Belle

## # A tibble: 2 × 6
##    year    sex  name     n         prop        when
##   <dbl> <fctr> <chr> <int>        <dbl>       <chr>
## 1  1990      F Belle    13 6.330374e-06 1 yr before
## 2  1992      F Belle    19 9.480574e-06  1 yr after

Jasmine

Jasmine

## # A tibble: 2 × 6
##    year    sex    name     n        prop        when
##   <dbl> <fctr>   <chr> <int>       <dbl>       <chr>
## 1  1991      F Jasmine 11523 0.005668207 1 yr before
## 2  1993      F Jasmine 12059 0.006118574  1 yr after

On the other hand, in case of Jasmine and Tiana, the positive effect of the Disney movie release on name’s popularity was rather short-term, although strong.

Tiana

Tiana

## # A tibble: 2 × 6
##    year    sex  name     n         prop        when
##   <dbl> <fctr> <chr> <int>        <dbl>       <chr>
## 1  2008      F Tiana   481 0.0002313058 1 yr before
## 2  2010      F Tiana   962 0.0004917941  1 yr after

Merida

Finally, it’s difficult to judge the long-term effect for names Merida and Elsa, as both movies have been released very recently, but at least a year following the release showed a remarkable jump in names’ popularity.

Merida

Here, the 1 year before and after comparison is not possible, as the movie was released in 2012 and no baby was called Merida in the US between 2002 and 2012. This still proves how powerful Disney movies can be!

tail(merida)
## # A tibble: 6 × 5
##    year    sex   name     n         prop
##   <dbl> <fctr>  <chr> <int>        <dbl>
## 1  1996      F Merida     8 4.174171e-06
## 2  2001      F Merida     5 2.526008e-06
## 3  2002      F Merida     5 2.533732e-06
## 4  2012      F Merida    19 9.831147e-06
## 5  2013      F Merida   109 5.681315e-05
## 6  2014      F Merida   101 5.210123e-05

Elsa

Elsa

## # A tibble: 2 × 6
##    year    sex  name     n         prop        when
##   <dbl> <fctr> <chr> <int>        <dbl>       <chr>
## 1  2012      F  Elsa   540 0.0002794116 1 yr before
## 2  2014      F  Elsa  1131 0.0005834306  1 yr after

All in all, it goes to show that Disney movies are an important part of our culture that has the power to influence our lives in surprising ways :)