UPDATE (March 25, 2017)

The project outline below was originally drafted in September 2016. Much has changed since then. Maybe I’ll get around to updating the description at some point. Here are a few of the changes:
- the model
- new features
- feature reduction of highly correlated features
- stepwise feature selection according to AIC
- set of scripts to streamline model testing and evaluation
- Monte Carlo simulations to explore the most profitable betting strategies (developed by a friend and my brother)
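
Two of the changes listed above, dropping highly correlated features and stepwise selection by AIC, can be sketched in base R. This is only an illustration on synthetic data, not the project's actual scripts: the helper `drop_correlated`, the 0.9 cutoff, and the toy columns `x1`, `x2`, `x3`, `y` are all assumptions made for the example.

```r
# Synthetic stand-in for a model dataset (all values are made up).
set.seed(1)
n <- 200
toy <- data.frame(x1 = rnorm(n), x3 = rnorm(n))
toy$x2 <- toy$x1 + rnorm(n, sd = 0.01)   # nearly a copy of x1
toy$y  <- rbinom(n, 1, plogis(0.8 * toy$x1 - 0.5 * toy$x3))

# Hypothetical helper: for each pair of features whose absolute correlation
# exceeds `cutoff`, drop the later column of the pair.
drop_correlated <- function(features, cutoff = 0.9) {
  cors <- abs(cor(features))
  cors[lower.tri(cors, diag = TRUE)] <- 0
  to_drop <- colnames(cors)[apply(cors, 2, max) > cutoff]
  features[, setdiff(colnames(features), to_drop), drop = FALSE]
}

reduced <- drop_correlated(toy[, c("x1", "x2", "x3")])

# Stepwise selection by AIC, starting from the full logistic model.
full_model <- glm(y ~ ., data = cbind(reduced, y = toy$y), family = binomial)
selected_model <- step(full_model, direction = "both", trace = 0)
```

Running the correlation filter before `step()` keeps the stepwise search from choosing arbitrarily between two near-identical columns.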

Real World Results

Below are real world predictions made prior to each event. You’ll notice the performance between the model and Vegas is pretty similar below. A few things to remember about the model and the results listed below:
- the model doesn’t make predictions for fights where one of the fighters is making a UFC debut
- these are all model predictions at the minimum confidence threshold of 50%
- in a betting situation, these predictions are filtered to a higher confidence threshold, as well as by other means.

Date Bets Moneyline Probability Model.Win Vegas.Win Thresh
2017-03-18 makwan amirkhani -120 0.6285 0 0 60
2017-03-18 brad pickett -138 0.5749 0 0 55
2017-03-18 scott askham -142 0.5857 0 0 55
2017-03-18 daniel omielanczuk 153 0.6032 0 1 60
2017-03-18 darren stewart -176 0.8005 0 0 75
2017-03-18 gunnar nelson -334 0.6366 1 1 60
2017-03-18 corey anderson 125 0.5971 0 1 55
2017-03-18 joe duffy -591 0.7547 1 1 75
2017-03-18 vicente luque -128 0.5163 0 0 50
2017-03-18 marc diakiese -202 0.6297 1 1 60
2017-03-11 beneil dariush 135 0.5341 0 1 50
2017-03-11 francisco trinaldo 160 0.5582 0 1 55
2017-03-11 ray borg -124 0.5128 1 1 50
2017-03-11 mauricio rua -150 0.7802 1 1 75
2017-03-11 michel prazeres -241 0.7483 1 1 70
2017-03-11 rani yahya -211 0.7123 0 0 70
2017-03-11 rony jason 105 0.5065 0 1 50
2017-03-11 tim means -181 0.6000 0 0 60
2017-03-11 kelvin gastelum -340 0.7107 1 1 70
2017-03-04 alistair overeem -131 0.7104 1 1 70
2017-03-04 mirsad bektic -675 0.6382 0 0 60
2017-03-04 luke sanders -135 0.6983 0 0 65
2017-03-04 david teymur 305 0.5320 1 0 50
2017-03-04 marcin tybura -156 0.5285 1 1 50
2017-03-04 paul craig 115 0.5811 0 1 55
2017-03-04 rashad evans -220 0.5955 0 0 55
2017-03-04 stephen thompson -134 0.6237 0 0 60
2017-02-19 elias theodorou 106 0.5708 1 0 55
2017-02-19 gerald meerschaert -279 0.6066 1 1 60
2017-02-19 johny hendricks 120 0.5623 1 0 55
2017-02-19 paul felder -346 0.7766 1 1 75
2017-02-19 santiago ponzinibbio -330 0.6707 1 1 65
2017-02-19 thiago santos -182 0.5703 1 1 55
2017-02-19 travis browne 101 0.5026 0 1 50
2017-02-11 randy brown -143 0.7754 0 0 75
2017-02-11 derek brunson -120 0.6034 0 0 60
2017-02-11 glover teixeira -199 0.5439 1 1 50
2017-02-11 jacare souza -555 0.8839 1 1 75
2017-02-11 dustin poirier -431 0.6404 1 1 60
2017-02-11 islam makhachev -327 0.5128 1 1 50
2017-02-11 rick glenn -176 0.5366 1 1 50
2017-02-11 ryan laflare -296 0.6748 1 1 65
2017-02-11 wilson reis -591 0.8891 1 1 75
2017-02-04 abel trujillo -110 0.5531 0 0 55
2017-02-04 curtis blaydes -300 0.5549 1 1 55
2017-02-04 alex morono -105 0.5609 0 0 55
2017-02-04 chris gruetzemacher 230 0.5427 0 1 50
2017-02-04 khalil rountree jr -165 0.6463 1 1 60
2017-02-04 dennis bermudez -200 0.7212 0 0 70
2017-01-28 raphael assuncao -155 0.5353 1 1 50
2017-01-28 francis ngannou -375 0.7663 1 1 75
2017-01-28 donald cerrone -155 0.7404 0 0 70
2017-01-28 alessio di chirico -110 0.7764 0 0 75
2017-01-28 jc cottrell -145 0.5350 0 0 50
2017-01-28 jason knight -150 0.6837 1 1 65
2017-01-28 sam alvey -155 0.5182 1 1 50
2017-01-15 aleksei oleinik -145 0.5814 1 1 55
2017-01-15 court mcgee -140 0.5932 0 0 55
2017-01-15 joachim christensen -225 0.6326 1 1 60
2017-01-15 dmitrii smoliakov -110 0.6602 0 0 65
2017-01-15 frankie saenz -165 0.5557 0 0 55
2017-01-15 marcin held 100 0.5154 0 1 50
2017-01-15 sergio pettis -150 0.7080 1 1 70
2017-01-15 tony martin -175 0.5132 1 1 50
2017-01-15 walt harris -145 0.6143 1 1 60
2017-01-15 yair rodriguez -445 0.7820 1 1 75
2016-12-30 antonio carlos junior -140 0.5263 1 1 50
2016-12-30 cody garbrandt 160 0.5184 1 0 50
2016-12-30 tarec saffiedine 120 0.5112 0 1 50
2016-12-30 neil magny -165 0.5515 1 1 55
2016-12-30 louis smolka 115 0.6123 0 1 60
2016-12-30 alex garcia -190 0.5155 1 1 50
2016-12-30 tj dillashaw -205 0.5100 1 1 50
2016-12-17 alex morono -115 0.6271 1 1 60
2016-12-17 bojan velickovic -155 0.7095 0 0 70
2016-12-17 colby covington -360 0.5998 1 1 55
2016-12-17 cole miller -110 0.5299 0 0 50
2016-12-17 eddie wineland -225 0.5080 1 1 50
2016-12-17 hector sandoval -130 0.6126 1 1 60
2016-12-17 scott holtzman 150 0.5767 0 1 55
2016-12-17 mike perry -135 0.5634 0 0 55
2016-12-17 mickey gall -120 0.5004 1 1 50
2016-12-17 urijah faber -450 0.6846 1 1 65
2016-12-10 max holloway -195 0.5800 1 1 55
2016-12-10 cub swanson 188 0.5041 1 0 50
2016-12-10 donald cerrone -275 0.7071 1 1 70
2016-12-10 lando vannata -170 0.5478 1 1 50
2016-12-10 kelvin gastelum 110 0.7367 1 0 70
2016-12-10 mitch gagnon -156 0.5630 0 0 55
2016-12-10 nikita krylov 100 0.6733 0 1 65
2016-12-10 olivier aubin mercier -165 0.5002 1 1 50
2016-12-10 rustam khabilov -210 0.7076 1 1 70
2016-12-10 zach makovsky -160 0.5500 0 0 55
2016-12-09 andrew sanchez -205 0.7311 1 1 70
2016-12-09 francis ngannou -550 0.5372 1 1 50
2016-12-09 corey anderson -400 0.7424 1 1 70
2016-12-09 derrick lewis -205 0.7983 1 1 75
2016-12-09 marc diakiese -325 0.6540 1 1 65
2016-12-03 anthony smith -115 0.6544 1 1 65
2016-12-03 brandon moreno -125 0.7252 1 1 70
2016-12-03 brendan oreilly 100 0.5464 0 1 50
2016-12-03 ryan hall -110 0.5732 1 1 55
2016-12-03 ion cutelaba -215 0.6528 0 0 65
2016-12-03 jorge masvidal -250 0.5396 1 1 50
2016-12-03 joseph benavidez -200 0.5385 1 1 50
2016-12-03 devin clark -105 0.5631 1 1 55
2016-11-26 jake matthews -300 0.7287 0 0 70
2016-11-26 omari akhmedov -180 0.5074 1 1 50
2016-11-26 damien brown 115 0.5549 1 0 55
2016-11-26 richard walsh 112 0.6033 0 1 60
2016-11-26 ben nguyen 110 0.5989 1 0 55
2016-11-26 marlon vera 125 0.5636 1 0 55
2016-11-26 derek brunson -135 0.5036 0 0 50
2016-11-26 chris camozzi -245 0.7258 0 0 70
2016-11-26 jason knight 131 0.6187 1 0 60
2016-11-19 gegard mousasi -550 0.7373 1 1 70
2016-11-19 stevie ray -105 0.5643 1 1 55
2016-11-19 teruto ishihara -275 0.6556 0 0 65
2016-11-19 kyoji horiguchi -199 0.7209 1 1 70
2016-11-19 kevin lee -102 0.7378 1 1 70
2016-11-19 zak cummings -170 0.5942 1 1 55
2016-11-19 ryan bader -370 0.7399 1 1 70
2016-11-19 albert morales 235 0.5279 0 1 50
2016-11-19 krzysztof jotko 140 0.5699 1 0 55
2016-11-19 zak ottow 142 0.5038 0 1 50
2016-11-19 manvel gamburyan 140 0.6020 0 1 60
2016-11-19 luis henrique -275 0.6747 1 1 65
2016-11-19 kamaru usman -205 0.7164 1 1 70
2016-11-19 jack hermansson -210 0.5305 0 0 50
2016-11-19 justin scoggins -168 0.7722 0 0 75
2016-11-12 conor mcgregor -137 0.5500 1 1 55
2016-11-12 stephen thompson -191 0.5142 0 0 50
2016-11-12 chris weidman -190 0.7122 0 0 70
2016-11-12 frankie edgar -340 0.5929 1 1 55
2016-11-12 rafael natal -165 0.6559 0 0 65
2016-11-12 vicente luque -105 0.6086 1 1 60
2016-11-12 jim miller 130 0.7162 1 0 70
2016-11-12 khabib nurmagomedov -305 0.6494 1 1 60
2016-11-05 rafael dos anjos -145 0.5486 0 0 50
2016-11-05 charles oliveira -105 0.5347 0 0 50
2016-11-05 beneil dariush -115 0.6033 1 1 60
2016-11-05 marco beltran -125 0.5642 0 0 55
2016-11-05 erick montano -105 0.6659 0 0 65
2016-11-05 douglas silva de andrade -125 0.5554 1 1 55
2016-11-05 alex nicholson 175 0.5269 0 1 50
2016-11-05 enrique barzola -549 0.6046 1 1 60
2016-11-05 felipe arantes 180 0.5206 0 1 50
2016-10-08 Michael Bisping -225 0.8037 1 1 75
2016-10-08 Gegard Mousasi -300 0.9063 1 1 75
2016-10-08 Ovince Saint Preux -155 0.7126 0 0 70
2016-10-08 Stefan Struve -185 0.7629 1 1 75
2016-10-08 Mirsad Bektic -700 0.5751 1 1 55
2016-10-08 Iuri Alcantara -165 0.7111 1 1 70
2016-10-08 Damian Stasiak 145 0.6871 1 0 65
2016-10-08 Albert Tumenov -260 0.5388 0 0 50
2016-10-08 Mike Perry 120 0.6247 1 0 60
2016-10-08 Leonardo Santos 175 0.5480 1 0 50
2016-10-01 John Lineker 105 0.7123 1 0 70
2016-10-01 Will Brooks -270 0.5858 0 0 55
2016-10-01 Andre Fili 150 0.5666 1 0 55
2016-10-01 Shamil Abdurakhimov -120 0.5583 1 1 55
2016-10-01 Elizeu Zaleski dos Santos 130 0.5261 1 0 50
2016-10-01 Nate Marquardt 180 0.5480 1 0 50
2016-10-01 Ion Cutelaba -165 0.6151 1 1 60
2016-10-01 Curtis Blaydes -210 0.5855 1 1 55
2016-09-24 Renan Barao -500 0.7085 1 1 70
2016-09-24 Antonio Silva 350 0.5854 0 1 55
2016-09-24 Francisco Trinaldo -130 0.7309 1 1 70
2016-09-24 Thiago Santos -620 0.6075 0 0 60
2016-09-24 Mike de la Torre 115 0.5159 0 1 50
2016-09-24 Gilbert Burns -175 0.5683 0 0 55
2016-09-24 Rani Yahya -140 0.6103 1 1 60
2016-09-24 Jussier Formiga -200 0.5980 1 1 55
2016-09-24 Stevie Ray -160 0.6967 0 0 65
2016-09-24 Vicente Luque -500 0.6840 1 1 65
2016-09-17 Dustin Poirier -170 0.6995 0 0 65
2016-09-17 Derek Brunson -190 0.5975 1 1 55
2016-09-17 Roan Carneiro 105 0.5722 1 0 55
2016-09-17 Islam Makhachev -135 0.5811 1 1 55
2016-09-17 Chas Skelly -175 0.5081 1 1 50
2016-09-17 Gabriel Benitez 105 0.5951 1 0 55
2016-09-17 Belal Muhammad -450 0.5708 1 1 55
2016-09-17 Antonio Carlos Junior -255 0.6906 1 1 65
2016-09-17 Jose Quinonez 140 0.5547 1 0 55
2016-09-17 Randy Brown -450 0.6144 1 1 60
2016-09-10 Jimmie Rivera -170 0.5714 1 1 55
2016-09-10 Stipe Miocic -110 0.6259 1 1 60
2016-09-10 Fabricio Werdum -225 0.8458 1 1 75
2016-09-10 Brad Tavares -200 0.6333 1 1 60
2016-09-10 Sean Spencer 105 0.5688 0 1 55
2016-09-03 Andrei Arlovski 120 0.6398 0 1 60
2016-09-03 Alexander Gustafsson -550 0.7330 1 1 70
2016-09-03 Ryan Bader -170 0.7937 1 1 75
2016-09-03 Nick Hein -340 0.7009 1 1 70
2016-09-03 Nick Dalby -190 0.5350 0 0 50
2016-09-03 Taylor Lapilus -139 0.7598 1 1 75
2016-09-03 Rustam Khabilov -560 0.7424 1 1 70
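
To compare the Probability column with the Moneyline column in the table above, it helps to put both on the same scale. The sketch below assumes the Moneyline values are standard American odds for the model's pick and uses the usual conversions; the function names are hypothetical, and the implied probability still includes the bookmaker's margin.

```r
# Implied win probability from American (moneyline) odds.
# Negative odds: |m| / (|m| + 100); positive odds: 100 / (m + 100).
moneyline_to_prob <- function(moneyline) {
  ifelse(moneyline < 0,
         abs(moneyline) / (abs(moneyline) + 100),
         100 / (moneyline + 100))
}

# Profit per unit staked if the bet wins.
moneyline_to_profit <- function(moneyline) {
  ifelse(moneyline < 0, 100 / abs(moneyline), moneyline / 100)
}

# Expected profit per unit staked, given a model probability for the pick.
expected_value <- function(model_prob, moneyline) {
  model_prob * moneyline_to_profit(moneyline) - (1 - model_prob)
}

# The ten 2017-03-18 rows from the table above.
event <- data.frame(
  Moneyline   = c(-120, -138, -142, 153, -176, -334, 125, -591, -128, -202),
  Probability = c(0.6285, 0.5749, 0.5857, 0.6032, 0.8005, 0.6366, 0.5971,
                  0.7547, 0.5163, 0.6297),
  Model.Win   = c(0, 0, 0, 0, 0, 1, 0, 1, 0, 1),
  Vegas.Win   = c(0, 0, 0, 1, 0, 1, 1, 1, 0, 1)
)
event$Implied <- round(moneyline_to_prob(event$Moneyline), 4)
event$EV      <- round(expected_value(event$Probability, event$Moneyline), 4)

colMeans(event[, c("Model.Win", "Vegas.Win")])   # share of rows marked 1
```

For this one event, Model.Win is 1 in 3 of the 10 rows and Vegas.Win is 1 in 5 of the 10 rows. A single event is far too small a sample to judge either one.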

***

Introduction

Ultimate Fighting Championship (UFC) is the world’s premier professional mixed martial arts organization. Fighters engage in one-on-one matches consisting of 3 or 5 rounds of 5 minutes each. Fighters are allowed to strike with punches, kicks, knees, and elbows, or to employ submission moves such as chokes or joint locks. Fighting occurs standing up, on the ground, or in a clinch position. Much like other professional sports, a variety of structured data is gathered on individual fighters and on each fight matchup. This project gathers much of that data, learns from it, and then makes predictions about future fight matchups. Specifically, this project seeks to answer the question: Given a specific matchup, what is each fighter’s probability of winning?


Wrangling

The dataset ultimately developed by and for this project is not only an aggregation of data scraped from multiple sources, but also includes newly generated data derived from the scraped data. Namely, the derivative data refers to historic career statistics for each fighter at the time of each fight.


Aggregating The Data

This project initially gathered data from four sources:

  • UFC Unofficial API: list of current UFC fighters
  • Wikipedia: list of UFC events and event information
  • FightMetric: detailed fight-by-fight statistics and fighter statistics for every UFC fight
  • FightMatrix: proprietary current and historical fighter ranking system

However, in the end the only sources used in the final dataset were FightMetric and FightMatrix. Scraping was done with the rvest package and generally followed the pattern below.

1. Collect and store relevant URLs in a list.

library(rvest)

# Collect individual fighter URLs, one listing page per letter of the alphabet.
fightMetric_fighter_links <- c()
for (letter in letters) {
  fightMetric_fighter_list_url <- paste("http://www.fightmetric.com/statistics/fighters?char=",
                                        letter, "&page=all", sep = "")
  
  fightMetric_fighter_list_page <- read_html(fightMetric_fighter_list_url)
  
  # Get individual FightMetric Fighter links for letter of alphabet from Fighter List page.
  fighter_links_per_letter <- fightMetric_fighter_list_page %>%
    html_nodes(".b-statistics__table-col:nth-child(1) .b-link_style_black") %>%
    html_attr("href")
  
  fightMetric_fighter_links <- c(fightMetric_fighter_links, fighter_links_per_letter)
}

2. Make copies of HTML from the URL list and store in a list.

#### Make copies of each FightMetric fighter page HTML to later scrape data we need. ####
fightMetric_fighter_html <- list()

for (i in 1:length(fightMetric_fighter_links)) {
  fightMetric_fighter_html[[i]] <- read_html(fightMetric_fighter_links[[i]])
}

3. Parse the HTML and put what we want in a dataframe.

#### Populate REVISED FIGHTER DATA ####
for (i in 1:length(fightMetric_fighter_html)) {
  
  # Get fighter name
  name <- fightMetric_fighter_html[[i]] %>%
    html_nodes(".b-content__title-highlight") %>%
    html_text(trim = TRUE)
  
  fightMetric_fighters_df[i, 1] <- name[[1]]
  
  # Get fighter pro record
  record <- fightMetric_fighter_html[[i]] %>%
    html_nodes(".b-content__title-record") %>%
    html_text(trim = TRUE) %>%
    strsplit(" ")
  
  record <- record[[1]][2] %>%
    strsplit("-")
  
  fightMetric_fighters_df[i, 2] <- record[[1]][1]
  
  fightMetric_fighters_df[i, 3] <- record[[1]][2]
  
  fightMetric_fighters_df[i, 4] <- record[[1]][3]
  
  # Get fighter stats
  stats <- fightMetric_fighter_html[[i]] %>%
    html_nodes(".b-list__box-list-item_type_block") %>%
    html_text(trim = TRUE) %>%
    gsub("^.*:(\\n *)*", "", .)
  
  stats <- stats[-10]
  
  for (s in seq_along(stats)) {
    # Compare the stat value, not the loop index, against the empty string.
    if (stats[s] == "") {
      fightMetric_fighters_df[i, (4+s)] <- NA
    } else {
      fightMetric_fighters_df[i, (4+s)] <- stats[s]
    }
  }
}
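
Judging from the loop above, the record text is expected to look like a label followed by a `W-L-D` token. A minimal base R sketch of that split on invented strings follows; `parse_record` is a hypothetical helper, and the sample strings (including the "(1 NC)" suffix) are assumptions rather than text copied from the site.

```r
# Hypothetical helper: pull wins, losses and draws out of a record string.
parse_record <- function(record_text) {
  # The second whitespace-separated token holds "W-L-D", as in the loop above.
  token <- vapply(strsplit(record_text, " +"), function(p) p[2], character(1))
  parts <- strsplit(token, "-", fixed = TRUE)
  data.frame(
    Wins   = as.integer(vapply(parts, function(p) p[1], character(1))),
    Losses = as.integer(vapply(parts, function(p) p[2], character(1))),
    Draws  = as.integer(vapply(parts, function(p) p[3], character(1)))
  )
}

records <- parse_record(c("Record: 21-3-0", "Record: 10-4-1 (1 NC)"))
records
```

Parsing straight to integers avoids a later character-to-numeric conversion pass over these three columns.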

This process was adapted and repeated to pull data for each of the three primary dataframes:

  • ufc_fight_details_df: fight-by-fight statistics
  • fightMetric_fighters_df: individual fighter statistics
  • fighter_ranking_df: current and historical fighter rankings by weightclass

Cleaning The Data

Much of the scraped data had to be cleaned to some extent. At the very least, values were converted from character to numeric. In other cases further fixes were necessary, such as removing ‘%’ signs, converting a fighter’s height into inches, and converting dates of birth into a machine-readable Date format. Here are a few examples:

Removing ‘%’ sign.

# Remove % sign in several of the columns and convert to numeric.
for (i in c(11,13,15,16)) {
  fightMetric_fighters_df[,i] <- fightMetric_fighters_df[,i] %>%
    gsub("%", "", .) %>%
    as.numeric()
  
  fightMetric_fighters_df[,i] <- fightMetric_fighters_df[,i]/100
}

Converting height into number of inches

# Convert Height to numeric inches.
fightMetric_fighters_df$Height <- fightMetric_fighters_df$Height %>%
  gsub("'|\"", "", .) %>%
  strsplit(split = " ")

for (h in 1:length(fightMetric_fighters_df$Height)) {
  if (fightMetric_fighters_df[h, 5][[1]] == "--") {
    fightMetric_fighters_df[h, 5][[1]] <- NA
  } else {
    fightMetric_fighters_df[h, 5][[1]] <- (as.numeric(fightMetric_fighters_df[h, 5][[1]][1])*12) +
                                     as.numeric(fightMetric_fighters_df[h, 5][[1]][2])
  }
}

fightMetric_fighters_df$Height <- unlist(fightMetric_fighters_df$Height)
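
The two cleaning steps above can also be written as small vectorized helpers, which avoids the row-by-row loop. This is a sketch under the assumption that heights look like `5' 11"` and missing values like `--`, as the code above suggests; both function names are hypothetical.

```r
# "5' 11\"" -> 71 inches; anything that does not match (e.g. "--") -> NA.
height_to_inches <- function(height) {
  matches <- regmatches(height, regexec("^([0-9]+)' *([0-9]+)\"$", height))
  vapply(matches, function(parts) {
    if (length(parts) != 3) return(NA_real_)
    as.numeric(parts[2]) * 12 + as.numeric(parts[3])
  }, numeric(1))
}

# "45%" -> 0.45; non-numeric strings become NA.
percent_to_fraction <- function(percent) {
  suppressWarnings(as.numeric(sub("%", "", percent, fixed = TRUE))) / 100
}

heights   <- height_to_inches(c("5' 11\"", "--", "6' 0\""))
fractions <- percent_to_fraction(c("45%", "100%", "--"))
```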

There were also instances of UFC fighters with the same name, which was particularly troublesome when merging this fighter data with other dataframes on the fighter’s name. The result looked like a dataset or merge that was entirely accurate but for a few inexplicable errors. After further digging into the datasets and cross-checking with the original webpages, the issue was handled by identifying these fighters by their name plus a physical attribute, and then renaming one of them. Here’s an example.

# Handle instances of same name in FIGHTMETRIC_FIGHTERS_DF
fightMetric_fighters_df[fightMetric_fighters_df$Name=="Dong Hyun Kim" & fightMetric_fighters_df$Reach==70, 1] <- "Dong Hyun Kim (Maestro)"
fightMetric_fighters_df[fightMetric_fighters_df$Name=="Michael McDonald" & fightMetric_fighters_df$Weight==205, 1] <- "Michael McDonald (The Black Sniper)"
fightMetric_fighters_df[fightMetric_fighters_df$Name=="Tony Johnson" & fightMetric_fighters_df$Weight==185, 1] <- "Tony Johnson (185lbs.)"
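
A quick way to spot this problem before merging is to list names that occur more than once and to compare row counts before and after the join. The toy tables below are invented for illustration (only the name and the Reach of 70 come from the code above); they are not rows from the real dataset.

```r
# Toy fighter table; values other than the first Reach are invented.
fighters <- data.frame(
  Name  = c("Dong Hyun Kim", "Dong Hyun Kim", "Kevin Lee"),
  Reach = c(70, 76, 77),
  stringsAsFactors = FALSE
)
fights <- data.frame(
  FightNumber = 1:2,
  Fighter1    = c("Dong Hyun Kim", "Kevin Lee"),
  stringsAsFactors = FALSE
)

# Names that occur more than once are ambiguous join keys.
shared_names <- unique(fighters$Name[duplicated(fighters$Name)])

# Joining on the ambiguous name silently duplicates the fight row.
merged <- merge(fights, fighters, by.x = "Fighter1", by.y = "Name", all.x = TRUE)
nrow(fights)   # 2
nrow(merged)   # 3
```

A row count that grows after a left join is a cheap signal that the key is not unique on the right-hand side.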

Shaping The Data

After all the data had been collected and stored in dataframes, there was some reshaping required in order to end up with a single dataframe suitable to subset from and present to a logistic regression or random forest function for modeling. At this point there were three dataframes - fight_details, fightMetric_fighters, and fighter_ranking - and the desired end state was to have a single fight_details that contained not only detailed fighter stats for each fight, but also each fighter’s attributes, and each fighter’s ranking. Furthermore, fight_details was currently in the form of each row representing one fighter’s stats for one fight, which meant there were two rows for each fight - one for each fighter’s stats. So, four things needed to happen…

1. Reshape fight_details to contain both fighters’ statistics for a single fight in a single row.

#### Reshape fight_details_df to accommodate logistic regression or random forest ####
colnames(ufc_fight_details_df)[2:3] <- c("Fighter1", "Fighter2")


# Grab even number rows from fight_details to later bind with odd number rows.
fighter2_df <- ufc_fight_details_df %>%
  slice(seq(2,nrow(ufc_fight_details_df),2))

# Rename columns in fighter2_df and original fight_details_df.
colnames(fighter2_df)[17:160] <- paste("F2", colnames(fighter2_df)[17:160], sep = "")
colnames(ufc_fight_details_df)[17:160] <- paste("F1", colnames(ufc_fight_details_df)[17:160], sep = "")

# Delete even number rows from fight_details.
ufc_fight_details_df <- ufc_fight_details_df[-seq(2,nrow(ufc_fight_details_df),2),]

# Combine columns from fight_details and fighter_2, so each row holds each fighter's data for
# a given fight.
ufc_fight_details_df <- bind_cols(ufc_fight_details_df, fighter2_df[17:160])

# Delete fighter_2 data frame.
rm(fighter2_df)

# Then, REMOVE rows with NA for either FIGHTER. A logical mask stays correct
# even when no rows need to be removed.
has_both_fighters <- !is.na(ufc_fight_details_df[[2]]) & !is.na(ufc_fight_details_df[[3]])
ufc_fight_details_df <- ufc_fight_details_df[has_both_fighters, ]
###################################
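
The odd/even reshape above depends on every fight occupying exactly two consecutive rows. A small base R sketch of the same idea on invented data, with an explicit alignment check, looks like this; the column names are illustrative assumptions.

```r
# Toy long-format table: two consecutive rows per fight (values invented).
long <- data.frame(
  FightNumber = c(1, 1, 2, 2),
  Fighter     = c("A", "B", "C", "D"),
  SigStr      = c(30, 22, 15, 41)
)

odd_rows  <- long[seq(1, nrow(long), by = 2), ]
even_rows <- long[seq(2, nrow(long), by = 2), ]

# Guard against misaligned pairs before placing columns side by side.
stopifnot(identical(odd_rows$FightNumber, even_rows$FightNumber))

wide <- data.frame(
  FightNumber = odd_rows$FightNumber,
  Fighter1    = odd_rows$Fighter,
  Fighter2    = even_rows$Fighter,
  F1SigStr    = odd_rows$SigStr,
  F2SigStr    = even_rows$SigStr
)
```

The check matters because binding columns trusts row order. A single missing row upstream would otherwise pair every later fighter with the wrong opponent.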

2. Include a new variable to indicate the winner - simply a 0 or 1 variable indicating if Fighter 1 wins or not.

# Mutate FIGHT_DETAILS to give logical form of result (Fighter1 or Fighter2) ready for logistic
# regression or random forest.
ufc_fight_details_df <- ufc_fight_details_df %>%
  mutate(F1Wins = as.numeric(as.character(Result) == as.character(Fighter1)))

3. Merge fightMetric_fighters with fight_details on each fighter’s name.

##### Merge FIGHTMETRIC_FIGHTERS_DF with FIGHT_DETAILS_DF ####

#### Merge FIGHTERS_DF to FIGHTER1 in FIGHT_DETAILS ####
fightMetric_fighters_df <- fightMetric_fighters_df %>%
  rename(Fighter1=Name)

ufc_fight_details_df <- ufc_fight_details_df %>%
  left_join(fightMetric_fighters_df, by = "Fighter1")

left_copy_fight_details <- ufc_fight_details_df[,1:306]
right_copy_fighter1_details <- ufc_fight_details_df[,307:323]

left_copy_double_rows <- duplicated(left_copy_fight_details)

left_copy_fight_details <- left_copy_fight_details[!left_copy_double_rows,]
right_copy_fighter1_details <- right_copy_fighter1_details[!left_copy_double_rows,]

ufc_fight_details_df <- left_copy_fight_details
########################

#### Merge FIGHTERS_DF to FIGHTER2 in FIGHT_DETAILS ####
fightMetric_fighters_df <- fightMetric_fighters_df %>%
  rename(Fighter2=Fighter1)

ufc_fight_details_df <- ufc_fight_details_df %>%
  left_join(fightMetric_fighters_df, by = "Fighter2")

left_copy_fight_details <- ufc_fight_details_df[,1:306]
right_copy_fighter2_details <- ufc_fight_details_df[,307:323]

left_copy_double_rows <- duplicated(left_copy_fight_details)

left_copy_fight_details <- left_copy_fight_details[!left_copy_double_rows,]
right_copy_fighter2_details <- right_copy_fighter2_details[!left_copy_double_rows,]

ufc_fight_details_df <- left_copy_fight_details %>%
  bind_cols(right_copy_fighter1_details) %>%
  bind_cols(right_copy_fighter2_details)

colnames(ufc_fight_details_df)[307:323] <- paste("F1", colnames(fightMetric_fighters_df)[2:18], sep = "")
colnames(ufc_fight_details_df)[324:340] <- paste("F2", colnames(fightMetric_fighters_df)[2:18], sep = "")

fightMetric_fighters_df <- fightMetric_fighters_df %>%
  rename(Name=Fighter2)
###################

4. Merge fighter_ranking with fight_details on each fighter’s name.

################### Assign FIGHTER_RANKING to fighters in NEW_FIGHTS #################

# Reset clean unique FightNumbers 
new_fights_df$FightNumber <- c(1:nrow(new_fights_df))

new_fights_df <- new_fights_df %>%
  mutate(F1Rank = 251, F2Rank = 251)

F1RankCol <- grep("F1Rank", colnames(new_fights_df))
F2RankCol <- grep("F2Rank", colnames(new_fights_df))

quarterly_date <- as.Date("2006-01-01", "%Y-%m-%d")
for (q in 1:(length(fighter_ranking_df)-9)) {
  temp_fighter_ranking <- fighter_ranking_df %>%
    select(c(q,353))
  
  quarterly_fight_details <- new_fights_df %>%
    filter(quarter(Date, with_year = TRUE) == quarter(quarterly_date, with_year = TRUE)) %>%
    select(FightNumber, Date, Fighter1, Fighter2, F1Rank, F2Rank)
  
  for (F1 in 1:nrow(quarterly_fight_details)) {
    if (tolower(quarterly_fight_details[F1, 3]) %in% tolower(na.omit(temp_fighter_ranking[,1]))) {
      quarterly_fight_details[F1,5] <- temp_fighter_ranking[tolower(na.omit(temp_fighter_ranking[,1])) == tolower(quarterly_fight_details[F1,3]), 2][1]
    }
    if (tolower(quarterly_fight_details[F1, 4]) %in% tolower(na.omit(temp_fighter_ranking[,1]))) {
      quarterly_fight_details[F1,6] <- temp_fighter_ranking[tolower(na.omit(temp_fighter_ranking[,1])) == tolower(quarterly_fight_details[F1,4]), 2][1]
    }
  }
  
  for (i in 1:nrow(quarterly_fight_details)) {
    new_fights_df[new_fights_df$FightNumber == quarterly_fight_details[i,1], c(F1RankCol, F2RankCol)] <- quarterly_fight_details[i, 5:6]
  }
  
  quarterly_date <- quarterly_date + months(3)
  if (quarter(quarterly_date, with_year = TRUE) > quarter(today(), with_year = TRUE)) {
    quarterly_date <- as.Date("2006-01-01", "%Y-%m-%d")
  }
}

new_fights_df$FightNumber <- c(1:nrow(new_fights_df))

new_fights_df$F1Rank <- as.numeric(new_fights_df$F1Rank)
new_fights_df$F2Rank <- as.numeric(new_fights_df$F2Rank)

#######################################################################################
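
The quarterly loop above boils down to a lookup keyed on quarter and fighter name, with 251 as the value for unranked fighters. A compact base R sketch of that lookup on invented data follows. The long ranking layout (one row per fighter per quarter), the `quarter_key` and `lookup_rank` helpers, and all values are assumptions for illustration; the real fighter_ranking_df is laid out differently.

```r
# Toy inputs (all values invented). One ranking row per fighter per quarter.
rankings <- data.frame(
  Quarter = c("2016.3", "2016.3", "2016.4"),
  Fighter = c("fighter a", "fighter b", "fighter a"),
  Rank    = c(12, 40, 9),
  stringsAsFactors = FALSE
)
fights <- data.frame(
  Date     = as.Date(c("2016-09-10", "2016-11-19")),
  Fighter1 = c("Fighter A", "Fighter A"),
  Fighter2 = c("Fighter B", "Fighter C"),
  stringsAsFactors = FALSE
)

# "YYYY.Q" key for a Date, similar in spirit to quarter(x, with_year = TRUE).
quarter_key <- function(date) {
  paste0(format(date, "%Y"), ".", as.POSIXlt(date)$mon %/% 3 + 1)
}

# Look up a rank by (quarter, lower-cased name); unranked fighters get 251,
# the default used in the loop above.
lookup_rank <- function(name, date, default = 251) {
  key         <- paste(quarter_key(date), tolower(name))
  rank_by_key <- setNames(rankings$Rank, paste(rankings$Quarter, rankings$Fighter))
  rank        <- unname(rank_by_key[key])
  ifelse(is.na(rank), default, rank)
}

fights$F1Rank <- lookup_rank(fights$Fighter1, fights$Date)
fights$F2Rank <- lookup_rank(fights$Fighter2, fights$Date)
```

Here "Fighter C" has no ranking row, so the lookup falls back to 251.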

Feature Engineering

As mentioned earlier, the dataset for this project is a combination of data scraped from several sources, as well as data derived (or newly created) from the scraped data. Specifically, the derived data includes career statistics for each fighter at the time of each fight. This data was not available anywhere online (to my knowledge), but was critical in developing an accurate description of each fighter at the time of each fight, on which to build a model that predicts future outcomes. So, the basic approach to generating this new data was to go through the entire fight dataset (ufc_fight_details_df) one fighter at a time and chronologically tally up their career stats. Below is an example of some of the code, which can almost certainly be refactored into something more economical, but for the time being is made up of about 200 lines of nested for loops and if else statements.

Example block of code to generate derivative data.

# Add columns to be populated to FIGHT_DETAILS
ufc_fight_details_df <- ufc_fight_details_df %>%
  mutate(F1CarFightTime = NA, F2CarFightTime = NA, F1UFCWins = NA, F2UFCWins = NA, F1UFCLosses = NA,
         F2UFCLosses = NA, F1UFCNC = NA, F2UFCNC = NA, F1CarSigStr = NA, F2CarSigStr = NA,
         F1CarSigStrAtt = NA, F2CarSigStrAtt = NA, F1CarSigStrAbs = NA, F2CarSigStrAbs = NA,
         F1CarTD = NA, F2CarTD = NA, F1CarTDAtt = NA, F2CarTDAtt = NA, F1CarTDAbs = NA, F2CarTDAbs = NA,
         F1CarOppSigStrAtt = NA, F2CarOppSigStrAtt = NA, F1CarOppTDAtt = NA, F2CarOppTDAtt = NA)

#### Begin loop to generate additional fighter data ####
for (fighter in unique_fighters$.) {
  fighter_career_data <- ufc_fight_details_df %>%
    filter((Fighter1 == as.character(fighter)) | (Fighter2 == as.character(fighter))) %>%
    arrange(Date)
  
  for (i in 1:nrow(fighter_career_data)) {
    
  #### Adding up total career time in ring for each fighter going into each fight - Fighter1 spot. ####
    if (fighter == fighter_career_data[i,2]) {
      
      if (i == 1) {
        # F1CarFightTime when F1
        fighter_career_data[i, 342] <- 0
        
        # F1CarSigStr when F1
        fighter_career_data[i, 350] <- 0
        
        # F1CarSigStrAtt when F1
        fighter_career_data[i, 352] <- 0
        
        # F1CarSigStrAbs when F1
        fighter_career_data[i, 354] <- 0
        
        # F1CarTD when F1
        fighter_career_data[i, 356] <- 0
        
        # F1CarTDAtt when F1
        fighter_career_data[i, 358] <- 0
        
        # F1CarTDAbs when F1
        fighter_career_data[i, 360] <- 0
        
        # F1CarOppSigStrAtt when F1
        fighter_career_data[i, 362] <- 0
        
        # F1CarOppTDAtt when F1
        fighter_career_data[i, 364] <- 0
      } else {
        # F1 was F1 last fight
        if (fighter == fighter_career_data[(i-1),2]) {
         
          # F1CarFightTime when F1 this fight was F1 last fight
          fighter_career_data[i, 342] <- fighter_career_data[(i-1), 342] + fighter_career_data[(i-1), 341]
          
          # Get fight data from last fight when F1 this fight was F1 last fight.
          temp_fight_data <- c(fighter_career_data[(i-1), 18], fighter_career_data[(i-1), 19],
                               fighter_career_data[(i-1), 162], fighter_career_data[(i-1), 23],
                               fighter_career_data[(i-1), 24], fighter_career_data[(i-1), 167],
                               fighter_career_data[(i-1), 163], fighter_career_data[(i-1), 168])
          
          # F1CarSigStr through F1CarOppTDAtt when F1 this fight was F1 last fight
          for (n in 1:8) {
            if (is.na(temp_fight_data[n])) {
              fighter_career_data[i, ((174+n)*2)] <- 0
            } else {
              fighter_career_data[i, ((174+n)*2)] <- fighter_career_data[(i-1), ((174+n)*2)] + temp_fight_data[n]
            }
          }
        }
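
The core of the tally above is "career total going into this fight", which is a running sum that excludes the current fight. One possible refactoring is sketched below, assuming the data is first put in a long layout with one row per fighter per fight (the toy columns and values are invented). In the excerpt above, a missing value in the previous fight sets the running total to 0. The sketch instead counts the missing fight as 0 and carries the total forward, which is an assumption about the intended behavior.

```r
# Toy rows, one per fighter per fight (values invented).
bouts <- data.frame(
  Fighter = c("A", "B", "A", "C", "A"),
  Date    = as.Date(c("2015-01-10", "2015-01-10", "2015-06-20",
                      "2015-06-20", "2016-02-05")),
  SigStr  = c(30, 22, NA, 18, 25)
)
bouts <- bouts[order(bouts$Fighter, bouts$Date), ]

# Career total going INTO each fight: running sum minus the current fight.
# Missing per-fight values count as 0 here, which is an assumption.
prior_total <- function(x) {
  x[is.na(x)] <- 0
  cumsum(x) - x
}
bouts$CarSigStr <- ave(bouts$SigStr, bouts$Fighter, FUN = prior_total)
```

Fighter A's three fights get career totals of 0, 30 and 30 going in. The debut rows for B and C get 0.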

Some whittling down of the features occurred in going from the 382 features of ufc_fight_details_df to the subset of 48 features ultimately used in the logistic regression model. This process was part logic, part domain expertise. Perhaps more on the logic side were features of physical description, like age, height, weight, reach, weight class, and stance. Career statistics features are the product of both logic and domain expertise. You don’t need to know anything about Mixed Martial Arts or UFC to think it’s a good idea to consider each fighter’s career statistics when making a prediction. Round-by-round statistics, on the other hand, were not included, because not all fights have the same number of rounds and presenting this data to the model in a useful way would have required a considerable amount of additional data wrangling. The decision to include each fighter’s rank might also seem natural, but an excellent data scientist without as much domain expertise might think that statistical descriptions of each fighter’s performance should be enough, or even superior to a somewhat subjective or circumstantial indicator like rank. But the ranking variable takes into consideration factors such as the quality of a fighter’s past opponents and the chronology of past performance (e.g. a fighter with a long successful career who is now fighting past his prime).


Prediction Methods

After collecting and wrangling the data, we have a dataset of over 5,000 MMA fights ready to train a statistical model to make predictions. Given that we’re trying to predict a binary outcome - Win or Lose - for each fighter in a given fight, there are a few methods that would be suitable. For this project I tested two different methods, Logistic Regression and Random Forest, and ended up going with Logistic Regression because it performed better on test data.


Logistic Regression


Testing

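# Packages used below: dplyr (pipes, filter), lubridate (dates), caTools (sample.split),
# caret (confusionMatrix), pROC (roc).
library(dplyr); library(lubridate); library(caTools); library(caret); library(pROC)
# Variable for the date ten years ago today.
ten_years_ago <- ymd(today()) - years(10)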
# This gives us the subset of fights (rows) we'll use for the model.
row_subset_fights <- ufc_fight_details_df %>%
  filter(Date > ten_years_ago) %>%
  filter(Gender == "Men") %>%
  filter(WeightClass != "Strawweight") %>%
  filter(WeightClass != "Atomweight") %>%
  filter(WeightClass != "Catch Weight") %>%
  filter(WeightClass != "Open Weight") %>%
  filter(WeightClass != "Other") %>%
  filter(WeightClass != "Super Heavyweight") %>%
  filter((F1UFCWins>0 | F1UFCLosses>0 | F1UFCNC>0) & (F2UFCWins>0 | F2UFCLosses>0 | F2UFCNC>0)) %>%
  arrange(Date)
# Subset of columns to be used in the model subset.
model_dataset_columns <- read.csv(paste(getwd(), "/FreshModelColumns.csv", sep = ""))
model_dataset_columns <- as.character(model_dataset_columns$ColumnNames)
# Get subset fight_details dataset - all rows; only columns from selected_features
fresh_model_dataset <- row_subset_fights %>%
  subset(select = model_dataset_columns)
# Factor Weightclass.
fresh_model_dataset$WeightClass <- as.factor(fresh_model_dataset$WeightClass)
# Factor F1Stance
fresh_model_dataset$F1Stance <- fresh_model_dataset$F1Stance %>%
  as.factor()
# Factor F2Stance
fresh_model_dataset$F2Stance <- fresh_model_dataset$F2Stance %>%
  as.factor()
##########!!!!!!!!! SUBSETTING !!!!!!!!!!###########
# Split fight_details_df into training and testing sets.
set.seed(109) 
split_ufc_data <- fresh_model_dataset$F1Wins %>%
  sample.split(SplitRatio = 0.70)
train_ufc_subset <- fresh_model_dataset %>%
  subset(split_ufc_data == TRUE)
test_ufc_subset <- fresh_model_dataset %>%
  subset(split_ufc_data == FALSE)
# Remove and save columns we don't need, but may need later... just in case.
saved_column_names <- c("FightNumber","Fighter1","Fighter2","Date")
column_pos <- c()
for (column in saved_column_names) {
  column_pos <- c(column_pos, grep(column, colnames(fresh_model_dataset)))
}
saved_columns <- fresh_model_dataset %>%
  select(column_pos)
train_ufc_subset <- train_ufc_subset %>% select(-column_pos)
test_ufc_subset <- test_ufc_subset %>% select(-column_pos)
####################################
######### LOGIT MODEL ##############
ufc_fresh_logit_model <- glm(F1Wins ~ ., data = na.omit(train_ufc_subset), family = binomial)
# Prediction - probability that F1 wins
test_ufc_subset$F1WinsProb <- round(predict(ufc_fresh_logit_model, newdata = test_ufc_subset, type = "response"), 4)
#########################################
########## THRESHOLD MODELS #############
WinThreshold <- 0.5
# Keep only fights where F1WinsProb > WinThreshold or < (1 - WinThreshold),
# e.g. > 0.6 or < 0.4 at a threshold of 0.6. At 0.5 nearly every fight is kept.
test_threshold_subset <- test_ufc_subset %>%
  filter(F1WinsProb > WinThreshold | F1WinsProb < (1-WinThreshold))
test_threshold_subset$PredF1Wins <- as.numeric(test_threshold_subset$F1WinsProb > WinThreshold)
#######################################
######### LOGIT CONFUSION MATRIX #######
logitPerformance <- confusionMatrix(as.factor(test_threshold_subset$PredF1Wins), as.factor(test_threshold_subset$F1Wins),
                                    dnn = c("PredF1Wins", "F1Wins"))
logitPerformance
Confusion Matrix and Statistics

          F1Wins
PredF1Wins   0   1
         0 119  99
         1 237 479
                                          
               Accuracy : 0.6403          
                 95% CI : (0.6085, 0.6711)
    No Information Rate : 0.6188          
    P-Value [Acc > NIR] : 0.09412         
                                          
                  Kappa : 0.1761          
 Mcnemar's Test P-Value : 7.782e-14       
                                          
            Sensitivity : 0.3343          
            Specificity : 0.8287          
         Pos Pred Value : 0.5459          
         Neg Pred Value : 0.6690          
             Prevalence : 0.3812          
         Detection Rate : 0.1274          
   Detection Prevalence : 0.2334          
      Balanced Accuracy : 0.5815          
                                          
       'Positive' Class : 0               
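
The statistics in this printout follow directly from the four counts in the table. As a quick check, the sketch below rebuilds the matrix by hand in base R (the object name `cm` is only illustrative) and recomputes several of the reported values. Note that caret treats 0 as the positive class here, so Sensitivity measures how often an actual Fighter 1 loss was predicted as a loss.

```r
# Counts copied from the confusion matrix above
# (rows = predicted F1Wins, columns = actual F1Wins).
cm <- matrix(c(119, 237, 99, 479), nrow = 2,
             dimnames = list(PredF1Wins = c("0", "1"), F1Wins = c("0", "1")))

n <- sum(cm)
accuracy    <- sum(diag(cm)) / n                 # correct predictions / all fights
nir         <- max(colSums(cm)) / n              # always guessing the majority class
sensitivity <- cm["0", "0"] / sum(cm[, "0"])     # positive class is 0
specificity <- cm["1", "1"] / sum(cm[, "1"])
ppv         <- cm["0", "0"] / sum(cm["0", ])     # Pos Pred Value
npv         <- cm["1", "1"] / sum(cm["1", ])     # Neg Pred Value
balanced    <- (sensitivity + specificity) / 2

round(c(Accuracy = accuracy, NIR = nir, Sensitivity = sensitivity,
        Specificity = specificity, PPV = ppv, NPV = npv,
        BalancedAccuracy = balanced), 4)
```

Rounded to four places, these reproduce the Accuracy, No Information Rate, Sensitivity, Specificity, Pos Pred Value, Neg Pred Value, and Balanced Accuracy lines of the printout.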
                                          
roc(F1Wins ~ F1WinsProb, test_threshold_subset, auc = TRUE, plot = TRUE)

Call:
roc.formula(formula = F1Wins ~ F1WinsProb, data = test_threshold_subset,     auc = TRUE, plot = TRUE)

Data: F1WinsProb in 356 controls (F1Wins 0) < 578 cases (F1Wins 1).
Area under the curve: 0.6701
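
One way to read the area under the curve: it is the probability that a randomly chosen fight Fighter 1 won received a higher predicted probability than a randomly chosen fight Fighter 1 lost. Below is a minimal base R sketch of that rank-based calculation, using made-up toy values rather than the project data. `auc_rank` is a hypothetical helper, not a pROC function, and it assumes higher scores indicate a Fighter 1 win.

```r
# Rank-based (Mann-Whitney) AUC: share of win/loss pairs ranked correctly.
auc_rank <- function(outcome, prob) {
  r  <- rank(prob)                 # average ranks, so ties count as half
  n1 <- sum(outcome == 1)          # fights Fighter 1 won
  n0 <- sum(outcome == 0)          # fights Fighter 1 lost
  (sum(r[outcome == 1]) - n1 * (n1 + 1) / 2) / (n1 * n0)
}

# Toy example (illustrative values only).
outcome <- c(1, 0, 1, 1, 0, 0)
prob    <- c(0.9, 0.4, 0.6, 0.3, 0.2, 0.7)
auc_toy <- auc_rank(outcome, prob)
auc_toy
```

In the toy data, 6 of the 9 possible win/loss pairs are ordered correctly, so the result is 6/9. An AUC of 0.5 corresponds to random ordering.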


Random Forest


Testing

############################ RANDOM FOREST MODEL ###################################
#set.seed(388)
ufc_fresh_randomForest_model <- randomForest(as.factor(F1Wins) ~ ., data = na.omit(train_ufc_subset),
                                            importance=TRUE)
test_ufc_subset$F1WinsProb <- NULL
test_ufc_subset$RFF1WinsProb <- predict(ufc_fresh_randomForest_model, test_ufc_subset, type = "prob")[,2]
test_ufc_subset$RFPredF1Wins <- as.numeric(test_ufc_subset$RFF1WinsProb > 0.5)
randomForestPerformance <- confusionMatrix(as.factor(test_ufc_subset$RFPredF1Wins), as.factor(test_ufc_subset$F1Wins), dnn = c("PredF1Wins", "F1Wins"))
randomForestPerformance
Confusion Matrix and Statistics

          F1Wins
PredF1Wins   0   1
         0  91  77
         1 265 501
                                         
               Accuracy : 0.6338         
                 95% CI : (0.602, 0.6648)
    No Information Rate : 0.6188         
    P-Value [Acc > NIR] : 0.1817         
                                         
                  Kappa : 0.1362         
 Mcnemar's Test P-Value : <2e-16         
                                         
            Sensitivity : 0.25562        
            Specificity : 0.86678        
         Pos Pred Value : 0.54167        
         Neg Pred Value : 0.65405        
             Prevalence : 0.38116        
         Detection Rate : 0.09743        
   Detection Prevalence : 0.17987        
      Balanced Accuracy : 0.56120        
                                         
       'Positive' Class : 0              
                                         
roc(F1Wins ~ RFF1WinsProb, test_ufc_subset, auc = TRUE, plot = TRUE)

Call:
roc.formula(formula = F1Wins ~ RFF1WinsProb, data = test_ufc_subset,     auc = TRUE, plot = TRUE)

Data: RFF1WinsProb in 356 controls (F1Wins 0) < 578 cases (F1Wins 1).
Area under the curve: 0.6282

varImpPlot(ufc_fresh_randomForest_model, n.var = 10)


UFC Project Workflow

The practical application of this project is to generate predictions for UFC fights on a weekly basis in order to wager on the fights. There are four main steps in this workflow.

  1. Update fightMetric_fighters_df: After each event, existing fighter data changes and new fighters are sometimes introduced to the UFC, so we pull this new information into our dataset of fighters (fightMetric_fighters_df). We do this by running the two blocks of code labeled for fightMetric_fighters_df in the UFC.R script, beginning at lines ~250 and ~1022, respectively.

  2. Update fighter_ranking_df: After each event fighters win and lose, so the fighter rankings change. There are two components to fighter_ranking_df: historical rankings and current rankings. After each event we need to update the current rankings of each fighter, but the historical rankings do not change. They are recorded in the dataset on a quarterly basis; in other words, we have historical rankings for each weight class for each quarter going back to Q1 of 2006. So the historical rankings are updated only once a quarter, when they are released. In theory this means the model could see a few rankings that are slightly off, but as a practical matter quarterly rankings are probably accurate enough, since the vast majority of fighters do not fight more than once per quarter. We update the current rankings by running the block of code beginning at line ~1079 of UFCUpdater.R, and we update the historical rankings once a quarter by running the block of code at line ~1030 of UFCUpdater.R.

  3. Update ufc_fight_details_df: After each event there are new fights and detailed statistics to add to ufc_fight_details_df. This dataset was originally built with the UFC.R script, but scraping nearly 10,000 web pages is a lengthy and somewhat hands-on process; doing it once is fine, but repeating it every week would be overly burdensome. So another script, UFCUpdater.R, scrapes only the recently completed fights, computes their derivative data, and merges them back into ufc_fight_details_df. We simply run the UFCUpdater.R script to accomplish this.

  4. Generate predictions: Once we have updated our dataset, we are ready to generate predictions for the set of upcoming fights. To accomplish this we run the UFCPredictor.R script. This script does a few things. First, it scrapes the upcoming fight matchups. Then it computes and gathers the relevant variables for each fighter from our dataset. Then it refits the model with the most recent dataset. Finally, it applies our model to the set of data for the upcoming fights to produce a probability that Fighter 1 will win.
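
The last stages of step 4 (refit, score the upcoming fights, then filter to a betting threshold) can be sketched as follows. Everything here is a stand-in: the simulated `history` data frame, the `RankDiff` and `ReachDiff` predictors, the fighter names, and `bet_threshold` are assumptions for illustration, not the contents of UFCPredictor.R.

```r
set.seed(1)

# Simulated stand-in for the historical fights (hypothetical predictors).
n <- 200
history <- data.frame(RankDiff = rnorm(n), ReachDiff = rnorm(n))
history$F1Wins <- rbinom(n, 1, plogis(0.8 * history$RankDiff + 0.3 * history$ReachDiff))

# Refit the logistic regression on the most recent data.
model <- glm(F1Wins ~ RankDiff + ReachDiff, data = history, family = binomial)

# Hypothetical upcoming matchups with the same predictors.
upcoming <- data.frame(Fighter1 = c("fighter a", "fighter b", "fighter c"),
                       Fighter2 = c("fighter d", "fighter e", "fighter f"),
                       RankDiff = c(1.5, -0.1, -2.0),
                       ReachDiff = c(0.5, 0.0, -1.0),
                       stringsAsFactors = FALSE)

# Probability that Fighter 1 wins, then the predicted winner and its confidence.
upcoming$F1WinsProb <- predict(model, newdata = upcoming, type = "response")
upcoming$Pick <- ifelse(upcoming$F1WinsProb >= 0.5, upcoming$Fighter1, upcoming$Fighter2)
upcoming$Confidence <- pmax(upcoming$F1WinsProb, 1 - upcoming$F1WinsProb)

# Keep only picks at or above the betting threshold.
bet_threshold <- 0.6
bets <- upcoming[upcoming$Confidence >= bet_threshold, c("Pick", "Confidence")]
bets
```

Expressing confidence as the larger of F1WinsProb and 1 - F1WinsProb lets a single threshold apply whether the pick is Fighter 1 or Fighter 2.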


Roadmap For Improvement

The project is functional and practical, and it accomplishes what it set out to do. Nonetheless, there is room for improvement. The following is a working list of items to explore for improving the performance of the model.

Feature Engineering
1. A feature that incorporates how a fight ends: For example, does a fighter win by KO/TKO, Submission, Decision, etc…
2. A feature that incorporates whether or not this is a fighter’s first fight in UFC.
3. A feature to incorporate a fighter’s current winning or losing streak.
4. Further testing of different variable combinations (removing variables that appear insignificant, for example)
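
As an illustration of item 3, a streak feature could be derived from a fighter's chronological results with run-length encoding. This is a minimal sketch under assumed conventions (results coded 1 = win and 0 = loss, ordered oldest to newest, positive values for winning streaks and negative for losing streaks); `current_streak` is a hypothetical helper, not part of the existing scripts.

```r
# Current streak entering a fight: +k for k straight wins, -k for k straight losses.
current_streak <- function(results) {
  if (length(results) == 0) return(0L)   # debut: no prior fights
  runs <- rle(results)                   # lengths and values of consecutive runs
  last <- length(runs$lengths)
  len  <- runs$lengths[last]
  if (runs$values[last] == 1) len else -len
}

current_streak(c(1, 0, 1, 1, 1))   # three straight wins
current_streak(c(1, 1, 0, 0))      # two straight losses
```

To avoid leaking the outcome into the predictors, the input should contain only fights that took place before the fight being predicted.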

Predictive Modeling
1. Explore implementing an ensemble model that combines Logistic Regression and Random Forest.
2. Explore clustering on sets of fights predicted incorrectly to see if we can develop rules to improve overall predictive accuracy with a multi-stage predictive approach… or a multi-faceted model (i.e. generate multiple fits for subsets of fights that meet certain criteria).
3. Explore Specificity, Precision, and other model performance measures as a way to weight or hone model predictions.
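
For item 1, one simple form of ensemble is a weighted average of the two models' predicted probabilities. The sketch below assumes the two probability vectors are already aligned fight by fight; `ensemble_prob`, the weight `w_logit`, and the example values are hypothetical and would need to be tuned and validated on held-out fights.

```r
# Weighted average of logistic-regression and random-forest probabilities.
ensemble_prob <- function(p_logit, p_rf, w_logit = 0.5) {
  stopifnot(length(p_logit) == length(p_rf), w_logit >= 0, w_logit <= 1)
  w_logit * p_logit + (1 - w_logit) * p_rf
}

# Illustrative values only.
p_logit <- c(0.72, 0.45, 0.58)
p_rf    <- c(0.64, 0.52, 0.40)
p_ens   <- ensemble_prob(p_logit, p_rf)
p_ens
```

With equal weights this returns 0.68, 0.485, and 0.49, so the third fight flips from a Fighter 1 pick under the logistic regression to a Fighter 2 pick under the ensemble.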

Usability
1. Automate dataset updates and prediction generation.
2. Build a user-friendly dashboard or interactive interface with Flexdashboard or Shiny, respectively.

Code Base
1. Implement testing suite to help avoid “breaking” an increasingly complex code base during future code maintenance and feature additions.
2. Plenty of room for refactoring, which would improve readability, simplify collaboration, and may yield some gains in execution speed.


Appendix

Data

# Creating table to explain the data.
data_table <- data.frame(colnames(ufc_fight_details_df),
                         rep(NA,length(ufc_fight_details_df)),
                         as.character(ufc_fight_details_df[3449,]),
                         rep("FightMetric",length(ufc_fight_details_df)),
                         stringsAsFactors = FALSE)
# Assign column names.
colnames(data_table) <- c("Variables", "Unabreviated", "Example", "Source")
# Generate unabbreviated variable names.
data_table$Unabreviated <- data_table$Variables %>%
  gsub("F1", "Fighter1 ", .) %>%
  gsub("F2", "Fighter2 ", .) %>%
  gsub("Str", "Strike ", .) %>%
  gsub("Att", "Attempt ", .) %>%
  gsub("Car", "Career ", .) %>%
  gsub("Sig", "Significant ", .) %>%
  gsub("Opp", "Opponent ", .) %>%
  gsub("TD", "Take Down ", .) %>%
  gsub("Abs", "Absorbed ", .) %>%
  gsub("pM", "per Minute ", .) %>%
  gsub("Acc", "Accuracy ", .) %>%
  gsub("Def", "Defense ", .) %>%
  gsub("Avg", "Average ", .) %>%
  gsub("Dist", "Distance ", .) %>%
  gsub("KD", "Knock Down ", .) %>%
  gsub("Pass", "Guard Pass ", .) %>%
  gsub("NC", "No Contest ", .) %>%
  gsub("R1", "Round 1 ", .) %>%
  gsub("R2", "Round 2 ", .) %>%
  gsub("R3", "Round 3 ", .) %>%
  gsub("R4", "Round 4 ", .) %>%
  gsub("R5", "Round 5 ", .) %>%
  gsub("Tot", "Total ", .) %>%
  gsub("Rev", "Reversal", .) %>%
  gsub("SA", "Strikes Absorbed ", .) %>%
  gsub("Sub", "Submission ", .) %>%
  gsub("Ref", "Referee ", .) %>%
  gsub("SL", "Strikes Landed ", .)
# Insert data sources
self_derived <- grep(".*Car.*", data_table$Variables)
fightmatrix <- grep("F1Rank|F2Rank", data_table$Variables)
data_table$Source[self_derived] <- "Derived"
data_table$Source[fightmatrix] <- "FightMatrix"
kable(data_table, caption = "Composition of the primary dataset, [ufc_fight_details_df]", align = 'l')
Variables Unabreviated Example Source
Date Date 2009-03-07 FightMetric
Detail Detail Kick to Head At Distance FightMetric
F1Age Fighter1 Age 32 FightMetric
F1BodyStr Fighter1 BodyStrike 0 FightMetric
F1BodyStrAtt Fighter1 BodyStrike Attempt 0 FightMetric
F1CarFightTime Fighter1 Career FightTime 3790 Derived
F1CarOppSigStrAtt Fighter1 Career Opponent Significant Strike Attempt 540 Derived
F1CarOppTDAtt Fighter1 Career Opponent Take Down Attempt 5 Derived
F1CarSigStr Fighter1 Career Significant Strike 216 Derived
F1CarSigStrAbs Fighter1 Career Significant Strike Absorbed 215 Derived
F1CarSigStrAtt Fighter1 Career Significant Strike Attempt 505 Derived
F1CarStrAbspM Fighter1 Career Strike Absorbed per Minute 3.4 Derived
F1CarStrAcc Fighter1 Career Strike Accuracy 0.43 Derived
F1CarStrDef Fighter1 Career Strike Defense 0.4 Derived
F1CarStrLpM Fighter1 Career Strike Lper Minute 3.42 Derived
F1CarTD Fighter1 Career Take Down 17 Derived
F1CarTDAbs Fighter1 Career Take Down Absorbed 0 Derived
F1CarTDAcc Fighter1 Career Take Down Accuracy 0.52 Derived
F1CarTDAtt Fighter1 Career Take Down Attempt 33 Derived
F1CarTDAvg Fighter1 Career Take Down Average 2.43 Derived
F1CarTDDef Fighter1 Career Take Down Defense 0 Derived
F1ClinchStr Fighter1 ClinchStrike 2 FightMetric
F1ClinchStrAtt Fighter1 ClinchStrike Attempt 4 FightMetric
F1DistStr Fighter1 Distance Strike 20 FightMetric
F1DistStrAtt Fighter1 Distance Strike Attempt 42 FightMetric
F1DOB Fighter1 DOB 1976-10-05 FightMetric
F1GroundStr Fighter1 GroundStrike 1 FightMetric
F1GroundStrAtt Fighter1 GroundStrike Attempt 1 FightMetric
F1HeadStr Fighter1 HeadStrike 18 FightMetric
F1HeadStrAtt Fighter1 HeadStrike Attempt 42 FightMetric
F1Height Fighter1 Height 73 FightMetric
F1KD Fighter1 Knock Down 1 FightMetric
F1LegStr Fighter1 LegStrike 5 FightMetric
F1LegStrAtt Fighter1 LegStrike Attempt 5 FightMetric
F1Pass Fighter1 Guard Pass 0 FightMetric
F1ProLosses Fighter1 ProLosses 5 FightMetric
F1ProNC Fighter1 ProNo Contest 0 FightMetric
F1ProWins Fighter1 ProWins 12 FightMetric
F1R1BodyStr Fighter1 Round 1 BodyStrike 0 FightMetric
F1R1BodyStrAtt Fighter1 Round 1 BodyStrike Attempt 0 FightMetric
F1R1ClinchStr Fighter1 Round 1 ClinchStrike 2 FightMetric
F1R1ClinchStrAtt Fighter1 Round 1 ClinchStrike Attempt 4 FightMetric
F1R1DistStr Fighter1 Round 1 Distance Strike 20 FightMetric
F1R1DistStrAtt Fighter1 Round 1 Distance Strike Attempt 42 FightMetric
F1R1GroundStr Fighter1 Round 1 GroundStrike 1 FightMetric
F1R1GroundStrAtt Fighter1 Round 1 GroundStrike Attempt 1 FightMetric
F1R1HeadStr Fighter1 Round 1 HeadStrike 18 FightMetric
F1R1HeadStrAtt Fighter1 Round 1 HeadStrike Attempt 42 FightMetric
F1R1KD Fighter1 Round 1 Knock Down 1 FightMetric
F1R1LegStr Fighter1 Round 1 LegStrike 5 FightMetric
F1R1LegStrAtt Fighter1 Round 1 LegStrike Attempt 5 FightMetric
F1R1Pass Fighter1 Round 1 Guard Pass 0 FightMetric
F1R1Rev Fighter1 Round 1 Reversal 0 FightMetric
F1R1SigStr Fighter1 Round 1 Significant Strike 23 FightMetric
F1R1SigStrAtt Fighter1 Round 1 Significant Strike Attempt 47 FightMetric
F1R1SigStrPercent Fighter1 Round 1 Significant Strike Percent 0.48 FightMetric
F1R1SubAtt Fighter1 Round 1 Submission Attempt 0 FightMetric
F1R1TD Fighter1 Round 1 Take Down 0 FightMetric
F1R1TDAtt Fighter1 Round 1 Take Down Attempt 0 FightMetric
F1R1TDPercent Fighter1 Round 1 Take Down Percent 0 FightMetric
F1R1TotStr Fighter1 Round 1 Total Strike 40 FightMetric
F1R1TotStrAtt Fighter1 Round 1 Total Strike Attempt 64 FightMetric
F1R2BodyStr Fighter1 Round 2 BodyStrike NA FightMetric
F1R2BodyStrAtt Fighter1 Round 2 BodyStrike Attempt NA FightMetric
F1R2ClinchStr Fighter1 Round 2 ClinchStrike NA FightMetric
F1R2ClinchStrAtt Fighter1 Round 2 ClinchStrike Attempt NA FightMetric
F1R2DistStr Fighter1 Round 2 Distance Strike NA FightMetric
F1R2DistStrAtt Fighter1 Round 2 Distance Strike Attempt NA FightMetric
F1R2GroundStr Fighter1 Round 2 GroundStrike NA FightMetric
F1R2GroundStrAtt Fighter1 Round 2 GroundStrike Attempt NA FightMetric
F1R2HeadStr Fighter1 Round 2 HeadStrike NA FightMetric
F1R2HeadStrAtt Fighter1 Round 2 HeadStrike Attempt NA FightMetric
F1R2KD Fighter1 Round 2 Knock Down NA FightMetric
F1R2LegStr Fighter1 Round 2 LegStrike NA FightMetric
F1R2LegStrAtt Fighter1 Round 2 LegStrike Attempt NA FightMetric
F1R2Pass Fighter1 Round 2 Guard Pass NA FightMetric
F1R2Rev Fighter1 Round 2 Reversal NA FightMetric
F1R2SigStr Fighter1 Round 2 Significant Strike NA FightMetric
F1R2SigStrAtt Fighter1 Round 2 Significant Strike Attempt NA FightMetric
F1R2SigStrPercent Fighter1 Round 2 Significant Strike Percent NA FightMetric
F1R2SubAtt Fighter1 Round 2 Submission Attempt NA FightMetric
F1R2TD Fighter1 Round 2 Take Down NA FightMetric
F1R2TDAtt Fighter1 Round 2 Take Down Attempt NA FightMetric
F1R2TDPercent Fighter1 Round 2 Take Down Percent NA FightMetric
F1R2TotStr Fighter1 Round 2 Total Strike NA FightMetric
F1R2TotStrAtt Fighter1 Round 2 Total Strike Attempt NA FightMetric
F1R3BodyStr Fighter1 Round 3 BodyStrike NA FightMetric
F1R3BodyStrAtt Fighter1 Round 3 BodyStrike Attempt NA FightMetric
F1R3ClinchStr Fighter1 Round 3 ClinchStrike NA FightMetric
F1R3ClinchStrAtt Fighter1 Round 3 ClinchStrike Attempt NA FightMetric
F1R3DistStr Fighter1 Round 3 Distance Strike NA FightMetric
F1R3DistStrAtt Fighter1 Round 3 Distance Strike Attempt NA FightMetric
F1R3GroundStr Fighter1 Round 3 GroundStrike NA FightMetric
F1R3GroundStrAtt Fighter1 Round 3 GroundStrike Attempt NA FightMetric
F1R3HeadStr Fighter1 Round 3 HeadStrike NA FightMetric
F1R3HeadStrAtt Fighter1 Round 3 HeadStrike Attempt NA FightMetric
F1R3KD Fighter1 Round 3 Knock Down NA FightMetric
F1R3LegStr Fighter1 Round 3 LegStrike NA FightMetric
F1R3LegStrAtt Fighter1 Round 3 LegStrike Attempt NA FightMetric
F1R3Pass Fighter1 Round 3 Guard Pass NA FightMetric
F1R3Rev Fighter1 Round 3 Reversal NA FightMetric
F1R3SigStr Fighter1 Round 3 Significant Strike NA FightMetric
F1R3SigStrAtt Fighter1 Round 3 Significant Strike Attempt NA FightMetric
F1R3SigStrPercent Fighter1 Round 3 Significant Strike Percent NA FightMetric
F1R3SubAtt Fighter1 Round 3 Submission Attempt NA FightMetric
F1R3TD Fighter1 Round 3 Take Down NA FightMetric
F1R3TDAtt Fighter1 Round 3 Take Down Attempt NA FightMetric
F1R3TDPercent Fighter1 Round 3 Take Down Percent NA FightMetric
F1R3TotStr Fighter1 Round 3 Total Strike NA FightMetric
F1R3TotStrAtt Fighter1 Round 3 Total Strike Attempt NA FightMetric
F1R4BodyStr Fighter1 Round 4 BodyStrike NA FightMetric
F1R4BodyStrAtt Fighter1 Round 4 BodyStrike Attempt NA FightMetric
F1R4ClinchStr Fighter1 Round 4 ClinchStrike NA FightMetric
F1R4ClinchStrAtt Fighter1 Round 4 ClinchStrike Attempt NA FightMetric
F1R4DistStr Fighter1 Round 4 Distance Strike NA FightMetric
F1R4DistStrAtt Fighter1 Round 4 Distance Strike Attempt NA FightMetric
F1R4GroundStr Fighter1 Round 4 GroundStrike NA FightMetric
F1R4GroundStrAtt Fighter1 Round 4 GroundStrike Attempt NA FightMetric
F1R4HeadStr Fighter1 Round 4 HeadStrike NA FightMetric
F1R4HeadStrAtt Fighter1 Round 4 HeadStrike Attempt NA FightMetric
F1R4KD Fighter1 Round 4 Knock Down NA FightMetric
F1R4LegStr Fighter1 Round 4 LegStrike NA FightMetric
F1R4LegStrAtt Fighter1 Round 4 LegStrike Attempt NA FightMetric
F1R4Pass Fighter1 Round 4 Guard Pass NA FightMetric
F1R4Rev Fighter1 Round 4 Reversal NA FightMetric
F1R4SigStr Fighter1 Round 4 Significant Strike NA FightMetric
F1R4SigStrAtt Fighter1 Round 4 Significant Strike Attempt NA FightMetric
F1R4SigStrPercent Fighter1 Round 4 Significant Strike Percent NA FightMetric
F1R4SubAtt Fighter1 Round 4 Submission Attempt NA FightMetric
F1R4TD Fighter1 Round 4 Take Down NA FightMetric
F1R4TDAtt Fighter1 Round 4 Take Down Attempt NA FightMetric
F1R4TDPercent Fighter1 Round 4 Take Down Percent NA FightMetric
F1R4TotStr Fighter1 Round 4 Total Strike NA FightMetric
F1R4TotStrAtt Fighter1 Round 4 Total Strike Attempt NA FightMetric
F1R5BodyStr Fighter1 Round 5 BodyStrike NA FightMetric
F1R5BodyStrAtt Fighter1 Round 5 BodyStrike Attempt NA FightMetric
F1R5ClinchStr Fighter1 Round 5 ClinchStrike NA FightMetric
F1R5ClinchStrAtt Fighter1 Round 5 ClinchStrike Attempt NA FightMetric
F1R5DistStr Fighter1 Round 5 Distance Strike NA FightMetric
F1R5DistStrAtt Fighter1 Round 5 Distance Strike Attempt NA FightMetric
F1R5GroundStr Fighter1 Round 5 GroundStrike NA FightMetric
F1R5GroundStrAtt Fighter1 Round 5 GroundStrike Attempt NA FightMetric
F1R5HeadStr Fighter1 Round 5 HeadStrike NA FightMetric
F1R5HeadStrAtt Fighter1 Round 5 HeadStrike Attempt NA FightMetric
F1R5KD Fighter1 Round 5 Knock Down NA FightMetric
F1R5LegStr Fighter1 Round 5 LegStrike NA FightMetric
F1R5LegStrAtt Fighter1 Round 5 LegStrike Attempt NA FightMetric
F1R5Pass Fighter1 Round 5 Guard Pass NA FightMetric
F1R5Rev Fighter1 Round 5 Reversal NA FightMetric
F1R5SigStr Fighter1 Round 5 Significant Strike NA FightMetric
F1R5SigStrAtt Fighter1 Round 5 Significant Strike Attempt NA FightMetric
F1R5SigStrPercent Fighter1 Round 5 Significant Strike Percent NA FightMetric
F1R5SubAtt Fighter1 Round 5 Submission Attempt NA FightMetric
F1R5TD Fighter1 Round 5 Take Down NA FightMetric
F1R5TDAtt Fighter1 Round 5 Take Down Attempt NA FightMetric
F1R5TDPercent Fighter1 Round 5 Take Down Percent NA FightMetric
F1R5TotStr Fighter1 Round 5 Total Strike NA FightMetric
F1R5TotStrAtt Fighter1 Round 5 Total Strike Attempt NA FightMetric
F1Rank Fighter1 Rank 20 FightMatrix
F1Reach Fighter1 Reach 76 FightMetric
F1Rev Fighter1 Reversal 0 FightMetric
F1SApM Fighter1 Strikes Absorbed per Minute 3.8 FightMetric
F1SigStr Fighter1 Significant Strike 23 FightMetric
F1SigStrAtt Fighter1 Significant Strike Attempt 47 FightMetric
F1SigStrPercent Fighter1 Significant Strike Percent 0.48 FightMetric
F1SLpM Fighter1 Strikes Landed per Minute 3.69 FightMetric
F1Stance Fighter1 Stance Orthodox FightMetric
F1StrAcc Fighter1 Strike Accuracy 0.42 FightMetric
F1StrDef Fighter1 Strike Defense 0.61 FightMetric
F1SubAtt Fighter1 Submission Attempt 0 FightMetric
F1SubAttAvg Fighter1 Submission Attempt Average 0 FightMetric
F1TD Fighter1 Take Down 0 FightMetric
F1TDAcc Fighter1 Take Down Accuracy 0.34 FightMetric
F1TDAtt Fighter1 Take Down Attempt 0 FightMetric
F1TDAvg Fighter1 Take Down Average 3 FightMetric
F1TDDef Fighter1 Take Down Defense 0.91 FightMetric
F1TDPercent Fighter1 Take Down Percent 0 FightMetric
F1TotStr Fighter1 Total Strike 40 FightMetric
F1TotStrAtt Fighter1 Total Strike Attempt 64 FightMetric
F1UFCLosses Fighter1 UFCLosses 2 FightMetric
F1UFCNC Fighter1 UFCNo Contest 0 FightMetric
F1UFCWins Fighter1 UFCWins 5 FightMetric
F1Weight Fighter1 Weight 205 FightMetric
F1Wins Fighter1 Wins 1 FightMetric
F2Age Fighter2 Age 31 FightMetric
F2BodyStr Fighter2 BodyStrike 3 FightMetric
F2BodyStrAtt Fighter2 BodyStrike Attempt 5 FightMetric
F2CarFightTime Fighter2 Career FightTime 401 Derived
F2CarOppSigStrAtt Fighter2 Career Opponent Significant Strike Attempt 20 Derived
F2CarOppTDAtt Fighter2 Career Opponent Take Down Attempt 1 Derived
F2CarSigStr Fighter2 Career Significant Strike 34 Derived
F2CarSigStrAbs Fighter2 Career Significant Strike Absorbed 3 Derived
F2CarSigStrAtt Fighter2 Career Significant Strike Attempt 55 Derived
F2CarStrAbspM Fighter2 Career Strike Absorbed per Minute 0.45 Derived
F2CarStrAcc Fighter2 Career Strike Accuracy 0.62 Derived
F2CarStrDef Fighter2 Career Strike Defense 0.15 Derived
F2CarStrLpM Fighter2 Career Strike Lper Minute 5.09 Derived
F2CarTD Fighter2 Career Take Down 1 Derived
F2CarTDAbs Fighter2 Career Take Down Absorbed 0 Derived
F2CarTDAcc Fighter2 Career Take Down Accuracy 0.2 Derived
F2CarTDAtt Fighter2 Career Take Down Attempt 5 Derived
F2CarTDAvg Fighter2 Career Take Down Average 0.5 Derived
F2CarTDDef Fighter2 Career Take Down Defense 0 Derived
F2ClinchStr Fighter2 ClinchStrike 4 FightMetric
F2ClinchStrAtt Fighter2 ClinchStrike Attempt 6 FightMetric
F2DistStr Fighter2 Distance Strike 10 FightMetric
F2DistStrAtt Fighter2 Distance Strike Attempt 31 FightMetric
F2DOB Fighter2 DOB 1978-02-09 FightMetric
F2GroundStr Fighter2 GroundStrike 0 FightMetric
F2GroundStrAtt Fighter2 GroundStrike Attempt 0 FightMetric
F2HeadStr Fighter2 HeadStrike 11 FightMetric
F2HeadStrAtt Fighter2 HeadStrike Attempt 32 FightMetric
F2Height Fighter2 Height 72 FightMetric
F2KD Fighter2 Knock Down 0 FightMetric
F2LegStr Fighter2 LegStrike 0 FightMetric
F2LegStrAtt Fighter2 LegStrike Attempt 0 FightMetric
F2Pass Fighter2 Guard Pass 0 FightMetric
F2ProLosses Fighter2 ProLosses 6 FightMetric
F2ProNC Fighter2 ProNo Contest 0 FightMetric
F2ProWins Fighter2 ProWins 14 FightMetric
F2R1BodyStr Fighter2 Round 1 BodyStrike 3 FightMetric
F2R1BodyStrAtt Fighter2 Round 1 BodyStrike Attempt 5 FightMetric
F2R1ClinchStr Fighter2 Round 1 ClinchStrike 4 FightMetric
F2R1ClinchStrAtt Fighter2 Round 1 ClinchStrike Attempt 6 FightMetric
F2R1DistStr Fighter2 Round 1 Distance Strike 10 FightMetric
F2R1DistStrAtt Fighter2 Round 1 Distance Strike Attempt 31 FightMetric
F2R1GroundStr Fighter2 Round 1 GroundStrike 0 FightMetric
F2R1GroundStrAtt Fighter2 Round 1 GroundStrike Attempt 0 FightMetric
F2R1HeadStr Fighter2 Round 1 HeadStrike 11 FightMetric
F2R1HeadStrAtt Fighter2 Round 1 HeadStrike Attempt 32 FightMetric
F2R1KD Fighter2 Round 1 Knock Down 0 FightMetric
F2R1LegStr Fighter2 Round 1 LegStrike 0 FightMetric
F2R1LegStrAtt Fighter2 Round 1 LegStrike Attempt 0 FightMetric
F2R1Pass Fighter2 Round 1 Guard Pass 0 FightMetric
F2R1Rev Fighter2 Round 1 Reversal 0 FightMetric
F2R1SigStr Fighter2 Round 1 Significant Strike 14 FightMetric
F2R1SigStrAtt Fighter2 Round 1 Significant Strike Attempt 37 FightMetric
F2R1SigStrPercent Fighter2 Round 1 Significant Strike Percent 0.37 FightMetric
F2R1SubAtt Fighter2 Round 1 Submission Attempt 0 FightMetric
F2R1TD Fighter2 Round 1 Take Down 0 FightMetric
F2R1TDAtt Fighter2 Round 1 Take Down Attempt 4 FightMetric
F2R1TDPercent Fighter2 Round 1 Take Down Percent 0 FightMetric
F2R1TotStr Fighter2 Round 1 Total Strike 14 FightMetric
F2R1TotStrAtt Fighter2 Round 1 Total Strike Attempt 37 FightMetric
F2R2BodyStr Fighter2 Round 2 BodyStrike NA FightMetric
F2R2BodyStrAtt Fighter2 Round 2 BodyStrike Attempt NA FightMetric
F2R2ClinchStr Fighter2 Round 2 ClinchStrike NA FightMetric
F2R2ClinchStrAtt Fighter2 Round 2 ClinchStrike Attempt NA FightMetric
F2R2DistStr Fighter2 Round 2 Distance Strike NA FightMetric
F2R2DistStrAtt Fighter2 Round 2 Distance Strike Attempt NA FightMetric
F2R2GroundStr Fighter2 Round 2 GroundStrike NA FightMetric
F2R2GroundStrAtt Fighter2 Round 2 GroundStrike Attempt NA FightMetric
F2R2HeadStr Fighter2 Round 2 HeadStrike NA FightMetric
F2R2HeadStrAtt Fighter2 Round 2 HeadStrike Attempt NA FightMetric
F2R2KD Fighter2 Round 2 Knock Down NA FightMetric
F2R2LegStr Fighter2 Round 2 LegStrike NA FightMetric
F2R2LegStrAtt Fighter2 Round 2 LegStrike Attempt NA FightMetric
F2R2Pass Fighter2 Round 2 Guard Pass NA FightMetric
F2R2Rev Fighter2 Round 2 Reversal NA FightMetric
F2R2SigStr Fighter2 Round 2 Significant Strike NA FightMetric
F2R2SigStrAtt Fighter2 Round 2 Significant Strike Attempt NA FightMetric
F2R2SigStrPercent Fighter2 Round 2 Significant Strike Percent NA FightMetric
F2R2SubAtt Fighter2 Round 2 Submission Attempt NA FightMetric
F2R2TD Fighter2 Round 2 Take Down NA FightMetric
F2R2TDAtt Fighter2 Round 2 Take Down Attempt NA FightMetric
F2R2TDPercent Fighter2 Round 2 Take Down Percent NA FightMetric
F2R2TotStr Fighter2 Round 2 Total Strike NA FightMetric
F2R2TotStrAtt Fighter2 Round 2 Total Strike Attempt NA FightMetric
F2R3BodyStr Fighter2 Round 3 BodyStrike NA FightMetric
F2R3BodyStrAtt Fighter2 Round 3 BodyStrike Attempt NA FightMetric
F2R3ClinchStr Fighter2 Round 3 ClinchStrike NA FightMetric
F2R3ClinchStrAtt Fighter2 Round 3 ClinchStrike Attempt NA FightMetric
F2R3DistStr Fighter2 Round 3 Distance Strike NA FightMetric
F2R3DistStrAtt Fighter2 Round 3 Distance Strike Attempt NA FightMetric
F2R3GroundStr Fighter2 Round 3 GroundStrike NA FightMetric
F2R3GroundStrAtt Fighter2 Round 3 GroundStrike Attempt NA FightMetric
F2R3HeadStr Fighter2 Round 3 HeadStrike NA FightMetric
F2R3HeadStrAtt Fighter2 Round 3 HeadStrike Attempt NA FightMetric
F2R3KD Fighter2 Round 3 Knock Down NA FightMetric
F2R3LegStr Fighter2 Round 3 LegStrike NA FightMetric
F2R3LegStrAtt Fighter2 Round 3 LegStrike Attempt NA FightMetric
F2R3Pass Fighter2 Round 3 Guard Pass NA FightMetric
F2R3Rev Fighter2 Round 3 Reversal NA FightMetric
F2R3SigStr Fighter2 Round 3 Significant Strike NA FightMetric
F2R3SigStrAtt Fighter2 Round 3 Significant Strike Attempt NA FightMetric
F2R3SigStrPercent Fighter2 Round 3 Significant Strike Percent NA FightMetric
F2R3SubAtt Fighter2 Round 3 Submission Attempt NA FightMetric
F2R3TD Fighter2 Round 3 Take Down NA FightMetric
F2R3TDAtt Fighter2 Round 3 Take Down Attempt NA FightMetric
F2R3TDPercent Fighter2 Round 3 Take Down Percent NA FightMetric
F2R3TotStr Fighter2 Round 3 Total Strike NA FightMetric
F2R3TotStrAtt Fighter2 Round 3 Total Strike Attempt NA FightMetric
F2R4BodyStr Fighter2 Round 4 BodyStrike NA FightMetric
F2R4BodyStrAtt Fighter2 Round 4 BodyStrike Attempt NA FightMetric
F2R4ClinchStr Fighter2 Round 4 ClinchStrike NA FightMetric
F2R4ClinchStrAtt Fighter2 Round 4 ClinchStrike Attempt NA FightMetric
F2R4DistStr Fighter2 Round 4 Distance Strike NA FightMetric
F2R4DistStrAtt Fighter2 Round 4 Distance Strike Attempt NA FightMetric
F2R4GroundStr Fighter2 Round 4 GroundStrike NA FightMetric
F2R4GroundStrAtt Fighter2 Round 4 GroundStrike Attempt NA FightMetric
F2R4HeadStr Fighter2 Round 4 HeadStrike NA FightMetric
F2R4HeadStrAtt Fighter2 Round 4 HeadStrike Attempt NA FightMetric
F2R4KD Fighter2 Round 4 Knock Down NA FightMetric
F2R4LegStr Fighter2 Round 4 LegStrike NA FightMetric
F2R4LegStrAtt Fighter2 Round 4 LegStrike Attempt NA FightMetric
F2R4Pass Fighter2 Round 4 Guard Pass NA FightMetric
F2R4Rev Fighter2 Round 4 Reversal NA FightMetric
F2R4SigStr Fighter2 Round 4 Significant Strike NA FightMetric
F2R4SigStrAtt Fighter2 Round 4 Significant Strike Attempt NA FightMetric
F2R4SigStrPercent Fighter2 Round 4 Significant Strike Percent NA FightMetric
F2R4SubAtt Fighter2 Round 4 Submission Attempt NA FightMetric
F2R4TD Fighter2 Round 4 Take Down NA FightMetric
F2R4TDAtt Fighter2 Round 4 Take Down Attempt NA FightMetric
F2R4TDPercent Fighter2 Round 4 Take Down Percent NA FightMetric
F2R4TotStr Fighter2 Round 4 Total Strike NA FightMetric
F2R4TotStrAtt Fighter2 Round 4 Total Strike Attempt NA FightMetric
F2R5BodyStr Fighter2 Round 5 BodyStrike NA FightMetric
F2R5BodyStrAtt Fighter2 Round 5 BodyStrike Attempt NA FightMetric
F2R5ClinchStr Fighter2 Round 5 ClinchStrike NA FightMetric
F2R5ClinchStrAtt Fighter2 Round 5 ClinchStrike Attempt NA FightMetric
F2R5DistStr Fighter2 Round 5 Distance Strike NA FightMetric
F2R5DistStrAtt Fighter2 Round 5 Distance Strike Attempt NA FightMetric
F2R5GroundStr Fighter2 Round 5 GroundStrike NA FightMetric
F2R5GroundStrAtt Fighter2 Round 5 GroundStrike Attempt NA FightMetric
F2R5HeadStr Fighter2 Round 5 HeadStrike NA FightMetric
F2R5HeadStrAtt Fighter2 Round 5 HeadStrike Attempt NA FightMetric
F2R5KD Fighter2 Round 5 Knock Down NA FightMetric
F2R5LegStr Fighter2 Round 5 LegStrike NA FightMetric
F2R5LegStrAtt Fighter2 Round 5 LegStrike Attempt NA FightMetric
F2R5Pass Fighter2 Round 5 Guard Pass NA FightMetric
F2R5Rev Fighter2 Round 5 Reversal NA FightMetric
F2R5SigStr Fighter2 Round 5 Significant Strike NA FightMetric
F2R5SigStrAtt Fighter2 Round 5 Significant Strike Attempt NA FightMetric
F2R5SigStrPercent Fighter2 Round 5 Significant Strike Percent NA FightMetric
F2R5SubAtt Fighter2 Round 5 Submission Attempt NA FightMetric
F2R5TD Fighter2 Round 5 Take Down NA FightMetric
F2R5TDAtt Fighter2 Round 5 Take Down Attempt NA FightMetric
F2R5TDPercent Fighter2 Round 5 Take Down Percent NA FightMetric
F2R5TotStr Fighter2 Round 5 Total Strike NA FightMetric
F2R5TotStrAtt Fighter2 Round 5 Total Strike Attempt NA FightMetric
F2Rank Fighter2 Rank 51 FightMatrix
F2Reach Fighter2 Reach 72 FightMetric
F2Rev Fighter2 Reversal 0 FightMetric
F2SApM Fighter2 Strikes Absorbed per Minute 2.26 FightMetric
F2SigStr Fighter2 Significant Strike 14 FightMetric
F2SigStrAtt Fighter2 Significant Strike Attempt 37 FightMetric
F2SigStrPercent Fighter2 Significant Strike Percent 0.37 FightMetric
F2SLpM Fighter2 Strikes Landed per Minute 3.12 FightMetric
F2Stance Fighter2 Stance Orthodox FightMetric
F2StrAcc Fighter2 Strike Accuracy 0.51 FightMetric
F2StrDef Fighter2 Strike Defense 0.57 FightMetric
F2SubAtt Fighter2 Submission Attempt 0 FightMetric
F2SubAttAvg Fighter2 Submission Attempt Average 0.6 FightMetric
F2TD Fighter2 Take Down 0 FightMetric
F2TDAcc Fighter2 Take Down Accuracy 0.29 FightMetric
F2TDAtt Fighter2 Take Down Attempt 4 FightMetric
F2TDAvg Fighter2 Take Down Average 3.17 FightMetric
F2TDDef Fighter2 Take Down Defense 0.57 FightMetric
F2TDPercent Fighter2 Take Down Percent 0 FightMetric
F2TotStr Fighter2 Total Strike 14 FightMetric
F2TotStrAtt Fighter2 Total Strike Attempt 37 FightMetric
F2UFCLosses Fighter2 UFCLosses 0 FightMetric
F2UFCNC Fighter2 UFCNo Contest 0 FightMetric
F2UFCWins Fighter2 UFCWins 2 FightMetric
F2Weight Fighter2 Weight 185 FightMetric
Fighter1 Fighter1 Matt Hamill FightMetric
Fighter2 Fighter2 Mark Munoz FightMetric
FightNumber FightNumber 3449 FightMetric
Gender Gender Men FightMetric
Judge1 Judge1 NA FightMetric
Judge1Score Judge1Score NA FightMetric
Judge2 Judge2 NA FightMetric
Judge2Score Judge2Score NA FightMetric
Judge3 Judge3 NA FightMetric
Judge3Score Judge3Score NA FightMetric
Method Method KO/TKO FightMetric
Ref Referee Dan Miragliotta FightMetric
Result Result Matt Hamill FightMetric
Round Round 1 FightMetric
Time Time 3:53 FightMetric
TotFightTime Total FightTime 233 FightMetric
WeightClass WeightClass Light Heavyweight FightMetric
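
The chained gsub() calls are order-sensitive, which is why F1CarStrLpM appears in the table above as "Strike Lper Minute": "Str" is expanded before the rest of the name can be recognised. One possible alternative, sketched here with a deliberately partial dictionary, is to scan each name left to right and always take the longest matching abbreviation. `abbrev` and `expand_name` are hypothetical names, and the dictionary would need to be extended to cover every variable.

```r
# Partial, illustrative dictionary of abbreviations.
abbrev <- c(F1 = "Fighter1", F2 = "Fighter2", Car = "Career",
            StrLpM = "Strikes Landed per Minute",
            Sig = "Significant", Str = "Strike", Att = "Attempt",
            TD = "Take Down", Acc = "Accuracy", Def = "Defense",
            Abs = "Absorbed", Opp = "Opponent", Avg = "Average",
            pM = "per Minute")

expand_name <- function(x, dict = abbrev) {
  # Try longer abbreviations first so "StrLpM" wins over "Str".
  keys <- names(dict)[order(nchar(names(dict)), decreasing = TRUE)]
  parts <- character(0)
  literal <- ""                       # characters not covered by the dictionary
  while (nchar(x) > 0) {
    hit <- keys[startsWith(x, keys)]
    if (length(hit) > 0) {
      if (nchar(literal) > 0) { parts <- c(parts, literal); literal <- "" }
      parts <- c(parts, dict[[hit[1]]])
      x <- substring(x, nchar(hit[1]) + 1)
    } else {
      literal <- paste0(literal, substr(x, 1, 1))
      x <- substring(x, 2)
    }
  }
  if (nchar(literal) > 0) parts <- c(parts, literal)
  paste(parts, collapse = " ")
}

expand_name("F1CarStrLpM")
expand_name("F2CarTDAcc")
```

Text that matches no abbreviation (for example "FightTime") is passed through unchanged, as in the original table.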
# Creating table to explain the data.
data_table_fighters <- data.frame(colnames(fightMetric_fighters_df),
                         rep(NA,length(fightMetric_fighters_df)),
                         as.character(fightMetric_fighters_df[fightMetric_fighters_df$Name=="Gegard Mousasi",]),
                         rep("FightMetric",length(fightMetric_fighters_df)),
                         stringsAsFactors = FALSE)
# Assign column names.
colnames(data_table_fighters) <- c("Variables", "Unabreviated", "Example", "Source")
# Generate unabbreviated variable names.
data_table_fighters$Unabreviated <- data_table_fighters$Variables %>%
  gsub("Str", "Strike ", .) %>%
  gsub("SL", "Strikes Landed ", .) %>%
  gsub("pM", "per Minute ", .) %>%
  gsub("Acc", "Accuracy ", .) %>%
  gsub("Def", "Defense ", .) %>%
  gsub("Avg", "Average ", .) %>%
  gsub("SA", "Strikes Absorbed ", .) %>%
  gsub("Sub", "Submission ", .) %>%
  gsub("Att", "Attempt ", .) %>%
  gsub("TD", "Take Down ", .)
kable(data_table_fighters, caption = "Composition of the fighter dataset, [fightMetric_fighters_df]", align = 'l')
Variables Unabreviated Example Source
Name Name Gegard Mousasi FightMetric
ProWins ProWins 39 FightMetric
ProLosses ProLosses 6 FightMetric
ProNC ProNC 2 FightMetric
Height Height 74 FightMetric
Weight Weight 185 FightMetric
Reach Reach 76 FightMetric
Stance Stance Orthodox FightMetric
DOB DOB 1985-08-01 FightMetric
SLpM Strikes Landed per Minute 3.56 FightMetric
StrAcc Strike Accuracy 0.5 FightMetric
SApM Strikes Absorbed per Minute 1.18 FightMetric
StrDef Strike Defense 0.69 FightMetric
TDAvg Take Down Average 1.6 FightMetric
TDAcc Take Down Accuracy 0.65 FightMetric
TDDef Take Down Defense 0.61 FightMetric
SubAttAvg Submission Attempt Average 1.2 FightMetric
Age Age 31 FightMetric
---
title: "Predicting UFC Outcomes"
output:
  html_notebook:
    output_dir: /Users/lw2134/Desktop/DataSci/UFC
    output_file: index.html
    smart: yes
    theme: united
    toc: yes
    toc_float: yes
  html_document:
    toc: yes
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = FALSE, tidy = TRUE, warning = FALSE)
library(jsonlite)
library(httr)
library(dplyr)
library(stringr)
library(tidyr)
library(scales)
library(knitr)
library(rvest)
library(caTools)
library(RColorBrewer)
library(rpart)
library(party)
library(randomForest)
library(mlogit)
library(caret)
library(lubridate)
library(rmarkdown)
library(survey)
library(pROC)
library(aod)
```

```{r, include=FALSE}
# Import datasets
# ufc_fight_details_df
ufc_fight_details_df <- read.csv(paste(getwd(), "/ufc_fight_details.csv", sep = ""),
                                 header = TRUE, stringsAsFactors = FALSE)

ufc_fight_details_df$X <- NULL

# fightMetric_fighters_df
fightMetric_fighters_df <- read.csv(paste(getwd(), "/fightMetric_fighters.csv", sep = ""),
                                 header = TRUE, stringsAsFactors = FALSE)

fightMetric_fighters_df$X <- NULL

# fighter_ranking_df
fighter_ranking_df <- read.csv(paste(getwd(), "/fighter_ranking.csv", sep = ""),
                                    header = TRUE, stringsAsFactors = FALSE)

fighter_ranking_df$X <- NULL
```


## Introduction
Ultimate Fighting Championship (UFC) is the world’s premier professional mixed martial arts
organization. Fighters engage in one-on-one matches consisting of 3-5 rounds of 5 minutes each.
Fighters are allowed to strike with punches, kicks, knees, elbows, or employ submission moves such as
chokes or joint locks. Fighting occurs standing up, on the ground, or in a clinch position. Much like
other professional sports, there are a variety of structured data gathered on individual fighters and on
each fight match up. This project gathers much of that data, learns from it, and then makes predictions
about future fight matchups. Specifically, this project seeks to answer the question: Given a specific
matchup, what is each fighter's probability of winning?

The practical application of this project (though there are potentially others) is to wager on fights
and make money. In fact, the model's predictions have been applied to real-world upcoming fights for 5 weeks now, totaling 40 fights, with 75% accuracy at a probability threshold greater than or equal to 50%. Accuracy increases with the probability threshold. The wagering track record over these 5 weeks at the 50% threshold has been ~120% growth. In other words, if you started with a pool of $86,000 (an amount that would let you put $10K on each fight) on September 1st, then by October 2nd you would have $175,000. The ROI is determined by the odds for the individual fights, so it is somewhat independent of the overall predictive accuracy. The return is based on $10K wagers on 40 fights over a 5-week period.
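
To make the link between odds and return concrete, here is a minimal sketch of how profit follows from American (moneyline) odds. The three bets, their odds, and the flat stake are made up for illustration; they are not taken from the project's betting record, and `moneyline_profit` is a hypothetical helper name.

```r
# Profit on a winning bet at American (moneyline) odds.
# Positive odds: profit on a 100-unit stake. Negative odds: stake needed to win 100 units.
moneyline_profit <- function(stake, moneyline) {
  ifelse(moneyline > 0,
         stake * moneyline / 100,
         stake * 100 / abs(moneyline))
}

# Hypothetical card: three equal-sized bets with made-up odds and outcomes.
bets <- data.frame(
  moneyline = c(-150, 135, -200),
  won       = c(TRUE, TRUE, FALSE)
)
stake <- 100

# A winning bet earns its moneyline profit; a losing bet forfeits the stake.
bets$net <- ifelse(bets$won, moneyline_profit(stake, bets$moneyline), -stake)

accuracy <- mean(bets$won)                       # share of bets won
roi <- sum(bets$net) / (stake * nrow(bets))      # net profit per unit wagered
```

Two cards with the same accuracy can produce very different `roi` values depending on whether the wins came on favorites or underdogs, which is why return and accuracy are only loosely coupled.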


***
## Wrangling
They say data wrangling makes up 80%-90% of the time spent on a data science project, and that held true for this one.  The dataset developed for this project is not only an aggregation of data scraped from multiple sources; it also includes newly generated data derived from the scraped data. Specifically, the derivative data consists of historical career statistics for each fighter at the time of each fight.

***

### Aggregating The Data
This project initially gathered data from four sources:  

- [UFC Unofficial API](http://ufc-data-api.ufc.com/api/v1/us/fighters): list of current UFC fighters
- [Wikipedia](https://en.wikipedia.org/wiki/List_of_UFC_events): list of UFC events and event information  
- [FightMetric](http://www.fightmetric.com/): detailed fight-by-fight statistics and fighter statistics for every UFC fight  
- [FightMatrix](http://www.fightmatrix.com/): proprietary current and historical fighter ranking system


However, in the end the only sources of data used in the final dataset were FightMetric and FightMatrix.  The scraping process generally followed the pattern below, using the `rvest` package.  



**1. Collect and store relevant URLs in a list.**
```{r, eval=FALSE, echo=TRUE}
# Collect individual fighter urls.
# Start with an empty vector so the loop has something to append to.
fightMetric_fighter_links <- c()

for (letter in letters) {
  fightMetric_fighter_list_url <- paste("http://www.fightmetric.com/statistics/fighters?char=",
                                        letter, "&page=all", sep = "")
  
  fightMetric_fighter_list_page <- read_html(fightMetric_fighter_list_url)
  
  # Get individual FightMetric fighter links for this letter of the alphabet from the Fighter List page.
  fighter_links_per_letter <- fightMetric_fighter_list_page %>%
    html_nodes(".b-statistics__table-col:nth-child(1) .b-link_style_black") %>%
    html_attr("href")
  
  fightMetric_fighter_links <- c(fightMetric_fighter_links, fighter_links_per_letter)
}
```



**2. Make copies of HTML from the URL list and store in a list.**
```{r, eval=FALSE, echo=TRUE}
#### Make copies of each FightMetric fighter page HTML to later scrape data we need. ####
fightMetric_fighter_html <- list()

for (i in seq_along(fightMetric_fighter_links)) {
  fightMetric_fighter_html[[i]] <- read_html(fightMetric_fighter_links[[i]])
}
```



**3. Parse the HTML and put what we want in a dataframe.**
```{r, eval=FALSE, echo=TRUE}
#### Populate REVISED FIGHTER DATA ####
for (i in 1:length(fightMetric_fighter_html)) {
  
  # Get fighter name
  name <- fightMetric_fighter_html[[i]] %>%
    html_nodes(".b-content__title-highlight") %>%
    html_text(trim = TRUE)
  
  fightMetric_fighters_df[i, 1] <- name[[1]]
  
  # Get fighter pro record
  record <- fightMetric_fighter_html[[i]] %>%
    html_nodes(".b-content__title-record") %>%
    html_text(trim = TRUE) %>%
    strsplit(" ")
  
  record <- record[[1]][2] %>%
    strsplit("-")
  
  fightMetric_fighters_df[i, 2] <- record[[1]][1]
  
  fightMetric_fighters_df[i, 3] <- record[[1]][2]
  
  fightMetric_fighters_df[i, 4] <- record[[1]][3]
  
  # Get fighter stats
  stats <- fightMetric_fighter_html[[i]] %>%
    html_nodes(".b-list__box-list-item_type_block") %>%
    html_text(trim = TRUE) %>%
    gsub("^.*:(\\n *)*", "", .)
  
  stats <- stats[-10]
  
  for (s in seq_along(stats)) {
    # Store empty strings as NA. Compare the scraped value, not the loop index.
    if (stats[s] == "") {
      fightMetric_fighters_df[i, (4+s)] <- NA
    } else {
      fightMetric_fighters_df[i, (4+s)] <- stats[s]
    }
  }
}
```



This process was adapted and repeated to pull data for each of the three primary dataframes:  

- `ufc_fight_details_df`: fight-by-fight statistics  
- `fightMetric_fighters_df`: individual fighter statistics  
- `fighter_ranking_df`: current and historical fighter rankings by weight class
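
The stats clean-up inside step 3 is easier to see on a small example. The sketch below applies the same regular expression to a few strings shaped like the text `html_text(trim = TRUE)` returns for a "label: value" item. The sample strings are assumptions about that shape, not copies of real page content.

```r
# Made-up examples of scraped "label: value" text; the last one has no value.
raw_stats <- c("Height:\n      6' 2\"",
               "Reach:\n      76\"",
               "Stance:")

# Drop everything up to the colon, plus any newline and padding after it.
values <- gsub("^.*:(\\n *)*", "", raw_stats)

# Treat empty values as missing, all at once instead of element by element.
values[values == ""] <- NA

values
```

Doing the empty-string check on the whole vector avoids the index-versus-value mix-up that is easy to make inside a loop.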

***
### Cleaning The Data
Much of the scraped data had to be cleaned to some extent.  At the very least, values were converted from character to numeric; in other cases, fixes such as removing '%' signs, converting a fighter's height into inches, and converting dates of birth into a machine-readable Date format were necessary. Here are a few examples:



**Removing '%' sign.**
```{r, eval=FALSE, echo=TRUE}
# Remove % sign in several of the columns and convert to numeric.
for (i in c(11,13,15,16)) {
  fightMetric_fighters_df[,i] <- fightMetric_fighters_df[,i] %>%
    gsub("%", "", .) %>%
    as.numeric()
  
  fightMetric_fighters_df[,i] <- fightMetric_fighters_df[,i]/100
}
```



**Converting height into number of inches**
```{r, eval=FALSE, echo=TRUE}
# Convert Height to numeric inches.
fightMetric_fighters_df$Height <- fightMetric_fighters_df$Height %>%
  gsub("'|\"", "", .) %>%
  strsplit(split = " ")

for (h in 1:length(fightMetric_fighters_df$Height)) {
  if (fightMetric_fighters_df[h, 5][[1]] == "--") {
    fightMetric_fighters_df[h, 5][[1]] <- NA
  } else {
    fightMetric_fighters_df[h, 5][[1]] <- (as.numeric(fightMetric_fighters_df[h, 5][[1]][1])*12) +
                                     as.numeric(fightMetric_fighters_df[h, 5][[1]][2])
  }
}

fightMetric_fighters_df$Height <- unlist(fightMetric_fighters_df$Height)
```
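
For comparison, the height conversion can also be written as a small vectorized helper. This is a sketch rather than the code the project uses: `height_to_inches` is a hypothetical name, and it assumes heights arrive as strings like `6' 2"` with `--` marking a missing value.

```r
# Convert heights such as "6' 2\"" to inches; anything that does not match
# the feet/inches pattern (for example "--") becomes NA.
height_to_inches <- function(height) {
  parts <- regmatches(height, regexec("^([0-9]+)' *([0-9]+)\"$", height))
  vapply(parts, function(p) {
    if (length(p) == 3) as.numeric(p[2]) * 12 + as.numeric(p[3]) else NA_real_
  }, numeric(1))
}

heights <- height_to_inches(c("6' 2\"", "5' 11\"", "--"))
heights
```

Because the function returns a plain numeric vector, the result can be assigned straight back to the `Height` column without the list column and `unlist()` step.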



There were also instances of UFC fighters with the same name, which was particularly troublesome when merging this fighter data with other dataframes on the fighter's name.  It gave the appearance of a dataset or a merge that was entirely accurate, but for a few inexplicable errors.  After further digging into the datasets and cross-checking with the original webpages, the issue was handled by identifying these fighters by their name plus a physical attribute, and then renaming one of them.  Here's an example.

```{r, eval=FALSE, echo=TRUE}
# Handle instances of same name in FIGHTMETRIC_FIGHTERS_DF
fightMetric_fighters_df[fightMetric_fighters_df$Name=="Dong Hyun Kim" & fightMetric_fighters_df$Reach==70, 1] <- "Dong Hyun Kim (Maestro)"
fightMetric_fighters_df[fightMetric_fighters_df$Name=="Michael McDonald" & fightMetric_fighters_df$Weight==205, 1] <- "Michael McDonald (The Black Sniper)"
fightMetric_fighters_df[fightMetric_fighters_df$Name=="Tony Johnson" & fightMetric_fighters_df$Weight==185, 1] <- "Tony Johnson (185lbs.)"

```
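
A quick way to surface shared names before any merge is to look for duplicates in the name column. The sketch below uses a toy fighter table; the rows and weights are illustrative only.

```r
# Toy fighter table; names and weights are for illustration only.
fighters <- data.frame(
  Name   = c("Dong Hyun Kim", "Dong Hyun Kim", "Tony Johnson", "Matt Hamill"),
  Weight = c(170, 155, 265, 205),
  stringsAsFactors = FALSE
)

# Names that appear more than once, and every row that carries one of them.
shared_names <- unique(fighters$Name[duplicated(fighters$Name)])
needs_renaming <- fighters[fighters$Name %in% shared_names, ]

# After renaming, this flag can guard the merge step against silent row duplication.
names_are_unique <- !anyDuplicated(fighters$Name)
```

Running a check like this before joining on `Name` turns the "few inexplicable errors" into an explicit list of rows to disambiguate.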


***
### Shaping The Data
After all the data had been collected and stored in dataframes, there was some reshaping required in order to end up with a single dataframe suitable to subset from and present to a logistic regression or random forest function for modeling.  At this point there were three dataframes - `fight_details`, `fightMetric_fighters`, and `fighter_ranking` - and the desired end state was to have a single `fight_details` that contained not only detailed fighter stats for each fight, but also each fighter's attributes, and each fighter's ranking. Furthermore, `fight_details` was currently in the form of each row representing one fighter's stats for one fight, which meant there were two rows for each fight - one for each fighter's stats. So, four things needed to happen...



**1. Reshape `fight_details` to contain both fighters' statistics for a single fight in a single row.**
```{r, eval=FALSE, echo=TRUE}
#### Reshape fight_details_df to accommodate logistic regression or random forest ####
colnames(ufc_fight_details_df)[2:3] <- c("Fighter1", "Fighter2")


# Grab even number rows from fight_details to later bind with odd number rows.
fighter2_df <- ufc_fight_details_df %>%
  slice(seq(2,nrow(ufc_fight_details_df),2))

# Rename columns in fighter2_df and original fight_details_df.
colnames(fighter2_df)[17:160] <- paste("F2", colnames(fighter2_df)[17:160], sep = "")
colnames(ufc_fight_details_df)[17:160] <- paste("F1", colnames(ufc_fight_details_df)[17:160], sep = "")

# Delete even number rows from fight_details.
ufc_fight_details_df <- ufc_fight_details_df[-seq(2,nrow(ufc_fight_details_df),2),]

# Combine columns from fight_details and fighter_2, so each row holds each fighter's data for
# a given fight.
ufc_fight_details_df <- bind_cols(ufc_fight_details_df, fighter2_df[17:160])

# Delete fighter_2 data frame.
rm(fighter2_df)

# Then, REMOVE rows with NA for either FIGHTER. A logical index is used because
# negative indexing with an empty index vector would drop every row.
na_rows <- is.na(ufc_fight_details_df[,2]) | is.na(ufc_fight_details_df[,3])

ufc_fight_details_df <- ufc_fight_details_df[!na_rows,]
###################################
```
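
The same odd-row/even-row idea is shown below on a four-row toy table, with a guard that the two halves really describe the same fights. The column names and values are invented for the example and are far fewer than the real dataset's.

```r
# Toy long-format data: two consecutive rows per fight, one per fighter.
long <- data.frame(
  FightNumber = c(1, 1, 2, 2),
  Fighter     = c("A", "B", "C", "D"),
  SigStr      = c(30, 12, 8, 25),
  TD          = c(2, 0, 1, 3),
  stringsAsFactors = FALSE
)

odd  <- long[seq(1, nrow(long), by = 2), ]
even <- long[seq(2, nrow(long), by = 2), ]
rownames(odd)  <- NULL
rownames(even) <- NULL

# Guard against misaligned rows before binding the halves side by side.
stopifnot(identical(odd$FightNumber, even$FightNumber))

stat_cols <- c("SigStr", "TD")
wide <- data.frame(
  FightNumber = odd$FightNumber,
  Fighter1    = odd$Fighter,
  Fighter2    = even$Fighter,
  setNames(odd[stat_cols],  paste0("F1", stat_cols)),
  setNames(even[stat_cols], paste0("F2", stat_cols)),
  stringsAsFactors = FALSE
)
```

Selecting the stat columns by name instead of by position (`17:160`) keeps the reshape working if columns are added or reordered later.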



**2. Include a new variable to indicate the winner - simply a 0 or 1 variable indicating if Fighter 1 wins or not.**

```{r, eval=FALSE, echo=TRUE}
# Mutate FIGHT_DETAILS to give logical form of result (Fighter1 or Fighter2) ready for logistic
# regression or random forest.
# as.numeric() maps TRUE/FALSE to 1/0 and keeps NA as NA, so no row-by-row loop is needed.
ufc_fight_details_df <- ufc_fight_details_df %>%
  mutate(F1Wins = as.numeric(as.character(Result) == as.character(Fighter1)))
```



**3. Merge `fightMetric_fighters` with `fight_details` on each fighter's name.**
```{r, eval=FALSE, echo=TRUE}
##### Merge FIGHTMETRIC_FIGHTERS_DF with FIGHT_DETAILS_DF ####

#### Merge FIGHTERS_DF to FIGHTER1 in FIGHT_DETAILS ####
fightMetric_fighters_df <- fightMetric_fighters_df %>%
  rename(Fighter1=Name)

ufc_fight_details_df <- ufc_fight_details_df %>%
  left_join(fightMetric_fighters_df, by = "Fighter1")

left_copy_fight_details <- ufc_fight_details_df[,1:306]
right_copy_fighter1_details <- ufc_fight_details_df[,307:323]

# Flag duplicated fight rows introduced by the join. A logical index also works
# when there are no duplicates (a negative empty index would drop every row).
left_copy_double_rows <- duplicated(left_copy_fight_details)

left_copy_fight_details <- left_copy_fight_details[!left_copy_double_rows,]
right_copy_fighter1_details <- right_copy_fighter1_details[!left_copy_double_rows,]

ufc_fight_details_df <- left_copy_fight_details
########################

#### Merge FIGHTERS_DF to FIGHTER2 in FIGHT_DETAILS ####
fightMetric_fighters_df <- fightMetric_fighters_df %>%
  rename(Fighter2=Fighter1)

ufc_fight_details_df <- ufc_fight_details_df %>%
  left_join(fightMetric_fighters_df, by = "Fighter2")

left_copy_fight_details <- ufc_fight_details_df[,1:306]
right_copy_fighter2_details <- ufc_fight_details_df[,307:323]

# Flag duplicated fight rows introduced by the join. A logical index also works
# when there are no duplicates (a negative empty index would drop every row).
left_copy_double_rows <- duplicated(left_copy_fight_details)

left_copy_fight_details <- left_copy_fight_details[!left_copy_double_rows,]
right_copy_fighter2_details <- right_copy_fighter2_details[!left_copy_double_rows,]

ufc_fight_details_df <- left_copy_fight_details %>%
  bind_cols(right_copy_fighter1_details) %>%
  bind_cols(right_copy_fighter2_details)

colnames(ufc_fight_details_df)[307:323] <- paste("F1", colnames(fightMetric_fighters_df)[2:18], sep = "")
colnames(ufc_fight_details_df)[324:340] <- paste("F2", colnames(fightMetric_fighters_df)[2:18], sep = "")

fightMetric_fighters_df <- fightMetric_fighters_df %>%
  rename(Name=Fighter2)
###################
```
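
A lighter-weight alternative to join-then-deduplicate is to look each fighter up by position with `match()`, which always returns exactly one row per fight. The sketch below assumes fighter names are already unique; `attach_fighter` is a hypothetical helper and the tables are toy data.

```r
fights <- data.frame(
  Fighter1 = c("A", "B"),
  Fighter2 = c("B", "Z"),            # "Z" has no entry in the fighter table
  stringsAsFactors = FALSE
)
fighters <- data.frame(
  Name   = c("A", "B", "C"),
  Reach  = c(70, 72, 76),
  Height = c(69, 71, 74),
  stringsAsFactors = FALSE
)

# Attach one fighter's attributes to every fight, prefixing the new columns.
attach_fighter <- function(fights, fighters, name_col, prefix) {
  stopifnot(!anyDuplicated(fighters$Name))   # shared names must be resolved first
  idx <- match(fights[[name_col]], fighters$Name)
  attrs <- fighters[idx, setdiff(names(fighters), "Name"), drop = FALSE]
  names(attrs) <- paste0(prefix, names(attrs))
  rownames(attrs) <- NULL
  cbind(fights, attrs)
}

merged <- attach_fighter(fights, fighters, "Fighter1", "F1")
merged <- attach_fighter(merged, fighters, "Fighter2", "F2")
```

Fighters missing from the table come through as `NA` attributes rather than dropped or duplicated rows, so the fight count never changes.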



**4. Merge `fighter_ranking` with `fight_details` on each fighter's name.**
```{r, eval=FALSE, echo=TRUE}
################### Assign FIGHTER_RANKING to fighters in NEW_FIGHTS #################

# Reset clean unique FightNumbers 
new_fights_df$FightNumber <- c(1:nrow(new_fights_df))

new_fights_df <- new_fights_df %>%
  mutate(F1Rank = 251, F2Rank = 251)

F1RankCol <- grep("F1Rank", colnames(new_fights_df))
F2RankCol <- grep("F2Rank", colnames(new_fights_df))

quarterly_date <- as.Date("2006-01-01", "%Y-%m-%d")
for (q in 1:(length(fighter_ranking_df)-9)) {
  temp_fighter_ranking <- fighter_ranking_df %>%
    select(c(q,353))
  
  quarterly_fight_details <- new_fights_df %>%
    filter(quarter(Date, with_year = TRUE) == quarter(quarterly_date, with_year = TRUE)) %>%
    select(FightNumber, Date, Fighter1, Fighter2, F1Rank, F2Rank)
  
  for (F1 in 1:nrow(quarterly_fight_details)) {
    if (tolower(quarterly_fight_details[F1, 3]) %in% tolower(na.omit(temp_fighter_ranking[,1]))) {
      quarterly_fight_details[F1,5] <- temp_fighter_ranking[tolower(na.omit(temp_fighter_ranking[,1])) == tolower(quarterly_fight_details[F1,3]), 2][1]
    }
    if (tolower(quarterly_fight_details[F1, 4]) %in% tolower(na.omit(temp_fighter_ranking[,1]))) {
      quarterly_fight_details[F1,6] <- temp_fighter_ranking[tolower(na.omit(temp_fighter_ranking[,1])) == tolower(quarterly_fight_details[F1,4]), 2][1]
    }
  }
  
  for (i in 1:nrow(quarterly_fight_details)) {
    new_fights_df[new_fights_df$FightNumber == quarterly_fight_details[i,1], c(F1RankCol, F2RankCol)] <- quarterly_fight_details[i, 5:6]
  }
  
  quarterly_date <- quarterly_date + months(3)
  if (quarter(quarterly_date, with_year = TRUE) > quarter(today(), with_year = TRUE)) {
    quarterly_date <- as.Date("2006-01-01", "%Y-%m-%d")
  }
}

new_fights_df$FightNumber <- c(1:nrow(new_fights_df))

new_fights_df$F1Rank <- as.numeric(new_fights_df$F1Rank)
new_fights_df$F2Rank <- as.numeric(new_fights_df$F2Rank)

#######################################################################################
```
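
The core of the ranking step is a lookup keyed on quarter and lower-cased name, with 251 as the default for unranked fighters. The sketch below shows that lookup on toy data. It assumes a long ranking table with one row per fighter per quarter, which is a simplification and not the layout of `fighter_ranking_df`; `quarter_label` and `lookup_rank` are hypothetical helpers.

```r
# Toy ranking table in long form (an assumed layout, one row per fighter per quarter).
rankings <- data.frame(
  Quarter = c("2016 Q1", "2016 Q1", "2016 Q2"),
  Fighter = c("fighter a", "fighter b", "fighter a"),
  Rank    = c(3, 10, 2),
  stringsAsFactors = FALSE
)
fights <- data.frame(
  Date     = as.Date(c("2016-02-10", "2016-05-20")),
  Fighter1 = c("Fighter A", "Fighter A"),
  Fighter2 = c("Fighter B", "Fighter C"),
  stringsAsFactors = FALSE
)

# Label a date with its year and quarter, e.g. "2016 Q1".
quarter_label <- function(date) paste(format(date, "%Y"), quarters(date))

# Case-insensitive rank lookup for the quarter of each fight; unranked fighters get the default.
lookup_rank <- function(name, date, rankings, default = 251) {
  key <- paste(quarter_label(date), tolower(name))
  idx <- match(key, paste(rankings$Quarter, tolower(rankings$Fighter)))
  ifelse(is.na(idx), default, rankings$Rank[idx])
}

fights$F1Rank <- lookup_rank(fights$Fighter1, fights$Date, rankings)
fights$F2Rank <- lookup_rank(fights$Fighter2, fights$Date, rankings)
```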


***
### Feature Engineering
As mentioned earlier, the dataset for this project is a combination of data scraped from several sources, as well as data derived (or newly created) from the scraped data.  Specifically, the derived data includes career statistics for each fighter at the time of each fight.  This data was not available anywhere online (to my knowledge), but was critical in developing an accurate description of each fighter at the time of each fight on which to build a model that predicts future outcomes.  So, the basic approach to generating this new data was to sort through the entire fight dataset (`ufc_fight_details_df`) one fighter at a time and chronologically tally up their career stats. Below is an example of some of the code, which can almost certainly be refactored into something more economical, but for the time being is made up of about 200 lines of nested `for` loops and `if` `else` statements.

**Example block of code to generate derivative data.**
```{r, eval=FALSE, echo=TRUE}
# Add columns to be populated to FIGHT_DETAILS
ufc_fight_details_df <- ufc_fight_details_df %>%
  mutate(F1CarFightTime = NA, F2CarFightTime = NA, F1UFCWins = NA, F2UFCWins = NA, F1UFCLosses = NA,
         F2UFCLosses = NA, F1UFCNC = NA, F2UFCNC = NA, F1CarSigStr = NA, F2CarSigStr = NA,
         F1CarSigStrAtt = NA, F2CarSigStrAtt = NA, F1CarSigStrAbs = NA, F2CarSigStrAbs = NA,
         F1CarTD = NA, F2CarTD = NA, F1CarTDAtt = NA, F2CarTDAtt = NA, F1CarTDAbs = NA, F2CarTDAbs = NA,
         F1CarOppSigStrAtt = NA, F2CarOppSigStrAtt = NA, F1CarOppTDAtt = NA, F2CarOppTDAtt = NA)

#### Begin loop to generate additional fighter data ####
for (fighter in unique_fighters$.) {
  fighter_career_data <- ufc_fight_details_df %>%
    filter((Fighter1 == as.character(fighter)) | (Fighter2 == as.character(fighter))) %>%
    arrange(Date)
  
  for (i in 1:nrow(fighter_career_data)) {
    
  #### Adding up total career time in ring for each fighter going into each fight - Fighter1 spot. ####
    if (fighter == fighter_career_data[i,2]) {
      
      if (i == 1) {
        # F1CarFightTime when F1
        fighter_career_data[i, 342] <- 0
        
        # F1CarSigStr when F1
        fighter_career_data[i, 350] <- 0
        
        # F1CarSigStrAtt when F1
        fighter_career_data[i, 352] <- 0
        
        # F1CarSigStrAbs when F1
        fighter_career_data[i, 354] <- 0
        
        # F1CarTD when F1
        fighter_career_data[i, 356] <- 0
        
        # F1CarTDAtt when F1
        fighter_career_data[i, 358] <- 0
        
        # F1CarTDAbs when F1
        fighter_career_data[i, 360] <- 0
        
        # F1CarOppSigStrAtt when F1
        fighter_career_data[i, 362] <- 0
        
        # F1CarOppTDAtt when F1
        fighter_career_data[i, 364] <- 0
      } else {
        # F1 was F1 last fight
        if (fighter == fighter_career_data[(i-1),2]) {
         
          # F1CarFightTime when F1 this fight was F1 last fight
          fighter_career_data[i, 342] <- fighter_career_data[(i-1), 342] + fighter_career_data[(i-1), 341]
          
          # Get fight data from last fight when F1 this fight was F1 last fight.
          temp_fight_data <- c(fighter_career_data[(i-1), 18], fighter_career_data[(i-1), 19],
                               fighter_career_data[(i-1), 162], fighter_career_data[(i-1), 23],
                               fighter_career_data[(i-1), 24], fighter_career_data[(i-1), 167],
                               fighter_career_data[(i-1), 163], fighter_career_data[(i-1), 168])
          
          # F1CarSigStr through F1CarOppTDAtt when F1 this fight was F1 last fight
          for (n in 1:8) {
            if (is.na(temp_fight_data[n])) {
              fighter_career_data[i, ((174+n)*2)] <- 0
            } else {
              fighter_career_data[i, ((174+n)*2)] <- fighter_career_data[(i-1), ((174+n)*2)] + temp_fight_data[n]
            }
          }
        }
```
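
One possible direction for that refactor is to stack the data into one row per fighter per fight and take a lagged cumulative sum within each fighter. The sketch below does this for a single statistic on toy data. It is not the project's implementation, the values are invented, and it does not reproduce the `NA` handling of the loop above.

```r
# Toy fights, one row per fight, with each fighter's significant strikes landed.
fights <- data.frame(
  FightNumber = 1:3,
  Date     = as.Date(c("2015-01-01", "2015-06-01", "2016-01-01")),
  Fighter1 = c("A", "A", "B"),
  Fighter2 = c("B", "C", "A"),
  F1SigStr = c(20, 35, 15),
  F2SigStr = c(10, 5, 40),
  stringsAsFactors = FALSE
)

# Stack into one row per fighter per fight, remembering which slot each row came from.
long <- rbind(
  data.frame(FightNumber = fights$FightNumber, Date = fights$Date, Slot = "F1",
             Fighter = fights$Fighter1, SigStr = fights$F1SigStr,
             stringsAsFactors = FALSE),
  data.frame(FightNumber = fights$FightNumber, Date = fights$Date, Slot = "F2",
             Fighter = fights$Fighter2, SigStr = fights$F2SigStr,
             stringsAsFactors = FALSE)
)
long <- long[order(long$Date), ]

# Career total going INTO each fight: running sum minus the current fight's value.
long$CarSigStr <- ave(long$SigStr, long$Fighter, FUN = function(x) cumsum(x) - x)

# Map the career totals back onto the one-row-per-fight table.
key <- paste(long$FightNumber, long$Slot)
fights$F1CarSigStr <- long$CarSigStr[match(paste(fights$FightNumber, "F1"), key)]
fights$F2CarSigStr <- long$CarSigStr[match(paste(fights$FightNumber, "F2"), key)]
```

Because the slot is carried along as data, the same code handles a fighter who switches between the Fighter1 and Fighter2 positions, which is what most of the nested `if` `else` branches deal with.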


Some whittling down of the features occurred in going from the 382 features of `ufc_fight_details_df` to the subset of 48 features ultimately used in the logistic regression model.  This process was part logic, part domain expertise. More on the logic side were features of physical description, like age, height, weight, reach, weight class, and stance. Career statistics features are the product of both logic and domain expertise.  You don't need to know anything about mixed martial arts or UFC to think it's a good idea to consider each fighter's career statistics when making a prediction. Round-by-round statistics, on the other hand, were not included because not all fights have the same number of rounds, and presenting this data to the model in a useful way would have required a considerable amount of additional data wrangling.  The decision to include each fighter's rank might also seem natural, but an excellent data scientist without as much domain expertise might think that statistical descriptions of each fighter's performance should be enough, or even superior to a somewhat subjective or circumstantial indicator like rank. The ranking variable, however, takes into consideration factors such as the quality of a fighter's past opponents and the chronology of past performance (e.g. a fighter with a long, successful career who is now fighting past his prime).

***
<!-- ### Maintaining A Dynamic Dataset -->
<!-- After creating the inital foundational dataset and building the model, the work is not finished if it's intended to make predictions on a regular basis.  UFC holds events almost every week, so for this project to be practical the dataset must be updated after every event, which is just about every week. -->

<!-- *** -->
## Prediction Methods
After collecting and wrangling the data, we have a dataset of over 5,000 MMA fights ready to train a statistical model to make predictions.  Given that we're trying to predict a binary outcome (win or lose) for each fighter in a given fight, there are a few methods that would be suitable.  For this project I tested two methods, Logistic Regression and Random Forest, and ended up going with Logistic Regression because it performed better on test data.

***
### Logistic Regression

***
#### Testing

```{r}
# Variable for the date ten years ago today.
ten_years_ago <- ymd(today()) - years(10)

# This gives us the subset of fights (rows) we'll use for the model.
row_subset_fights <- ufc_fight_details_df %>%
  filter(Date > ten_years_ago) %>%
  filter(Gender == "Men") %>%
  filter(WeightClass != "Strawweight") %>%
  filter(WeightClass != "Atomweight") %>%
  filter(WeightClass != "Catch Weight") %>%
  filter(WeightClass != "Open Weight") %>%
  filter(WeightClass != "Other") %>%
  filter(WeightClass != "Super Heavyweight") %>%
  filter((F1UFCWins>0 | F1UFCLosses>0 | F1UFCNC>0) & (F2UFCWins>0 | F2UFCLosses>0 | F2UFCNC>0)) %>%
  arrange(Date)


# Subset of columns to be used in the model subset.
model_dataset_columns <- read.csv(paste(getwd(), "/FreshModelColumns.csv", sep = ""))
model_dataset_columns <- as.character(model_dataset_columns$ColumnNames)

# Get subset fight_details dataset - all rows; only columns from selected_features
fresh_model_dataset <- row_subset_fights %>%
  subset(select = model_dataset_columns)

# Factor Weightclass.
fresh_model_dataset$WeightClass <- as.factor(fresh_model_dataset$WeightClass)

# Factor F1Stance
fresh_model_dataset$F1Stance <- fresh_model_dataset$F1Stance %>%
  as.factor()

# Factor F2Stance
fresh_model_dataset$F2Stance <- fresh_model_dataset$F2Stance %>%
  as.factor()

##########!!!!!!!!! SUBSETTING !!!!!!!!!!###########
# Split fight_details_df into training and testing sets.

set.seed(109) 

split_ufc_data <- fresh_model_dataset$F1Wins %>%
  sample.split(SplitRatio = 0.70)

train_ufc_subset <- fresh_model_dataset %>%
  subset(split_ufc_data == TRUE)

test_ufc_subset <- fresh_model_dataset %>%
  subset(split_ufc_data == FALSE)


# Remove and save columns we don't need, but may need later... just in case.
saved_column_names <- c("FightNumber","Fighter1","Fighter2","Date")

column_pos <- c()
for (column in saved_column_names) {
  column_pos <- c(column_pos, grep(column, colnames(fresh_model_dataset)))
}

saved_columns <- fresh_model_dataset %>%
  select(column_pos)

train_ufc_subset <- train_ufc_subset %>% select(-column_pos)
test_ufc_subset <- test_ufc_subset %>% select(-column_pos)


####################################
######### LOGIT MODEL ##############

ufc_fresh_logit_model <- glm(F1Wins ~ ., data = na.omit(train_ufc_subset), family = binomial)

# Prediction - probability that F1 wins
test_ufc_subset$F1WinsProb <- round(predict(ufc_fresh_logit_model, newdata = test_ufc_subset, type = "response"), 4)

#########################################
########## THRESHOLD MODELS #############
WinThreshold <- 0.5

# Keep only fights where F1WinsProb > WinThreshold or < (1 - WinThreshold)
test_threshold_subset <- test_ufc_subset %>%
  filter(F1WinsProb > WinThreshold | F1WinsProb < (1-WinThreshold))

test_threshold_subset$PredF1Wins <- as.numeric(test_threshold_subset$F1WinsProb > WinThreshold)

#######################################
######### LOGIT CONFUSION MATRIX #######

logitPerformance <- confusionMatrix(as.factor(test_threshold_subset$PredF1Wins), as.factor(test_threshold_subset$F1Wins),
                                    dnn = c("PredF1Wins", "F1Wins"))

logitPerformance
```

```{r, echo=TRUE}
roc(F1Wins ~ F1WinsProb, test_threshold_subset, auc = TRUE, plot = TRUE)
```
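
To see how the confidence threshold trades coverage for accuracy, the sketch below fits a logistic regression on simulated data and scores it at several thresholds. Everything here is synthetic: the single predictor `x`, the coefficient, and the 700/300 split are assumptions chosen for illustration and say nothing about the UFC model's actual performance. `accuracy_at` is a hypothetical helper.

```r
set.seed(1)

# Simulated binary outcome driven by one made-up predictor.
n <- 1000
x <- rnorm(n)
y <- rbinom(n, 1, plogis(1.5 * x))
sim <- data.frame(x = x, y = y)

train <- sim[1:700, ]
test  <- sim[701:1000, ]

fit  <- glm(y ~ x, data = train, family = binomial)
prob <- predict(fit, newdata = test, type = "response")

# Keep only predictions at least `threshold` confident in either direction,
# then score the remaining picks.
accuracy_at <- function(prob, actual, threshold) {
  keep <- prob >= threshold | prob <= 1 - threshold
  pred <- as.numeric(prob[keep] > 0.5)
  c(threshold = threshold, n = sum(keep), accuracy = mean(pred == actual[keep]))
}

results <- t(sapply(c(0.5, 0.6, 0.7), accuracy_at, prob = prob, actual = test$y))
results
```

Raising the threshold can only shrink `n`, so a higher-accuracy subset always comes at the cost of fewer fights to bet on.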

***
### Random Forest

***
#### Testing

```{r}
############################ RANDOM FOREST MODEL ###################################
#set.seed(388)

ufc_fresh_randomForest_model <- randomForest(as.factor(F1Wins) ~ ., data = na.omit(train_ufc_subset),
                                            importance=TRUE)

test_ufc_subset$F1WinsProb <- NULL

test_ufc_subset$RFF1WinsProb <- predict(ufc_fresh_randomForest_model, test_ufc_subset, type = "prob")[,2]

test_ufc_subset$RFPredF1Wins <- as.numeric(test_ufc_subset$RFF1WinsProb > 0.5)

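# confusionMatrix() takes predictions first and the reference second, so the labels follow that order.
randomForestPerformance <- confusionMatrix(as.factor(test_ufc_subset$RFPredF1Wins), as.factor(test_ufc_subset$F1Wins), dnn = c("PredF1Wins", "F1Wins"))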

randomForestPerformance
```

```{r, echo=TRUE}
roc(F1Wins ~ RFF1WinsProb, test_ufc_subset, auc = TRUE, plot = TRUE)
```

```{r}
varImpPlot(ufc_fresh_randomForest_model, n.var = 10)
```


***
## UFC Project Workflow

The practical application of this project is to generate predictions for UFC fights on a weekly basis in order to wager on the fights.  There are four main steps in this workflow.  

1. Update `fightMetric_fighters_df`:  After each fight event, existing fighter data changes and sometimes new fighters are introduced to UFC.  So, we want to pull this new information into our dataset of fighters (`fightMetric_fighters_df`).  Specifically, we do this by running the two blocks of code labeled for `fightMetric_fighters_df` in the UFC.R script, beginning at lines ~250 and ~1022, respectively.  

2. Update `fighter_ranking_df`:  After each event fighters win and lose, so the fighter rankings change.  There are two components to our `fighter_ranking_df`: historical rankings and current rankings.  After each event we need to update the current rankings of each fighter, but the historical rankings do not change.  These are recorded in the dataset on a quarterly basis; in other words, we have historical rankings for each weight class for each quarter going back to Q1 of 2006, so the historical rankings are only updated once a quarter when they are released.  Theoretically this means our model could have a few cases of rankings that are slightly off, but as a practical matter quarterly rankings are probably pretty accurate, as the vast majority of fighters do not fight more than once per quarter.  We update the current rankings by running the block of code beginning at line ~1079 of UFCUpdater.R, and we update the historical rankings once a quarter by running the block of code at line ~1030 of UFCUpdater.R.  

3. Update `ufc_fight_details_df`:  After each event there are new fights and detailed statistics to be added to our `ufc_fight_details_df`.  This dataset was originally built using the UFC.R script, but scraping nearly 10,000 web pages is a lengthy and somewhat hands-on process; doing it once is fine, but doing it every week would be overly burdensome.  So, there is another script, UFCUpdater.R, that scrapes only the recently completed fights and data, computes their derivative data, and then merges them back into `ufc_fight_details_df`.  We simply run the UFCUpdater.R script to accomplish this.  

4. Generate predictions:  Once we have updated our dataset, we are ready to generate predictions for the set of upcoming fights.  To accomplish this we run the UFCPredictor.R script.  This script does a few things.  First, it scrapes the upcoming fight matchups.  Then it computes and gathers the relevant variables for each fighter from our dataset.  Then it refits the model with the most recent dataset.  Finally, it applies our model to the set of data for the upcoming fights to produce a probability that Fighter 1 will win.
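
Steps 3 and 4 each amount to running a whole script, so they could be chained by a small driver. The sketch below is one way to do that and is not part of the project: `run_weekly_workflow` is a hypothetical helper, and it assumes UFCUpdater.R and UFCPredictor.R sit in the working directory and can be sourced from start to finish. Steps 1 and 2 run individual blocks inside larger scripts, so they are left out.

```r
# Run the update and prediction scripts in order, stopping early if one is missing.
run_weekly_workflow <- function(scripts = c("UFCUpdater.R", "UFCPredictor.R"),
                                dir = getwd()) {
  paths <- file.path(dir, scripts)
  missing_paths <- paths[!file.exists(paths)]
  if (length(missing_paths) > 0) {
    stop("Missing script(s): ", paste(missing_paths, collapse = ", "))
  }
  for (path in paths) {
    message("Running ", basename(path))
    # Each script gets its own environment so leftover variables do not leak between steps.
    source(path, local = new.env())
  }
  invisible(paths)
}
```

Calling `run_weekly_workflow()` after an event would then refresh `ufc_fight_details_df` and produce the next set of predictions in one go, provided the scripts read and write their data through files rather than relying on objects left in the session.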

***
## Real World Results

```{r}
past_predictions <- read.csv("/Users/lw2134/Desktop/DataSci/UFCSite/PastPredictions.csv",
                                 header = TRUE, stringsAsFactors = FALSE)

past_predictions$X <- NULL

past_predictions$Date <- past_predictions$Date %>%
  as.Date("%Y-%m-%d")

past_predictions <- past_predictions %>%
  arrange(desc(Date))

past_predictions$`Vegas Outcome` <- NA
for(i in 1:nrow(past_predictions)) {
  if ((past_predictions$Moneyline[i] > 0) & (past_predictions$Outcome[i]=="WIN")) {
    past_predictions$`Vegas Outcome`[i] <- "LOSE"
  } else if ((past_predictions$Moneyline[i] < 0) & (past_predictions$Outcome[i]=="LOSE")) {
    past_predictions$`Vegas Outcome`[i] <- "LOSE"
  } else {
    past_predictions$`Vegas Outcome`[i] <- "WIN"
  }
}

colnames(past_predictions)[c(2,3,8)] <- c("Fighter 1", "Fighter 2", "Model Outcome")

kable(past_predictions, align = 'l')
```
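
The "Vegas Outcome" rule above boils down to one comparison: Vegas is counted as right when the favorite (negative moneyline) wins. The sketch below shows a vectorized form of that rule on made-up rows, along with the win probability implied by each moneyline. The column name `VegasOutcome` and the data are illustrative, and a moneyline of exactly 0, which the loop above would count as a Vegas win, is assumed not to occur.

```r
# Made-up predictions: the moneyline of the fighter the model picked, and whether that pick won.
toy_predictions <- data.frame(
  Moneyline = c(-150, 135, -200, 120),
  Outcome   = c("WIN", "WIN", "LOSE", "LOSE"),
  stringsAsFactors = FALSE
)

picked_favorite <- toy_predictions$Moneyline < 0
model_won       <- toy_predictions$Outcome == "WIN"

# Vegas is right when the favorite wins: the model picked the favorite and won,
# or the model picked the underdog and lost.
toy_predictions$VegasOutcome <- ifelse(picked_favorite == model_won, "WIN", "LOSE")

# Win probability implied by the moneyline (ignoring the bookmaker's margin).
implied_prob <- ifelse(toy_predictions$Moneyline < 0,
                       abs(toy_predictions$Moneyline) / (abs(toy_predictions$Moneyline) + 100),
                       100 / (toy_predictions$Moneyline + 100))
```

Comparing `implied_prob` with the model's own probability for the same fighter is one way to see where the model and the market disagree most.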

```{r}
past_predictions_plot <- past_predictions$Date %>%
  unique() %>%
  as.data.frame()

past_predictions_wins <- past_predictions %>%
  arrange(desc(Date)) %>%
  group_by(Date) %>%
  filter(`Model Outcome` =="WIN") %>%
  count(`Model Outcome`)

past_predictions_plot$ModelWins <- past_predictions_wins$n

rm(past_predictions_wins)

past_predictions_vegaswins <- past_predictions %>%
  arrange(desc(Date)) %>%
  group_by(Date) %>%
  filter(`Vegas Outcome` =="WIN") %>%
  count(`Vegas Outcome`)

past_predictions_plot$VegasWins <- past_predictions_vegaswins$n

rm(past_predictions_vegaswins)

past_predictions_plot$. <- rev(past_predictions_plot$.)

colnames(past_predictions_plot)[1] <- "Date"

ggplot(past_predictions_plot, aes(Date)) + geom_line(aes(y = ModelWins, colour = "ModelWins")) +
  geom_line(aes(y = VegasWins, colour = "VegasWins")) + ylab("Win Count") + scale_x_date()
```

```{r}
# Total correct picks for the model and for Vegas across all events.
past_predictions_totals <- data.frame(
  Wins = c("Model", "Vegas"),
  Total = c(sum(past_predictions_plot$ModelWins), sum(past_predictions_plot$VegasWins))
)

ggplot(past_predictions_totals, aes(Wins, Total, fill = Wins)) + geom_col() + ylab("count")
```

***
## Roadmap For Improvement 

This project is functional and practical, and it accomplishes what it set out to do. Nonetheless, there is room for improvement.  The following is a working list of items to explore for improving the model's performance.  
  
**Feature Engineering**  
1. A feature that incorporates how a fight ends: for example, whether a fighter wins by KO/TKO, submission, decision, etc.  
2. A feature that incorporates whether or not this is a fighter's first fight in UFC.  
3. A feature to incorporate a fighter's current winning or losing streak.  
4. Further testing of different variable combinations (removing variables that appear insignificant, for example)
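
As an illustration of the streak feature in item 3, here is a minimal sketch. The helper name `current_streak` and the `"WIN"`/`"LOSE"` coding are assumptions for the example, not part of the existing code base. It returns a signed streak length from a fighter's results ordered oldest to newest.

```r
# Hypothetical helper: signed length of the current streak.
# Positive = winning streak, negative = losing streak, 0 = no streak
# (no fights yet, or the most recent result is neither a win nor a loss).
current_streak <- function(results) {
  if (length(results) == 0) return(0L)
  runs <- rle(results)            # run-length encoding of consecutive results
  last <- length(runs$lengths)    # index of the most recent run
  len  <- runs$lengths[last]
  switch(runs$values[last], WIN = len, LOSE = -len, 0L)
}

current_streak(c("LOSE", "WIN", "WIN", "WIN"))  # 3
current_streak(c("WIN", "WIN", "LOSE"))         # -1
current_streak(character(0))                    # 0
```

To avoid leaking the outcome into the feature, the streak for a given fight would need to be computed only from results that precede it.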

**Predictive Modeling**  
1. Explore implementing an ensemble model of both Logistic Regression and Random Forest.  
2. Explore clustering on the fights the model predicts incorrectly to see whether we can develop rules that improve overall predictive accuracy, either through a multi-stage predictive approach or through a multi-faceted model (i.e. separate fits for subsets of fights that meet certain criteria).  
3. Explore specificity, precision, and other model performance measures as a way to weight or refine model predictions.  
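
For item 3, the measures in question can be computed directly from predicted probabilities and actual outcomes. The sketch below uses made-up values and a 0.5 threshold purely for illustration; the variable names are hypothetical.

```r
# Toy data (1 = Fighter 1 wins); all values are invented for the example.
actual    <- c(1, 0, 1, 1, 0, 0, 1, 1)
prob      <- c(0.81, 0.35, 0.62, 0.48, 0.55, 0.20, 0.90, 0.41)
predicted <- as.integer(prob >= 0.5)

# Cells of the confusion matrix.
tp <- sum(predicted == 1 & actual == 1)  # true positives
fp <- sum(predicted == 1 & actual == 0)  # false positives
tn <- sum(predicted == 0 & actual == 0)  # true negatives
fn <- sum(predicted == 0 & actual == 1)  # false negatives

metrics <- c(
  accuracy    = (tp + tn) / length(actual),
  precision   = tp / (tp + fp),  # of the predicted wins, how many were right
  sensitivity = tp / (tp + fn),  # of the actual wins, how many were caught
  specificity = tn / (tn + fp)   # of the actual losses, how many were caught
)
round(metrics, 3)
```

Recomputing these at several thresholds would show how precision trades off against the number of fights the model is willing to call.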

**Usability**  
1. Automate dataset updates and prediction generation.  
2. Build a user-friendly dashboard or interactive interface with Flexdashboard or Shiny, respectively.  

**Code Base**  
1. Implement a testing suite to avoid breaking an increasingly complex code base during future maintenance and feature additions.  
2. Refactor the code: there is plenty of room for it, which will improve readability and ease collaboration, and may yield some gains in execution speed.
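
As a small example of both items, the Vegas outcome rule from the Real World Results section could be pulled into a function and checked with base R assertions. This is a sketch under assumptions: the function name `vegas_outcome` is hypothetical, and a dedicated framework such as testthat could replace `stopifnot()`.

```r
# Hypothetical helper mirroring the Vegas Outcome rule: a positive moneyline
# means the model's pick was the underdog, so Vegas is wrong when that pick
# wins; a negative moneyline means the pick was the favorite, so Vegas is
# wrong when that pick loses.
vegas_outcome <- function(moneyline, model_outcome) {
  vegas_wrong <- (moneyline > 0 & model_outcome == "WIN") |
    (moneyline < 0 & model_outcome == "LOSE")
  ifelse(vegas_wrong, "LOSE", "WIN")
}

# Minimal checks covering the four combinations.
stopifnot(vegas_outcome(-150, "WIN")  == "WIN")   # favorite picked, favorite won
stopifnot(vegas_outcome(-150, "LOSE") == "LOSE")  # favorite picked, favorite lost
stopifnot(vegas_outcome(135, "WIN")   == "LOSE")  # underdog picked, underdog won
stopifnot(vegas_outcome(135, "LOSE")  == "WIN")   # underdog picked, favorite won

# The function is vectorized, so it can be applied to whole columns at once.
vegas_outcome(c(-120, 153), c("LOSE", "LOSE"))
```

Checks like these can run before each round of predictions so that a refactor which changes behavior fails loudly instead of silently shifting the results.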

***
## Appendix  

### Data
```{r}
# Creating table to explain the data.
data_table <- data.frame(colnames(ufc_fight_details_df),
                         rep(NA,length(ufc_fight_details_df)),
                         as.character(ufc_fight_details_df[3449,]),
                         rep("FightMetric",length(ufc_fight_details_df)),
                         stringsAsFactors = FALSE)

# Assign column names.
colnames(data_table) <- c("Variables", "Unabbreviated", "Example", "Source")

# Generate unabbreviated variable names.
data_table$Unabbreviated <- data_table$Variables %>%
  gsub("F1", "Fighter1 ", .) %>%
  gsub("F2", "Fighter2 ", .) %>%
  gsub("Str", "Strike ", .) %>%
  gsub("Att", "Attempt ", .) %>%
  gsub("Car", "Career ", .) %>%
  gsub("Sig", "Significant ", .) %>%
  gsub("Opp", "Opponent ", .) %>%
  gsub("TD", "Take Down ", .) %>%
  gsub("Abs", "Absorbed ", .) %>%
  gsub("pM", "per Minute ", .) %>%
  gsub("Acc", "Accuracy ", .) %>%
  gsub("Def", "Defense ", .) %>%
  gsub("Avg", "Average ", .) %>%
  gsub("Dist", "Distance ", .) %>%
  gsub("KD", "Knock Down ", .) %>%
  gsub("Pass", "Guard Pass ", .) %>%
  gsub("NC", "No Contest ", .) %>%
  gsub("R1", "Round 1 ", .) %>%
  gsub("R2", "Round 2 ", .) %>%
  gsub("R3", "Round 3 ", .) %>%
  gsub("R4", "Round 4 ", .) %>%
  gsub("R5", "Round 5 ", .) %>%
  gsub("Tot", "Total ", .) %>%
  gsub("Rev", "Reversal ", .) %>%
  gsub("SA", "Strikes Absorbed ", .) %>%
  gsub("Sub", "Submission ", .) %>%
  gsub("Ref", "Referee ", .) %>%
  gsub("SL", "Strikes Landed ", .)

# Insert data sources
self_derived <- grep(".*Car.*", data_table$Variables)
fightmatrix <- grep("F1Rank|F2Rank", data_table$Variables)

data_table$Source[self_derived] <- "Derived"
data_table$Source[fightmatrix] <- "FightMatrix"

kable(data_table, caption = "Composition of the primary dataset, [ufc_fight_details_df]", align = 'l')
```

```{r}
# Creating table to explain the data.
data_table_fighters <- data.frame(colnames(fightMetric_fighters_df),
                         rep(NA,length(fightMetric_fighters_df)),
                         as.character(fightMetric_fighters_df[fightMetric_fighters_df$Name=="Gegard Mousasi",]),
                         rep("FightMetric",length(fightMetric_fighters_df)),
                         stringsAsFactors = FALSE)

# Assign column names.
colnames(data_table_fighters) <- c("Variables", "Unabbreviated", "Example", "Source")

# Generate unabbreviated variable names.
data_table_fighters$Unabbreviated <- data_table_fighters$Variables %>%
  gsub("Str", "Strike ", .) %>%
  gsub("SL", "Strikes Landed ", .) %>%
  gsub("pM", "per Minute ", .) %>%
  gsub("Acc", "Accuracy ", .) %>%
  gsub("Def", "Defense ", .) %>%
  gsub("Avg", "Average ", .) %>%
  gsub("SA", "Strikes Absorbed ", .) %>%
  gsub("Sub", "Submission ", .) %>%
  gsub("Att", "Attempt ", .) %>%
  gsub("TD", "Take Down ", .)

kable(data_table_fighters, caption = "Composition of the fighter dataset, [fightMetric_fighters_df]", align = 'l')
```





