Question

We will explore Sean Lahman’s historical baseball database, which contains complete seasonal records for all players on all Major League Baseball teams going back to 1871. These data are made available in R via the Lahman package. While domain knowledge may be helpful, it is not necessary to follow the example.

Sean Lahman’s Baseball Database is not just one dataset. Type help("Lahman-package") to get an idea of the data tables available. The batting statistics of players are stored in one table (Batting), while information about people (most of whom are players) is in a different table (Master).

Confirm that Barry Bonds holds the record for most home runs hit in a career (762). To do so, list the names of the 20 players with the most career home runs, and confirm that Manny is among them. Note that you will need to join the Batting and Master tables to display the players’ names instead of their player IDs.

Name every pitcher in baseball history who has accumulated at least 300 wins (W) and at least 3,000 strikeouts (SO). Use the Pitching table.

Display a table of the 10 most recent World Series MVP awardees, including their names and ages. The following code chunk is a good start.

Explanation / Answer

The three parts are answered below in R, with the console output shown after each step.

install.packages("Lahman")
library(Lahman)
library(dplyr)

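# Load the data tables used below
# (recent Lahman releases name the Master table People; substitute it if Master is not found)
data("Master")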
data("Batting")
data("Pitching")
data("AwardsPlayers")

# Left join Batting and Master by player ID
batting_left <- left_join(Batting, Master, by = "playerID")

# Find player ID for "Barry Bonds"
Master[(Master$nameFirst == "Barry" & Master$nameLast == "Bonds"),]

# Confirm Barry Bonds's career total of 762 home runs
sum(batting_left[batting_left$playerID == "bondsba01", "HR"])

[1] 762

# Total home runs per player, sorted in descending order
HR_summary <- batting_left %>%
  group_by(playerID, nameFirst) %>%
  summarise(HR_total = sum(HR)) %>%
  arrange(desc(HR_total))

# List the top 20
head(HR_summary, 20)

# A tibble: 20 x 3
# Groups:   playerID [20]
   playerID  nameFirst HR_total
   <chr>     <chr>        <int>
 1 bondsba01 Barry          762
 2 aaronha01 Hank           755
 3 ruthba01  Babe           714
 4 rodrial01 Alex           696
 5 mayswi01  Willie         660
 6 griffke02 Ken            630
 7 thomeji01 Jim            612
 8 sosasa01  Sammy          609
 9 pujolal01 Albert         591
10 robinfr02 Frank          586
11 mcgwima01 Mark           583
12 killeha01 Harmon         573
13 palmera01 Rafael         569
14 jacksre01 Reggie         563
15 ramirma02 Manny          555
16 schmimi01 Mike           548
17 ortizda01 David          541
18 mantlmi01 Mickey         536
19 foxxji01  Jimmie         534
20 mccovwi01 Willie         521

# Yes, Manny (ramirma02) is in the top 20, in 15th place with 555 home runs.
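
The question asks for players' names, while the summary above carries only nameFirst, which is ambiguous (two different players named Willie appear in the list). A minimal sketch of one way to show full names: total the home runs per playerID first, then join the names. The data frames batting_toy and people_toy are hypothetical stand-ins for Batting and Master, filled with made-up rows so the example runs without the Lahman package; the na.rm = TRUE argument is an added assumption that missing HR values should be ignored.

```r
library(dplyr)

# Hypothetical stand-ins for Lahman's Batting and Master tables.
# The rows are made up for illustration; they are not real statistics.
batting_toy <- data.frame(
  playerID = c("aaa01", "aaa01", "bbb01", "ccc01", "ccc01"),
  yearID   = c(2001L, 2002L, 2001L, 2001L, 2002L),
  HR       = c(10L, 15L, 30L, 5L, NA),
  stringsAsFactors = FALSE
)
people_toy <- data.frame(
  playerID  = c("aaa01", "bbb01", "ccc01"),
  nameFirst = c("Ann", "Bob", "Cy"),
  nameLast  = c("Alpha", "Beta", "Gamma"),
  stringsAsFactors = FALSE
)

hr_by_player <- batting_toy %>%
  group_by(playerID) %>%                                   # one group per player
  summarise(HR_total = sum(HR, na.rm = TRUE), .groups = "drop") %>%
  left_join(people_toy, by = "playerID") %>%               # attach names after summarising
  mutate(name = paste(nameFirst, nameLast)) %>%            # full display name
  arrange(desc(HR_total)) %>%
  select(name, HR_total)

# With the real tables, head(hr_by_player, 20) would give the top 20
print(head(hr_by_player, 20))
```

Summarising before the join also keeps the join small: it touches one row per player rather than one row per player-season.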

# Left join Pitching and Master by player ID
pitching_left <- left_join(Pitching, Master, by = "playerID")

# Total career wins and strikeouts per pitcher
Pitch_summary <- pitching_left %>%
  group_by(playerID, nameFirst) %>%
  summarise(W_total = sum(W), SO_total = sum(SO)) %>%
  arrange(desc(W_total), desc(SO_total))

# Every pitcher in baseball history with at least 300 wins (W)
# and at least 3,000 strikeouts (SO)
Pitch_summary %>% filter(W_total >= 300 & SO_total >= 3000)

# A tibble: 10 x 4
# Groups:   playerID [10]
   playerID  nameFirst W_total SO_total
   <chr>     <chr>       <int>    <int>
 1 johnswa01 Walter        417     3509
 2 maddugr01 Greg          355     3371
 3 clemero02 Roger         354     4672
 4 carltst01 Steve         329     4136
 5 ryanno01  Nolan         324     5714
 6 suttodo01 Don           324     3574
 7 niekrph01 Phil          318     3342
 8 perryga01 Gaylord       314     3534
 9 seaveto01 Tom           311     3640
10 johnsra05 Randy         303     4875

# Table of the 10 most recent World Series MVP awardees, with names and ages.
# Age is the time elapsed since birthDate as of the day the code is run, in days.
AwardsPlayers %>%
  filter(awardID == "World Series MVP") %>%
  arrange(desc(yearID)) %>%
  head(10) %>%
  left_join(Master, by = "playerID") %>%
  mutate(Age = Sys.Date() - birthDate) %>%
  select(playerID, nameFirst, nameLast, Age, awardID, yearID)

    playerID nameFirst  nameLast        Age          awardID yearID
1  zobribe01       Ben   Zobrist 13293 days World Series MVP   2016
2  perezsa02  Salvador     Perez 10022 days World Series MVP   2015
3  bumgama01   Madison Bumgarner 10304 days World Series MVP   2014
4  ortizda01     David     Ortiz 15309 days World Series MVP   2013
5  sandopa01     Pablo  Sandoval 11390 days World Series MVP   2012
6  freesda01     David    Freese 12591 days World Series MVP   2011
7  renteed01     Edgar  Renteria 15046 days World Series MVP   2010
8  matsuhi01    Hideki    Matsui 15833 days World Series MVP   2009
9  hamelco01      Cole    Hamels 12348 days World Series MVP   2008
10 lowelmi01      Mike    Lowell 15941 days World Series MVP   2007
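
The Age column above is a count of days as of the date the code was run, so it changes every day and does not say how old each player was when he won. If the intended reading of "ages" is age in years at the time of the award, one possible sketch follows. It assumes a reference date of October 31 of the award year (an assumption, chosen as a rough end of the World Series) and uses hypothetical toy tables awards_toy and people_toy in place of AwardsPlayers and Master, with made-up rows.

```r
library(dplyr)

# Hypothetical stand-ins for AwardsPlayers and Master (made-up rows)
awards_toy <- data.frame(
  playerID = c("aaa01", "bbb01"),
  awardID  = "World Series MVP",
  yearID   = c(2015L, 2016L),
  stringsAsFactors = FALSE
)
people_toy <- data.frame(
  playerID  = c("aaa01", "bbb01"),
  nameFirst = c("Ann", "Bob"),
  nameLast  = c("Alpha", "Beta"),
  birthDate = as.Date(c("1990-05-10", "1981-12-26")),
  stringsAsFactors = FALSE
)

mvp_ages <- awards_toy %>%
  filter(awardID == "World Series MVP") %>%
  arrange(desc(yearID)) %>%
  head(10) %>%
  left_join(people_toy, by = "playerID") %>%
  mutate(
    # Assumed reference date: October 31 of the award year
    ref_date  = as.Date(paste0(yearID, "-10-31")),
    # Whole years between birth and the reference date (365.25-day approximation)
    age_years = floor(as.numeric(ref_date - birthDate) / 365.25)
  ) %>%
  select(nameFirst, nameLast, yearID, age_years)

print(mvp_ages)
```

A coarser alternative is yearID - birthYear, using the birthYear column of Master, which gives the age the player reached during the award year.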