Data Clean-up: dataset to be cleaned up can be downloaded via Google Drive
ID: 3850445 • Letter: D
Question
Data Clean-up
The dataset to be cleaned up can be downloaded via Google Drive:
https://drive.google.com/file/d/0B1D66qK8jxd0YTI3cUU4R1VDa0U/view?usp=sharing
1. Create a separate repository and push the attached dataset (dirty_data.csv)
2. Populate the missing values in the Area variable with appropriate values (Birmingham, Coventry, Dudley, Sandwell, Solihull, Walsall or Wolverhampton)
3. Remove special characters and padding (the white space before and after the text) from the Street 1 and Street 2 variables. Make sure the first letters of street names are capitalized and the street denominations follow the same standard (for example, all streets are indicated as “str.”, avenues as “ave.”, etc.)
4. If the value in Street 2 duplicates the value in Street 1, remove the value in Street 2
5. Remove the “Strange HTML column”
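Step 2 depends on how the blanks are laid out in the file, which the task does not say. The sketch below assumes (and this is only an assumption) that each blank Area cell belongs to the most recent area named above it, so the values can be filled downward. The helper name fill_down and the demo data frame are hypothetical; only the column name Area and the list of valid areas come from the task.

```r
# Valid area names, as listed in the task.
valid_areas <- c("Birmingham", "Coventry", "Dudley", "Sandwell",
                 "Solihull", "Walsall", "Wolverhampton")

# Hypothetical helper: carry the last non-missing value forward.
# ASSUMPTION: a blank Area belongs to the nearest non-blank Area above it.
fill_down <- function(x) {
  x <- trimws(as.character(x))
  x[!is.na(x) & x == ""] <- NA            # treat empty strings as missing
  idx <- cumsum(!is.na(x))                # position of the last value seen so far
  c(NA_character_, x[!is.na(x)])[idx + 1] # leading blanks stay NA
}

# Made-up demo data, not taken from dirty_data.csv.
demo <- data.frame(Area = c("Birmingham", "", NA, "Coventry", " "),
                   stringsAsFactors = FALSE)
demo$Area <- fill_down(demo$Area)

# Sanity check: every filled value should be one of the valid areas.
stopifnot(all(demo$Area %in% valid_areas))
```

If the blanks in the real file are not grouped this way, the fill rule would need to come from another column instead, so it is worth inspecting the data before relying on this.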
Complete the cleanup code and push the changes to the repository.
Submit a link to the repository. The repository will contain:
Combined code (.r or .rmd)
Original (dirty) dataset
New (clean) dataset
Explanation / Answer
main.R
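The posted answer names only the file. As a possible starting point for steps 3 to 5, here is a minimal base-R sketch. Several things in it are assumptions rather than facts about dirty_data.csv: the exact column names ("Street 1", "Street 2", "Strange HTML"), the idea that the denomination is the last word of the street value, the definition of "special character" as anything other than letters, digits, spaces, apostrophes and hyphens, and the abbreviations for road and lane (the task only gives "str." and "ave."). The function names clean_street and clean_streets are hypothetical.

```r
# Hypothetical helper: clean one vector of street values (step 3).
clean_street <- function(x) {
  x <- as.character(x)
  x <- gsub("[^[:alnum:][:space:]'-]", " ", x)  # special characters -> space
  x <- gsub("[[:space:]]+", " ", x)             # collapse repeated white space
  x <- trimws(x)                                # remove padding
  x <- tolower(x)
  # Capitalize the first letter of every word.
  x <- gsub("(^|\\s)([a-z])", "\\1\\U\\2", x, perl = TRUE)
  # ASSUMPTION: the denomination is the last word; the rd./ln. forms are guesses.
  denominations <- c("street|str|st" = "str.",
                     "avenue|ave|av" = "ave.",
                     "road|rd"       = "rd.",
                     "lane|ln"       = "ln.")
  for (pat in names(denominations)) {
    x <- sub(paste0("\\b(", pat, ")$"), denominations[[pat]], x,
             ignore.case = TRUE, perl = TRUE)
  }
  x
}

# Hypothetical wrapper for steps 3 to 5; column names are assumptions.
clean_streets <- function(df, s1 = "Street 1", s2 = "Street 2",
                          html = "Strange HTML") {
  df[[s1]] <- clean_street(df[[s1]])
  df[[s2]] <- clean_street(df[[s2]])
  # Step 4: blank out Street 2 where it repeats Street 1 (after cleaning).
  dup <- !is.na(df[[s1]]) & !is.na(df[[s2]]) & df[[s1]] == df[[s2]]
  df[[s2]][dup] <- NA
  # Step 5: drop the HTML column.
  df[[html]] <- NULL
  df
}

# Made-up demo rows, not taken from the real dataset.
demo_streets <- data.frame(
  "Street 1"     = c("  high street## ", "STATION AVE."),
  "Street 2"     = c("High Str.", "  park road "),
  "Strange HTML" = c("<br>", "<p>"),
  check.names = FALSE, stringsAsFactors = FALSE
)
cleaned <- clean_streets(demo_streets)
```

On the real file, reading with read.csv("dirty_data.csv", check.names = FALSE, stringsAsFactors = FALSE) keeps the spaces in the column names, and write.csv(cleaned, "clean_data.csv", row.names = FALSE) would save the result; the output file name here is only a placeholder.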