¶ … Diligence
Alphanumeric
HouseholdID
Length of Residence
City
Gender
Occupation
Fields with null values
MSACode
Athletic Dimension
Fields with "Unknown" values
Marital Status
Occupation
Presence of Children
Wealth Score
MailResponder
f. Number-based fields that cannot be used in calculations
Income
LengthofResidence
Region
g. Number-based fields that can be used in calculations
Fields where numbers are used to indicate non-numeric values (e.g., Y/N)
OwnACat
OwnADog
OwnCellularPhone
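The field inventory above can be produced with a quick profiling pass. The following is a minimal sketch using only the Python standard library; the sample rows and the `profile` helper are hypothetical and only illustrate the idea of counting nulls and "Unknown" values per field. Note that the "numeric" flag only means every non-empty value is made of digits; it does not decide whether the field can be used in calculations (Income and Region codes would still need a manual call).

```python
from collections import Counter

# Hypothetical sample rows; field names follow the outline, values are invented.
records = [
    {"HouseholdID": "H001", "MSACode": "4472", "Marital Status": "Married", "OwnACat": "1"},
    {"HouseholdID": "H002", "MSACode": "", "Marital Status": "Unknown", "OwnACat": "0"},
    {"HouseholdID": "H003", "MSACode": None, "Marital Status": "Single", "OwnACat": "1"},
]


def profile(rows):
    """Count null and "Unknown" values per field and flag all-digit fields."""
    nulls, unknowns, numeric = Counter(), Counter(), {}
    for row in rows:
        for field, value in row.items():
            text = "" if value is None else str(value).strip()
            if text == "":
                nulls[field] += 1
            elif text.lower() == "unknown":
                unknowns[field] += 1
            else:
                # A field stays "numeric" only while every non-empty value is digits.
                numeric[field] = numeric.get(field, True) and text.isdigit()
    return {
        field: {
            "nulls": nulls[field],
            "unknowns": unknowns[field],
            "numeric": numeric.get(field, False),
        }
        for field in rows[0]
    }


summary = profile(records)
for field, stats in summary.items():
    print(field, stats)
```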
Due Diligence
a. Standardize Y/N fields
i. Methodology should be the same in all columns
b. Calculate averages and standard deviations for all number fields after cleanup is completed
i. Can at least do counts for those fields after cleanup
c. Ensure no duplicate values in HouseholdID
i. Appears that this should be unique for each record
d. Ensure no misspellings of city names so that each city is represented by exactly one value
i. For example, Huntington Beach and Huntington Bch should not be treated as distinct values (did not see that ... just an example).
e. Ensure no invalid state abbreviations are used
i. E.g. XY used for Wyoming instead of WY (again, just an example)
f. Figure out something to do with "0" as age value (invalid figure)
i. Needs to be changed to "Unknown" or nulled out if not known. The latter will probably be necessary for the calculations to be valid.
g. Figure out something to do with "U" (Unknown) in gender field
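Several of the checks listed above can be sketched in a few lines of standard-library Python. This is only an illustration under stated assumptions: the field names "State" and "Age" are assumed (the outline mentions state abbreviations and an age value but does not name the fields), the sample rows are invented, and treating 1 as "Y" and 0 as "N" is a guess at the encoding that would need to be confirmed against the data.

```python
from collections import Counter
from statistics import mean, stdev

# Two-letter abbreviations for the 50 US states plus DC.
US_STATES = set(
    "AL AK AZ AR CA CO CT DE FL GA HI ID IL IN IA KS KY LA ME MD MA MI MN MS MO "
    "MT NE NV NH NJ NM NY NC ND OH OK OR PA RI SC SD TN TX UT VT VA WA WV WI WY DC".split()
)

# Hypothetical rows; "State" and "Age" are assumed field names, values are invented.
records = [
    {"HouseholdID": "H001", "State": "CA", "Age": "34", "OwnADog": "1"},
    {"HouseholdID": "H002", "State": "XY", "Age": "0", "OwnADog": "N"},
    {"HouseholdID": "H002", "State": "WY", "Age": "52", "OwnADog": "y"},
    {"HouseholdID": "H004", "State": "CA", "Age": "46", "OwnADog": "0"},
]


def standardize_yn(value):
    """Map the assumed encodings 1/0 and Y/N (any case) to "Y"/"N"; else None."""
    text = str(value).strip().upper()
    if text in {"1", "Y", "YES"}:
        return "Y"
    if text in {"0", "N", "NO"}:
        return "N"
    return None


def duplicate_ids(rows, key="HouseholdID"):
    """Return the IDs that appear on more than one record."""
    counts = Counter(row[key] for row in rows)
    return sorted(k for k, n in counts.items() if n > 1)


def invalid_states(rows, key="State"):
    """Return the state values that are not valid two-letter abbreviations."""
    return sorted({row[key] for row in rows if row[key].strip().upper() not in US_STATES})


def clean_age(value):
    """Treat 0 (or a non-numeric value) as unknown so it does not distort averages."""
    text = str(value).strip()
    if not text.isdigit() or int(text) == 0:
        return None
    return int(text)


dog_flags = [standardize_yn(r["OwnADog"]) for r in records]
dupes = duplicate_ids(records)
bad_states = invalid_states(records)
# Averages and standard deviations are computed only over known ages.
ages = [a for a in (clean_age(r["Age"]) for r in records) if a is not None]
age_mean, age_sd = mean(ages), stdev(ages)
```

Using the same `standardize_yn` function for every Y/N column is one way to keep the methodology identical across columns, and nulling out the zero ages before computing the mean follows the reasoning in item f.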