Title: | Synthetic Microdata and Spatial MicroSimulation Modeling for ACS Data |
---|---|
Description: | Provides access to curated American Community Survey (ACS) base tables via a wrapper to library(acs). Builds synthetic micro-datasets at any user-specified geographic level with ten default attributes; and, conducts spatial microsimulation modeling (SMSM) via simulated annealing. SMSM is conducted in parallel by default. Lastly, we provide functionality for data-extensibility of micro-datasets <doi:10.18637/jss.v104.i07>. |
Authors: | Alex Whitworth [aut, cre] |
Maintainer: | Alex Whitworth <[email protected]> |
License: | MIT + file LICENSE |
Version: | 1.7.1.1 |
Built: | 2024-11-05 03:41:59 UTC |
Source: | https://github.com/alexwhitworth/synthacs |
Add a new constraint to the mapping between a given macro dataset (class "macroACS") and a matching micro dataset (class "micro_synthetic"). May be called repeatedly to create a set of constraints.
add_constraint( attr_name = "variable", attr_totals, micro_data, constraint_list = NULL )
attr_name |
The name of the attribute, or variable, that you wish to constrain. |
attr_totals |
A named integer vector of counts per level of the new constraining attribute. |
micro_data |
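The micro dataset, of class "micro_synthetic". |
constraint_list |
A list of constraints from a previous call to add_constraint. Defaults to NULL (no prior constraints). |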
A list of constraints.
## Not run:
## assumes that you have a micro_synthetic dataset named test_micro and attribute counts
## named a, e, g respectively
c_list <- add_constraint(attr_name= "age", attr_totals= a, micro_data= test_micro)
c_list <- add_constraint(attr_name= "edu_attain", attr_totals= e, micro_data= test_micro, constraint_list= c_list)
c_list <- add_constraint(attr_name= "gender", attr_totals= g, micro_data= test_micro, constraint_list= c_list)
## End(Not run)
A dataset containing age-adjusted death rate data by race and gender of the deceased. Data is provided for 1980-2013.
adjDR
A data.frame with 612 observations and 4 variables.
The year for which data were recorded.
The racial group of the deceased. One of: all (all races); white (whites); black_aa (black / African-American); nat_amer (American Indian or Native Alaskan); asian_isl (Asian or Pacific Islander); hisp_lat (Hispanic).
The gender of the deceased. One of c(both, male, female)
The age-adjusted death rate. See details.
The age-adjusted death rates are used to compare relative mortality risks among groups and over time. They were computed by the direct method, which is defined as

$$R' = \sum_i \frac{P_{si}}{P_s} r_i$$

where $P_{si}$ is the standard population for age group $i$, $P_s$ is the total US standard population, and $r_i$ is the raw death rate for age group $i$.
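As a worked sketch of the direct method, the snippet below weights hypothetical age-specific rates $r_i$ by hypothetical standard-population shares $P_{si}/P_s$. The three age groups and all numbers are made up for illustration and are not taken from adjDR.

```r
# Hypothetical age-specific death rates r_i (deaths per 100,000) for three age groups
r_i  <- c(young = 50, middle = 400, old = 4000)
# Hypothetical standard population counts P_si for the same age groups
P_si <- c(young = 600000, middle = 300000, old = 100000)
# Total standard population P_s
P_s  <- sum(P_si)

# Direct method: weight each age-specific rate by its share of the standard population
age_adjusted_rate <- sum((P_si / P_s) * r_i)
age_adjusted_rate
```

Here the weights are 0.6, 0.3 and 0.1, so the adjusted rate is 30 + 120 + 400 = 550 per 100,000. Because every group is weighted by the same standard population, two groups with different age structures can be compared on this scale.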
Populations are based on census counts enumerated as of April 1 of the census year and estimated as of July 1 for non-census years.
https://www.cdc.gov/nchs/nvss/deaths.htm
Xu, J. Q., S. L. Murphy, and K. D. Kochanek. "Deaths: final data for 2013." National Vital Statistics Reports 64.2 (2015).
A dataset containing death rates for individuals by age group and race for the United States, 2013.
AgeRaceDR
A data.frame with 360 observations and 4 variables.
The age group, in years, for which the death rate is calculated.
The racial group of the deceased. One of: all (all races); white (whites); black (black / African-American); hispanic (Hispanic); asian.isl (Asian and Pacific Islander); nat.amer (Native American or Alaska Native).
The gender of the deceased. One of c(both, male, female)
The raw death rate. See details.
The death rate is defined as deaths per 100,000 population.
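For reference, the per-100,000 scaling works as in this small sketch; the death and population counts are hypothetical and do not come from AgeRaceDR.

```r
# Hypothetical counts for a single age group and race
deaths     <- 1250
population <- 2500000

# Death rate expressed as deaths per 100,000 population
death_rate <- deaths / population * 1e5
death_rate
```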
https://www.cdc.gov/nchs/nvss/deaths.htm
Xu, J. Q., S. L. Murphy, and K. D. Kochanek. "Deaths: final data for 2013." National Vital Statistics Reports 64.2 (2015).
Create a new age constraint list for the mapping between a set of macro datasets and a matching set of micro datasets (supplied as class 'synthACS').
all_geog_constraint_age(obj, method = c("synthetic", "macro.table"))
obj |
An object of class "synthACS". |
method |
One of "synthetic" or "macro.table". |
## Not run:
# assumes that obj of class 'synthACS' already exists in your environment
a1 <- all_geog_constraint_age(obj, "synthetic")
a2 <- all_geog_constraint_age(obj, "macro.table")
## End(Not run)
Create a new educational attainment constraint list for the mapping between a set of macro datasets and a matching set of micro datasets (supplied as class 'synthACS').
all_geog_constraint_edu(obj, method = c("synthetic", "macro.table"))
obj |
An object of class "synthACS". |
method |
One of "synthetic" or "macro.table". |
## Not run:
# assumes that obj of class 'synthACS' already exists in your environment
e1 <- all_geog_constraint_edu(obj, "synthetic")
e2 <- all_geog_constraint_edu(obj, "macro.table")
## End(Not run)
Create a new employment status constraint list for the mapping between a set of macro datasets and a matching set of micro datasets (supplied as class 'synthACS').
all_geog_constraint_employment(obj, method = c("synthetic", "macro.table"))
obj |
An object of class "synthACS". |
method |
One of "synthetic" or "macro.table". |
## Not run:
# assumes that obj of class 'synthACS' already exists in your environment
e1 <- all_geog_constraint_employment(obj, "synthetic")
e2 <- all_geog_constraint_employment(obj, "macro.table")
## End(Not run)
Create a new gender constraint list for the mapping between a set of macro datasets and a matching set of micro datasets (supplied as class 'synthACS').
all_geog_constraint_gender(obj, method = c("synthetic", "macro.table"))
obj |
An object of class "synthACS". |
method |
One of "synthetic" or "macro.table". |
## Not run:
# assumes that obj of class 'synthACS' already exists in your environment
g1 <- all_geog_constraint_gender(obj, "synthetic")
g2 <- all_geog_constraint_gender(obj, "macro.table")
## End(Not run)
Create a new geographic mobility constraint list for the mapping between a set of macro datasets and a matching set of micro datasets (supplied as class 'synthACS').
all_geog_constraint_geog_mob(obj, method = c("synthetic", "macro.table"))
obj |
An object of class "synthACS". |
method |
One of "synthetic" or "macro.table". |
## Not run:
# assumes that obj of class 'synthACS' already exists in your environment
gm1 <- all_geog_constraint_geog_mob(obj, "synthetic")
gm2 <- all_geog_constraint_geog_mob(obj, "macro.table")
## End(Not run)
Create a new individual income constraint list for the mapping between a set of macro datasets and a matching set of micro datasets (supplied as class 'synthACS').
all_geog_constraint_income(obj, method = c("synthetic", "macro.table"))
obj |
An object of class "synthACS". |
method |
One of "synthetic" or "macro.table". |
## Not run:
# assumes that obj of class 'synthACS' already exists in your environment
i1 <- all_geog_constraint_income(obj, "synthetic")
i2 <- all_geog_constraint_income(obj, "macro.table")
## End(Not run)
Create a new marital status constraint list for the mapping between a set of macro datasets and a matching set of micro datasets (supplied as class 'synthACS').
all_geog_constraint_marital_status(obj, method = c("synthetic", "macro.table"))
obj |
An object of class "synthACS". |
method |
One of "synthetic" or "macro.table". |
## Not run:
# assumes that obj of class 'synthACS' already exists in your environment
m1 <- all_geog_constraint_marital_status(obj, "synthetic")
m2 <- all_geog_constraint_marital_status(obj, "macro.table")
## End(Not run)
Create a new nativity status constraint list for the mapping between a set of macro datasets and a matching set of micro datasets (supplied as class 'synthACS').
all_geog_constraint_nativity(obj, method = c("synthetic", "macro.table"))
obj |
An object of class "synthACS". |
method |
One of "synthetic" or "macro.table". |
## Not run:
# assumes that obj of class 'synthACS' already exists in your environment
n1 <- all_geog_constraint_nativity(obj, "synthetic")
n2 <- all_geog_constraint_nativity(obj, "macro.table")
## End(Not run)
Create a new poverty status constraint list for the mapping between a set of macro datasets and a matching set of micro datasets (supplied as class 'synthACS').
all_geog_constraint_poverty(obj, method = c("synthetic", "macro.table"))
obj |
An object of class "synthACS". |
method |
One of "synthetic" or "macro.table". |
## Not run:
# assumes that obj of class 'synthACS' already exists in your environment
p1 <- all_geog_constraint_poverty(obj, "synthetic")
p2 <- all_geog_constraint_poverty(obj, "macro.table")
## End(Not run)
Create a new race constraint list for the mapping between a set of macro datasets and a matching set of micro datasets (supplied as class 'synthACS').
all_geog_constraint_race(obj, method = c("synthetic", "macro.table"))
obj |
An object of class "synthACS". |
method |
One of "synthetic" or "macro.table". |
## Not run:
# assumes that obj of class 'synthACS' already exists in your environment
r1 <- all_geog_constraint_race(obj, "synthetic")
r2 <- all_geog_constraint_race(obj, "macro.table")
## End(Not run)
Optimize the candidate micro datasets such that the lowest loss against the
macro dataset constraints is obtained. Loss is defined here as total absolute error (TAE)
and constraints are defined by the constraint_list_list. Optimization is done by
simulated annealing, and geographies are run in parallel.
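Because each geography is optimized independently, the work maps naturally onto R's parallel package. The sketch below is a hypothetical illustration of that dispatch pattern, not the package's internal code: `fit_one_geography()` is a stand-in that only sums its input, and the core count assumes `leave_cores` means "detected cores minus this many".

```r
library(parallel)

# Hypothetical stand-in for optimizing one geography; it only sums its input
fit_one_geography <- function(geog) sum(geog)

# Hypothetical input: one element per sub-geography
geographies <- list(tract_a = 1:3, tract_b = 4:6, tract_c = 7:9)

# Leave some cores free for other processing, but always use at least one
leave_cores <- 1L
n_cores <- max(1L, detectCores() - leave_cores, na.rm = TRUE)
# mclapply() relies on forking, which is unavailable on Windows, so use one core there
if (.Platform$OS.type == "windows") n_cores <- 1L

# Each geography is processed independently, so the work parallelizes cleanly
results <- mclapply(geographies, fit_one_geography, mc.cores = n_cores)
```

Independence between geographies is what makes this safe: no geography's result depends on another's, so results can be computed in any order and collected afterwards.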
all_geog_optimize_microdata( macro_micro, prob_name = "p", constraint_list_list, p_accept = 0.4, max_iter = 10000L, seed = sample.int(10000L, size = 1, replace = FALSE), leave_cores = 1L, verbose = TRUE )
macro_micro |
The geographical dataset of macro and micro data. Should be of class "macro_micro". |
prob_name |
It is assumed that observations are weighted and do not have an equal probability of occurrence. This string specifies the variable within each dataset that contains the probability of selection. |
constraint_list_list |
A list of constraint lists, as built by all_geogs_add_constraint. |
p_accept |
The acceptance probability for the Metropolis acceptance criterion. |
max_iter |
The maximum number of allowable iterations. Defaults to 10000L. |
seed |
A seed for reproducibility. Defaults to a random integer drawn with sample.int(10000L, size = 1, replace = FALSE). |
leave_cores |
An integer giving the number of cores to leave open to other processing. Defaults to 1L. |
verbose |
Logical. Do you wish to see verbose output? Defaults to TRUE. |
## Not run:
# assumes that micro_synthetic and cll already exist in your environment
# see: examples for derive_synth_datasets() and all_geogs_add_constraint()
optimized_la <- all_geog_optimize_microdata(micro_synthetic, prob_name= "p",
  constraint_list_list= cll, p_accept= 0.01, max_iter= 1000L)
## End(Not run)
Add a new attribute to a set (i.e., a list) of micro_synthetic datasets using conditional relationships between the new attribute and existing attributes (e.g., wage rate conditioned on age and education level). The same attribute is added to *each* micro_synthetic dataset, where each dataset is supplied a distinct relationship for attribute creation.
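One way to picture adding an attribute through conditional relationships: each existing row is split into one row per level of the new attribute, and its probability `p` is multiplied by the conditional probability of that level given the conditioning variables. The sketch below assumes that interpretation and is not the package's implementation; the column names `cnts` and `lvls` mirror the symbol table in the example further down, while the data and the new attribute name `employment` are hypothetical.

```r
# Hypothetical micro data with one conditioning attribute and selection probability p
micro <- data.frame(gender = c("male", "female"), p = c(0.6, 0.4),
                    stringsAsFactors = FALSE)
# Hypothetical symbol table: counts of each new-attribute level within each gender
sym_tbl <- data.frame(
  gender = c("male", "male", "female", "female"),
  lvls   = c("employed", "unemp", "employed", "unemp"),
  cnts   = c(90, 10, 60, 40),
  stringsAsFactors = FALSE
)

# Turn counts into conditional probabilities P(level | gender)
sym_tbl$cond_p <- sym_tbl$cnts / ave(sym_tbl$cnts, sym_tbl$gender, FUN = sum)

# Pair every micro row with each level of the new attribute for its gender,
# and scale p by the conditional probability of that level
expanded <- merge(micro, sym_tbl[, c("gender", "lvls", "cond_p")], by = "gender")
expanded$p <- expanded$p * expanded$cond_p
expanded$cond_p <- NULL
names(expanded)[names(expanded) == "lvls"] <- "employment"
```

Since the conditional probabilities within each gender sum to one, the expanded `p` still sums to one: for example, male and employed receives 0.6 × 0.9 = 0.54.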
all_geog_synthetic_new_attribute( df_list, prob_name = "p", attr_name = "variable", conditional_vars = NULL, st_list = NULL, leave_cores = 1L )
df_list |
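A list of data frames, each of class "micro_synthetic". |
prob_name |
A string specifying the column name, in each data frame of df_list, that contains each observation's probability of selection. |
attr_name |
A string specifying the desired name of the new attribute to be added to the data. |
conditional_vars |
A character vector specifying the existing variables, if any, on which
the new attribute (variable) is to be conditioned for each dataset. Variables must be specified
in order. Defaults to NULL. |
st_list |
A list of symbol tables (data frames), one per dataset, giving counts of each level of the new attribute for each combination of the conditional variables. |
leave_cores |
An integer giving the number of cores to leave open to other processing. Defaults to 1L. |
A list of new synthetic micro datasets, each with class "micro_synthetic".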
## Not run:
set.seed(567L)
df <- data.frame(gender= factor(sample(c("male", "female"), size= 100, replace= TRUE)),
                 age= factor(sample(1:5, size= 100, replace= TRUE)),
                 pov= factor(sample(c("lt_pov", "gt_eq_pov"), size= 100, replace= TRUE, prob= c(.15,.85))),
                 p= runif(100))
df$p <- df$p / sum(df$p)
class(df) <- c("data.frame", "micro_synthetic")
# and example test elements
cond_v <- c("gender", "pov")
levels <- c("employed", "unemp", "not_in_LF")
sym_tbl <- data.frame(gender= rep(rep(c("male", "female"), each= 3), 2),
                      pov= rep(c("lt_pov", "gt_eq_pov"), each= 6),
                      cnts= c(52, 8, 268, 72, 12, 228, 1338, 93, 297, 921, 105, 554),
                      lvls= rep(levels, 4))
df_list <- replicate(10, df, simplify= FALSE)
st_list <- replicate(10, sym_tbl, simplify= FALSE)
# run
library(parallel)
syn <- all_geog_synthetic_new_attribute(df_list, prob_name= "p", attr_name= "variable",
                                        conditional_vars= cond_v, st_list= st_list)
## End(Not run)
Add a new constraint to the mapping between a set of macro datasets and a matching set of micro datasets (supplied as class 'macro_micro'). May be called repeatedly to create a set of constraints across the sub-geographies.
all_geogs_add_constraint( attr_name = "variable", attr_total_list, macro_micro, constraint_list_list = NULL )
attr_name |
The name of the attribute, or variable, that you wish to constrain. |
attr_total_list |
A list of named integer vectors containing counts per level of the new constraining attribute for each geography. |
macro_micro |
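The geographical dataset of macro and micro data. Should be of class "macro_micro". |
constraint_list_list |
A list of constraint lists from a previous call to all_geogs_add_constraint. Defaults to NULL (no prior constraints). |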
A list of constraint lists.
## Not run:
# assumes that micro_synthetic already exists in your environment
# 1. build constraints for gender and age
g <- all_geog_constraint_gender(micro_synthetic, method= "macro.table")
a <- all_geog_constraint_age(micro_synthetic, method= "macro.table")
# 2. bind constraints to geographies and macro-data
cll <- all_geogs_add_constraint(attr_name= "age", attr_total_list= a, macro_micro= micro_synthetic)
cll <- all_geogs_add_constraint(attr_name= "gender", attr_total_list= g, macro_micro= micro_synthetic, constraint_list_list= cll)
## End(Not run)
A dataset containing birth rate data in the United States by age and race of the mother. Data for all races is provided for 1970-2014 and for individual races from 1989-2014.
BR2014
A data.frame with 1,750 observations and 4 variables.
The year for which data were recorded.
The racial group of the mothers. One of: all (all races); white (non-Hispanic whites); black_aa (black / African-American); nat_amer (American Indian or Native Alaskan); asian_isl (Asian or Pacific Islander); hisp_lat (Hispanic or Latin American).
The age group of the mother.
The birth rate. See Details.
The birth rate is defined as births per 1,000 women in the specified group (age and race).
Populations are based on census counts enumerated as of April 1 of the census year and estimated as of July 1 for non-census years.
Beginning in 1997, birth rates for age group 45up are computed by relating births to all women aged 45 or older to this group. Prior to 1997, only births to women aged 45-49 were included.
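The per-1,000 definition translates directly into arithmetic; the counts in this sketch are hypothetical and not drawn from BR2014.

```r
# Hypothetical counts for one age group and race of mother
births <- 3150
women  <- 50000

# Birth rate expressed as births per 1,000 women in the group
birth_rate <- births / women * 1000
birth_rate
```

Note the different scale from the death-rate tables, which are expressed per 100,000 population.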
https://www.cdc.gov/nchs/nvss/births.htm
Hamilton, Brady E., et al. "Births: final data for 2014." National Vital Statistics Reports 64.12 (2015): 1-64.
Calculates the total absolute error (TAE) between sample micro data and constraining
totals from the matching macro data. The prior TAE can be updated rather than recalculated,
which speeds up iteration. The updating feature is particularly helpful when optimizing
the micro data fit via simulated annealing (see optimize_microdata).
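To make TAE and its incremental update concrete, the sketch below computes TAE for a toy sample and then updates it after swapping one individual, adjusting the prior totals instead of re-tabulating the whole sample. The constraint representation (a named list of named target-count vectors) and the helpers `sample_totals()` and `tae_from_totals()` are hypothetical and are not calculate_TAE's internals.

```r
# Hypothetical constraints: target counts per level for each constrained attribute
constraint_list <- list(
  gender = c(female = 3, male = 2),
  age    = c(young = 2, old = 3)
)
# Hypothetical candidate sample of 5 synthetic individuals
sample_data <- data.frame(
  gender = factor(c("female", "male", "male", "male", "female"), levels = c("female", "male")),
  age    = factor(c("young", "young", "young", "old", "old"), levels = c("young", "old"))
)

# Tabulate the sample's counts per level for every constrained attribute
sample_totals <- function(df, constraints) {
  lapply(setNames(names(constraints), names(constraints)), function(v) {
    tab <- table(factor(df[[v]], levels = names(constraints[[v]])))
    setNames(as.vector(tab), names(tab))
  })
}

# TAE: sum over attributes and levels of |sample count - target count|
tae_from_totals <- function(totals, constraints) {
  sum(mapply(function(s, t) sum(abs(s - t)), totals, constraints))
}

totals <- sample_totals(sample_data, constraint_list)
tae    <- tae_from_totals(totals, constraint_list)

# Incremental update: drop row 2 and add one new individual, adjusting the
# prior totals instead of re-tabulating the full sample
dropped <- sample_data[2, , drop = FALSE]
new_obs <- data.frame(gender = factor("female", levels = c("female", "male")),
                      age    = factor("old", levels = c("young", "old")))
updated_totals <- Map(`+`,
                      Map(`-`, totals, sample_totals(dropped, constraint_list)),
                      sample_totals(new_obs, constraint_list))
tae_updated <- tae_from_totals(updated_totals, constraint_list)
```

The starting sample is off by one individual on each level of both attributes (TAE of 4); swapping the young male for an old female matches every target, so the updated TAE is 0. Only the dropped and added rows are tabulated in the update, which is what makes it cheap inside an optimization loop.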
calculate_TAE( sample_data, constraint_list, prior_sample_totals = NULL, dropped_obs_totals = NULL, new_obs = NULL )
sample_data |
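A data frame sample of the micro data, such as a "micro_synthetic" dataset. |
constraint_list |
A list of constraints, as built by add_constraint. |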
prior_sample_totals |
An optional |
dropped_obs_totals |
An optional |
new_obs |
An optional |
## Not run:
## assumes that you have a micro_synthetic dataset named test_micro and an attribute count
## vector named g
c_list <- add_constraint(attr_name= "gender", attr_totals= g, micro_data= test_micro)
calculate_TAE(test_micro, c_list)
## End(Not run)
Combine objects of class "smsm_set" into a single object of class "smsm_set".
combine_smsm(...)
... |
One or more objects of class 'smsm_set'. |
split, all_geog_optimize_microdata
## Not run:
combined <- combine_smsm(smsm1, smsm2, smsm3)
## End(Not run)
Derive synthetic micro datasets for each sub-geography of a given set of geographic
macro data constraining tabulations. See Details. By default, micro dataset generation runs
in parallel with load balancing. Macro data is assumed to have been pulled from the US Census API
via the acs package.
derive_synth_datasets(macro_data, parallel = TRUE, leave_cores = 2)
macro_data |
A macro dataset list: the result of pull_synth_data. |
parallel |
Logical, defaults to TRUE. |
leave_cores |
How many cores do you wish to leave open to other processing? Defaults to 2. |
A list of the input macro datasets produced by pull_synth_data and a list of synthetic micro datasets for each geographical subset within the specified macro geography.
In the absence of true micro level datasets for a given geographic area, synthetic datasets
can be used. This function uses conditional and marginal probability distributions (at the
aggregate level) to generate synthetic micro population datasets, which are built one constraint
at a time. Taking as input the macro level data (class "macroACS"), this function builds
synthetic micro datasets for each lower level geographical area within the area of study.
In simplest terms, the goal is to generate a joint probability distribution for an attribute vector and to create synthetic individuals from this distribution. However, information for the full joint distribution is typically not available, so we construct it as a product of conditional and marginal probabilities. This is done one attribute at a time, under the assumption that there is a continuum of attribute dependence. That is, some attributes are more important (e.g., gender, age) in 'determining' others (e.g., educational attainment, marital status). These more important attributes need to be assigned first, whereas less important attributes may be assigned later. These distinctions are largely intuitive, but care must be taken in choosing the order in which attributes are constructed.
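The product-of-conditionals construction can be sketched in a few lines of base R. The snippet starts from a hypothetical marginal for gender, then adds age conditioned on gender and education conditioned on age, multiplying each row's probability by the relevant conditional probability. All probabilities and the helper `add_attribute()` are hypothetical; the package builds its attributes from ACS tables.

```r
# Hypothetical marginal distribution for the first attribute, P(gender)
p_gender <- c(female = 0.5, male = 0.5)
# Hypothetical conditional P(age | gender): rows are genders, columns are age groups
p_age_given_gender <- rbind(
  female = c(under_40 = 0.4, over_40 = 0.6),
  male   = c(under_40 = 0.5, over_40 = 0.5)
)
# Hypothetical conditional P(edu | age): rows are age groups, columns are education levels
p_edu_given_age <- rbind(
  under_40 = c(hs = 0.3, college = 0.7),
  over_40  = c(hs = 0.6, college = 0.4)
)

# Expand every row of `joint` into one row per level of the new attribute,
# multiplying the row's probability by P(new level | value of cond_var)
add_attribute <- function(joint, cond_var, cond_table, new_name) {
  pieces <- lapply(seq_len(nrow(joint)), function(i) {
    cond <- cond_table[as.character(joint[[cond_var]][i]), ]
    out <- joint[rep(i, length(cond)), , drop = FALSE]
    out[[new_name]] <- names(cond)
    out$p <- out$p * unname(cond)
    out
  })
  res <- do.call(rbind, pieces)
  rownames(res) <- NULL
  res
}

# Build the joint distribution one attribute at a time
joint <- data.frame(gender = names(p_gender), p = unname(p_gender),
                    stringsAsFactors = FALSE)
joint <- add_attribute(joint, "gender", p_age_given_gender, "age")
joint <- add_attribute(joint, "age", p_edu_given_age, "edu")
```

The result has 2 × 2 × 2 = 8 rows whose probabilities sum to one; for example, a female over 40 with a high-school education gets 0.5 × 0.6 × 0.6 = 0.18. Ordering matters because each step can only condition on attributes that already exist.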
This function provides a synthetic population with the following characteristics as well as each
synthetic individual's probability of inclusion. The included characteristics are: age, gender,
marital status, educational attainment, employment status, nativity, poverty status, geographic
mobility in the prior year, individual income, and race. Additional attributes of interest to the
user may be added in a similar manner via synthetic_new_attribute.
**Note:** INDIVIDUAL, not HOUSEHOLD level, synthetic population datasets are created.
Birkin, Mark, and M. Clarke. "SYNTHESIS-a synthetic spatial information system for urban and regional analysis: methods and examples." Environment and Planning A 20.12 (1988): 1645-1671.
pull_synth_data, acs.fetch, geo.make
## Not run:
# make geography
la_geo <- acs::geo.make(state= "CA", county= "Los Angeles", tract= "*")
# pull data elements for creating synthetic data
la_dat <- pull_synth_data(2014, 5, la_geo)
# derive synthetic data
la_synthetic <- derive_synth_datasets(la_dat, leave_cores= 0)
## End(Not run)
Gets aggregate (macro) data, either estimates or standard errors, for a specified geography and dataset.
fetch_data(acs, geography, dataset = c("estimate", "st.err"), choice = NULL)
acs |
An object of class "macroACS". |
geography |
A character vector allowing string matching via |
dataset |
Either "estimate" or "st.err". |
choice |
A character vector specifying the name of one of the datasets in |
Generate a list of attribute vectors for new synthetic attribute creation from a "macroACS" object.
gen_attr_vectors(acs, choice)
acs |
An object of class "macroACS". |
choice |
A character vector specifying the name of one of the datasets in |
all_geog_synthetic_new_attribute, synthetic_new_attribute
Extract the best fit micro population (resulting from the simulated annealing algorithm) for a given geography.
get_best_fit(obj, geography)
obj |
An object of class |
geography |
A string allowing string matching via |
Get the names of the datasets in a given "macroACS" object.
get_dataset_names(acs)
acs |
An object of class "macroACS". |
Get the data collection end year from a "macroACS" object.
get_endyear(acs)
acs |
An object of class "macroACS". |
Extract the final TAE (resulting from the simulated annealing algorithm) for a given geography.
get_final_tae(obj, geography)
obj |
An object of class |
geography |
A string allowing string matching via |
Get summary information for the selected geography from a "macroACS" object.
get_geography(acs)
acs |
An object of class "macroACS". |
Get the data collection span from a "macroACS" object.
get_span(acs)
acs |
An object of class "macroACS". |
Function that checks if the target object is a macro_micro object.
is.macro_micro(x)
x |
any R object. |
Returns TRUE if its argument has class "macro_micro" among its classes and FALSE otherwise.
Function that checks if the target object is a macroACS object.
is.macroACS(x)
x |
any R object. |
Returns TRUE if its argument has class "macroACS" among its classes and FALSE otherwise.
Function that checks if the target object is a micro_synthetic object.
is.micro_synthetic(x)
x |
any R object. |
Returns TRUE if its argument has class "micro_synthetic" among its classes and FALSE otherwise.
Function that checks if the target object is a smsm_set object.
is.smsm_set(x)
x |
any R object. |
Returns TRUE if its argument has class "smsm_set" among its classes and FALSE otherwise.
Function that checks if the target object is a synthACS object.
is.synthACS(x)
x |
any R object. |
Returns TRUE if its argument has class "synthACS" among its classes and FALSE otherwise.
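Checks of this kind ("has class X among its classes") correspond to base R's `inherits()`. The sketch below re-creates two of them under hypothetical names to show the behaviour; it is not the package's source.

```r
# Hypothetical re-implementations of the class checks, using base R's inherits()
is_macro_micro <- function(x) inherits(x, "macro_micro")
is_smsm_set    <- function(x) inherits(x, "smsm_set")

# A toy object carrying "smsm_set" among several classes
obj <- structure(list(), class = c("smsm_set", "list"))
is_smsm_set(obj)     # TRUE
is_macro_micro(obj)  # FALSE
```

`inherits()` searches the whole class vector, so an object still passes the check when "smsm_set" is not its first class.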
An anonymized dataset containing the geographic information of hospitals in Los Angeles County, California, USA.
la_hospitals
A data.frame with 631 observations and 7 variables.
The hospital's longitude.
The hospital's latitude.
The hospital's postal city.
The hospital's alpha FIPS code.
The hospital's five digit postal ZIP code.
The census tract in which the hospital is located.
The hospital's county – "LOS ANGELES".
A dataset containing life expectancy at certain ages by race, Hispanic origin and sex for the United States, 2013.
LifeExp
A data.frame with 396 observations and 4 variables.
The exact age, in years, at which life expectancy is calculated.
The racial group of the deceased. One of: all (all races); white (whites); black (black / African-American); hispanic (Hispanic); non.hisp.white (non-Hispanic whites); non.hispanic.black (non-Hispanic blacks).
The gender of the deceased. One of c(both, male, female)
The life expectancy for an individual at the exact age with the given race and gender.
https://www.cdc.gov/nchs/nvss/deaths.htm
Xu, J. Q., S. L. Murphy, and K. D. Kochanek. "Deaths: final data for 2013." National Vital Statistics Reports 64.2 (2015).
Marginalize (i.e., reduce the number of) attributes of a synthetic dataset of class 'micro_synthetic' or of a list of synthetic datasets of class 'synthACS'. This is done by marginalizing the joint distribution based on a set of specified attributes (see Arguments below).
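For a weighted micro dataset, marginalizing an attribute out amounts to summing the selection probability `p` over that attribute's levels within each combination of the attributes that remain. The base R sketch below shows this with `aggregate()` on hypothetical data; it illustrates the idea and does not reproduce marginalize_attr's exact semantics for `varlist` or `marginalize_out`.

```r
# Hypothetical micro data with selection probabilities p summing to 1
df <- data.frame(
  gender = c("male", "male", "female", "female"),
  age    = c("young", "old", "young", "old"),
  p      = c(0.1, 0.2, 0.3, 0.4),
  stringsAsFactors = FALSE
)

# Marginalize out age: sum p over its levels within each gender
by_gender <- aggregate(p ~ gender, data = df, FUN = sum)
by_gender
```

The four rows collapse to two (female 0.7, male 0.3), and the total probability is unchanged, which is why marginalizing preserves a valid distribution.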
marginalize_attr(obj, varlist, marginalize_out = FALSE)
obj |
An object of class "micro_synthetic" or "synthACS". |
varlist |
A character vector of variable, or attribute, names in obj. |
marginalize_out |
Logical. Do you wish to *remove* the variables in varlist? Defaults to FALSE. |
{
# dummy data setup
set.seed(567L)
df <- data.frame(gender= factor(sample(c("male", "female"), size= 100, replace= TRUE)),
                 age= factor(sample(1:5, size= 100, replace= TRUE)),
                 pov= factor(sample(c("below poverty", "at above poverty"), size= 100, replace= TRUE, prob= c(.15,.85))),
                 p= runif(100))
df$p <- df$p / sum(df$p)
class(df) <- c("data.frame", "micro_synthetic")
df2 <- marginalize_attr(df, varlist= "gender")
df3 <- marginalize_attr(df, varlist= c("gender", "age"))
df4 <- marginalize_attr(df, varlist= c("gender", "age"), marginalize_out= TRUE)
df_list <- replicate(10, df, simplify= FALSE)
dummy_list <- replicate(10, list(NULL), simplify= FALSE)
df_list <- mapply(function(a,b) {return(list(a, b))}, a= dummy_list, b= df_list, SIMPLIFY = FALSE)
class(df_list) <- c("list", "synthACS")
# run the function
df_list2 <- marginalize_attr(df_list, varlist= c("gender", "age"))
}
A dataset containing multiple birth rate data by race of the mother. Data for all races is provided for 1980-2014 and for individual races from 1990-2014.
MBR
A data.frame with 110 observations and 8 variables.
The year for which data were recorded.
The racial group of the mothers. One of: all (all races); white (non-Hispanic whites); black_aa (non-Hispanic black / African-American); hisp_lat (Hispanic).
Total births for the year and racial group in the United States.
Total twin births for the year and racial group in the United States.
Total triplet or higher order births for the year and racial group in the United States.
The number of live births in all multiple deliveries per 1,000 live births.
The number of live births in all twin deliveries per 1,000 live births.
The number of live births in all triplet or higher order deliveries per 100,000 live births.
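The three rate variables use different denominators scales, as this sketch with hypothetical counts shows. It assumes, as the variable descriptions suggest, that multiple births are twin births plus triplet or higher order births.

```r
# Hypothetical counts for one year and racial group
total_births   <- 4000000
twin_births    <- 135000   # live births in twin deliveries
triplet_births <- 4400     # live births in triplet or higher order deliveries

# Multiple and twin birth rates are per 1,000 live births
multiple_rate <- (twin_births + triplet_births) / total_births * 1000
twin_rate     <- twin_births / total_births * 1000
# Triplet-or-higher rate is per 100,000 live births
triplet_rate  <- triplet_births / total_births * 1e5
```

With these counts the multiple birth rate is 34.85 and the twin rate 33.75 per 1,000, while the triplet rate is 110 per 100,000 (equivalent to 1.1 per 1,000), which is why the rarer outcome is reported on the larger scale.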
Data for race category "all" include races other than white and black, as well as origin not stated.
Race and Hispanic origin are reported separately on birth certificates. Persons of Hispanic origin may be of any race.
https://www.cdc.gov/nchs/nvss/births.htm
Hamilton, Brady E., et al. "Births: final data for 2014." National Vital Statistics Reports 64.12 (2015): 1-64.
Optimize the candidate micro dataset such that the lowest loss against the macro dataset constraints is obtained. Loss is defined here as total absolute error (TAE), and constraints are defined by the constraint_list. Optimization is done by simulated annealing (see Details).
optimize_microdata( micro_data, prob_name = "p", constraint_list, tolerance = round(sum(constraint_list[[1]])/2000 * length(constraint_list), 0), resample_size = min(sum(constraint_list[[1]]), max(500, round(sum(constraint_list[[1]]) * 0.005, 0))), p_accept = 0.4, max_iter = 10000L, seed = sample.int(10000L, size = 1, replace = FALSE), verbose = TRUE )
micro_data |
A |
prob_name |
It is assumed that observations are weighted and do not have an equal probability
of occurrence. This string specifies the variable within |
constraint_list |
A |
tolerance |
An integer giving the maximum acceptable loss (TAE), enabling early stopping. Defaults to a misclassification rate of 1 individual per 1,000 per constraint. |
resample_size |
An integer controlling the rate of movement about the candidate space.
Specifically, it specifies the number of observations to change between iterations. Defaults to
|
p_accept |
The acceptance probability for the Metropolis acceptance criterion. |
max_iter |
The maximum number of allowable iterations. Defaults to |
seed |
A seed for reproducibility. See |
verbose |
Logical. Do you wish to see verbose output? Defaults to |
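The default values of tolerance and resample_size depend on the population size N = sum(constraint_list[[1]]) and on the number of constraints. The base-R snippet below evaluates the two default expressions, copied from the usage line, for a hypothetical constraint list. It is an illustration only.

```r
# Hypothetical constraint list (illustrative counts, not ACS data):
# two attributes describing the same population of 250,000 people.
constraint_list <- list(age    = c(young = 100000, old = 150000),
                        gender = c(male = 120000, female = 130000))

N <- sum(constraint_list[[1]])  # population size implied by the first constraint

# Default tolerance, as written in the optimize_microdata() usage line
tolerance <- round(N / 2000 * length(constraint_list), 0)

# Default resample_size, as written in the usage line
resample_size <- min(N, max(500, round(N * 0.005, 0)))

c(tolerance = tolerance, resample_size = resample_size)
```

With N = 250,000 and two constraints, the default tolerance is 250 (N/2000 TAE units per constraint, rounded) and the default resample_size is 1,250. For populations smaller than 500, resample_size is capped at N.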
Spatial microsimulation involves the study of individual-level phenomena within a specified set of geographies in which these individuals act. It involves the creation of synthetic data to model, via simulation, these phenomena. As a first step to simulation, an appropriate micro-level (i.e., individual) dataset must be generated. This function creates such micro-level datasets given a set of candidate observations and macro-level constraints.
Optimization is done via simulated annealing, where we wish to minimize the total absolute error (TAE) between the micro-data and the macro-constraints. The annealing procedure is controlled by the parameters tolerance, resample_size, p_accept, and max_iter. Specifically, tolerance indicates the maximum allowable TAE between the output micro-data and the macro-constraints, and max_iter gives the number of iterations allowed to reach it. resample_size and p_accept control movement about the candidate space: resample_size controls the jump size between neighboring candidates, and p_accept controls the hill-climbing rate for exiting local minima.
Please see the references for a more detailed discussion of the simulated annealing procedure.
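The procedure described above can be sketched in base R for a single categorical constraint. This is a simplified illustration and not the synthACS implementation. A candidate solution is a vector of N = sum(constraint) row indices drawn from the candidate pool with weights p. Each iteration redraws resample_size positions. A proposal that does not increase TAE is always accepted, and a worse proposal is accepted with the fixed probability p_accept. No temperature or cooling schedule is used. The function name sa_sketch and the column names attr and p of the toy candidate pool are assumptions.

```r
# Simplified annealing-style search for ONE categorical constraint.
# Assumptions (illustrative, not the synthACS implementation):
#   candidates: data.frame with a factor column `attr` and a weight column `p`
#   constraint: named vector of target counts per level of `attr`
sa_sketch <- function(candidates, constraint, tolerance = 0, resample_size = 10,
                      p_accept = 0.4, max_iter = 1000L, seed = 1L) {
  set.seed(seed)
  N <- sum(constraint)  # size of the synthetic population
  # TAE of a candidate solution (a vector of row indices into `candidates`)
  tae <- function(idx) {
    counts <- table(factor(candidates$attr[idx], levels = names(constraint)))
    sum(abs(as.numeric(counts) - as.numeric(constraint)))
  }
  # Draw k rows from the candidate pool, weighted by p
  draw <- function(k) {
    sample(nrow(candidates), size = k, replace = TRUE, prob = candidates$p)
  }
  current <- draw(N)
  current_tae <- tae(current)
  best <- current
  best_tae <- current_tae
  path <- current_tae  # TAE of the current solution after each iteration
  for (i in seq_len(max_iter)) {
    if (best_tae <= tolerance) break  # early stopping once tolerance is met
    proposal <- current
    swap <- sample(N, size = min(resample_size, N))  # positions to redraw
    proposal[swap] <- draw(length(swap))
    proposal_tae <- tae(proposal)
    # Always accept non-worsening moves; accept worse moves with prob. p_accept
    if (proposal_tae <= current_tae || runif(1) < p_accept) {
      current <- proposal
      current_tae <- proposal_tae
    }
    if (current_tae < best_tae) {
      best <- current
      best_tae <- current_tae
    }
    path <- c(path, current_tae)
  }
  list(index = best, tae = best_tae, path = path)
}

# Usage with a toy candidate pool of three observation types
cands <- data.frame(attr = factor(c("a", "b", "c")), p = c(0.2, 0.5, 0.3))
res <- sa_sketch(cands, constraint = c(a = 30, b = 50, c = 20),
                 resample_size = 5, p_accept = 0.05, max_iter = 2000L)
res$tae
```

The returned path holds one TAE value per iteration. This is analogous to the trajectory that plot_TAEpath is described as plotting. The final TAE depends on the seed and on the iteration budget.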
Ingber, Lester. "Very fast simulated re-annealing." Mathematical and computer modelling 12.8 (1989): 967-973.
Metropolis, Nicholas, et al. "Equation of state calculations by fast computing machines." The journal of chemical physics 21.6 (1953): 1087-1092.
Szu, Harold, and Ralph Hartley. "Fast simulated annealing." Physics letters A 122.3 (1987): 157-162.
## Not run:
## assumes you have a micro_synthetic object named test_micro and a constraint_list named c_list
opt_data <- optimize_microdata(test_micro, "p", c_list, max_iter= 10, resample_size= 500,
                               p_accept= 0.01, verbose= FALSE)
## End(Not run)
Plot the TAE path of the simulated annealing algorithm for a given geography.
plot_TAEpath(object, geography, ...)
object |
An object of class |
geography |
A string allowing string matching via |
... |
additional arguments passed to other methods |
A wrapper function to pull multiple base tables from the ACS API via acs.fetch.
pull_acs_basetables(endyear, span, geography, table_vec)
endyear |
An integer, indicating the latest year of the data in the survey. |
span |
An integer in |
geography |
a valid |
table_vec |
A |
A 'macroACS' class object.
https://data.census.gov/cedsci/
## Not run:
# make geography
la_geo <- acs::geo.make(state= "CA", county= "Los Angeles")
# pull data
la_dat <- pull_acs_basetables(endyear= 2015, span= 1, geography= la_geo,
                              table_vec= c("B01001", "B01002", "B01003"))
## End(Not run)
Pull ACS data for a specified geography from base tables B15011 and B15012. Note: only 2014 data are supplied by ACS.
pull_bachelors(endyear, span, geography)
endyear |
An integer, indicating the latest year of the data in the survey. |
span |
An integer in |
geography |
a valid |
A list containing the endyear, span, a data.frame of estimates, a data.frame of standard errors, and a data.frame of the geography metadata from acs.fetch.
Pull ACS data for a specified geography from base tables B14001, B14003, B15001, B15002. Not currently implemented: B15010, B28006. Additional fields, mainly percentages and aggregations, are calculated.
pull_edu(endyear, span, geography)
endyear |
An integer, indicating the latest year of the data in the survey. |
span |
An integer in |
geography |
a valid |
A list containing the endyear, span, a data.frame of estimates, a data.frame of standard errors, a character vector of the original column names, and a data.frame of the geography metadata from acs.fetch.
Pull ACS data for a specified geography from base tables B07001, B07003, B07008, B07009, B07010, and B07012. These tables provide data on geographic mobility in the past year by a number of slices. Additional fields, mainly percentages and aggregations, are calculated.
pull_geo_mobility(endyear, span, geography)
endyear |
An integer, indicating the latest year of the data in the survey. |
span |
An integer in |
geography |
a valid |
A list containing the endyear, span, a data.frame of estimates, a data.frame of standard errors, a character vector of the original column names, and a data.frame of the geography metadata from acs.fetch.
Pull ACS data for a specified geography from base tables B09019, B11011, B19081, B25002, B25003, B25004, B25010, B25024, B25056, B25058, B25071, and B27001. Additional fields, mainly percentages and aggregations, are calculated.
pull_household(endyear, span, geography)
endyear |
An integer, indicating the latest year of the data in the survey. |
span |
An integer in |
geography |
a valid |
A list containing the endyear, span, a data.frame of estimates, a data.frame of standard errors, a character vector of the original column names, and a data.frame of the geography metadata from acs.fetch.
acs.fetch, geo.make
B28001 - TYPES OF COMPUTERS IN HOUSEHOLD
B28002 - PRESENCE AND TYPES OF INTERNET SUBSCRIPTIONS IN HOUSEHOLD
Pull ACS data for a specified geography from base tables B19083, B19301, B19326, B21001, B22001, B23020, B24011. Not yet implemented: B28004. Additional fields, mainly percentages and aggregations, are calculated.
pull_inc_earnings(endyear, span, geography)
endyear |
An integer, indicating the latest year of the data in the survey. |
span |
An integer in |
geography |
a valid |
A list containing the endyear, span, a data.frame of estimates, a data.frame of standard errors, a character vector of the original column names, and a data.frame of the geography metadata from acs.fetch.
Pull ACS data for a specified geography from base tables B12001, B12006, B12007, and 12501. Additional fields, mainly percentages and aggregations, are calculated.
pull_mar_status(endyear, span, geography)
endyear |
An integer, indicating the latest year of the data in the survey. |
span |
An integer in |
geography |
a valid |
A list containing the endyear, span, a data.frame of estimates, a data.frame of standard errors, a character vector of the original column names, and a data.frame of the geography metadata from acs.fetch.
Pull ACS data for a specified geography from base tables B01001, B01002, B02001, B06007, B06008, B06009, B06010, B06011, and B06012. These tables reference population counts by a number of slices. Multiple additional fields, mainly percentages and aggregations, are calculated.
pull_population(endyear, span, geography)
endyear |
An integer, indicating the latest year of the data in the survey. |
span |
An integer in |
geography |
a valid |
A list containing the endyear, span, a data.frame of estimates, a data.frame of standard errors, a character vector of the original column names, and a data.frame of the geography metadata from acs.fetch.
Pull ACS data for a specified geography from base tables B17001, B17004, B18101, B19001, B19013, B19055, B19057. Not yet implemented: B17002. Additional fields, mainly percentages and aggregations, are calculated.
pull_pov_inc(endyear, span, geography)
endyear |
An integer, indicating the latest year of the data in the survey. |
span |
An integer in |
geography |
a valid |
A list containing the endyear, span, a data.frame of estimates, a data.frame of standard errors, a character vector of the original column names, and a data.frame of the geography metadata from acs.fetch.
Pull ACS data for a specified geography from base tables B01001B-I and B02001. These tables reference population counts by race.
pull_race_data(endyear, span, geography)
endyear |
An integer, indicating the latest year of the data in the survey. |
span |
An integer in |
geography |
a valid |
A list containing the endyear, span, a data.frame of estimates, a data.frame of standard errors, and a data.frame of the geography metadata from acs.fetch.
Pull ACS data for a specified geography from base tables B01001, B02001, B12002, B15001, B06001, B06010, B23001, B17005, and B17005. These tables reference population counts by a number of slices. Multiple additional fields, mainly percentages and aggregations, are calculated.
pull_synth_data(endyear, span, geography)
endyear |
An integer, indicating the latest year of the data in the survey. |
span |
An integer in |
geography |
a valid |
A list containing the endyear, span, a list of data.frames of estimates, a list of data.frames of standard errors, and the geography metadata from acs.fetch.
## Not run:
# make geography
la_geo <- acs::geo.make(state= "CA", county= "Los Angeles", tract= "*")
# pull data elements for creating synthetic data
la_dat <- pull_synth_data(2014, 5, la_geo)
## End(Not run)
Pull ACS data for a specified geography from base tables B08012, B08101, B08121, B08103, B08124, B08016, B08017. Additional fields, mainly percentages and aggregations, are calculated.
pull_transit_work(endyear, span, geography)
endyear |
An integer, indicating the latest year of the data in the survey. |
span |
An integer in |
geography |
a valid |
A list containing the endyear, span, a data.frame of estimates, a data.frame of standard errors, a character vector of the original column names, and a data.frame of the geography metadata from acs.fetch.
A dataset containing raw death rate data by race and gender of the deceased. Data is provided for 1980-2013.
rawDR
A data.frame with 612 observations and 4 variables.
The year for which data were recorded.
The racial group of the deceased. One of: all (all races); white (whites); black_aa (black / African-American); nat_amer (American Indian or Native Alaskan); asian_isl (Asian or Pacific Islander); hisp_lat (Hispanic).
The gender of the deceased. One of c(both, male, female)
The raw death rate. See details.
The death rate is defined as deaths per 100,000 population.
Populations are based on census counts enumerated as of April 1 of the census year and estimated as of July 1 for non-census years.
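As a worked illustration of the definition above, the death rate is the number of deaths divided by the population, scaled to 100,000. The inputs below are hypothetical round numbers and are not values from rawDR.

```r
# Hypothetical inputs for illustration only (not values from rawDR)
deaths     <- 2500000    # deaths in one year
population <- 250000000  # population (census count or July 1 estimate)

# Death rate as defined above: deaths per 100,000 population
death_rate <- deaths / population * 100000
death_rate
```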
https://www.cdc.gov/nchs/nvss/deaths.htm
Xu, J. Q., S. L. Murphy, and K. D. Kochanek. "Deaths: final data for 2013." National Vital Statistics Reports 64.2 (2015).
Split a "macroACS" object into subsets. This may be helpful for users who have limited memory available on their machines before proceeding to derive sample synthetic micro data.
split(acs, n_splits)
acs |
An object of class |
n_splits |
An integer for the number of splits you wish to create. |
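The idea behind splitting is that a set of geographies is divided into n_splits roughly equal groups, and each group is then processed on its own to limit peak memory use. The base-R sketch below shows that partitioning on plain geography indices. It is a hypothetical illustration and not the package's split method. It calls base::split explicitly because the package defines its own split.

```r
# Hypothetical example: 10 geographies divided into 3 groups of similar size
n_geo     <- 10L
n_splits  <- 3L
geo_index <- seq_len(n_geo)

# cut() assigns each index to one of n_splits equal-width bins;
# base::split() then collects the indices of each bin into a list
chunks <- base::split(geo_index, cut(geo_index, breaks = n_splits, labels = FALSE))

# Each chunk could then be handled separately, one at a time
for (chunk in chunks) {
  message("processing geographies ", paste(chunk, collapse = ", "))
}
```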
A dataset containing birth rate data by US state and age for all US states and territories in 2014.
stateFR
A data.frame with 612 observations and 3 variables.
The state or territory for which data were recorded.
The age group of the mother.
The birth rate. See Details.
The birth rate is defined as births per 1,000 women in the specified group.
Birth rates for age_group 45_49 are computed by relating births to women aged 45 and over to women aged 45-49.
Data for the "United States" as a whole excludes data for the territories.
Data are missing (i.e., NA) when they do not meet standards of reliability or precision, namely birth rates based on fewer than 20 births.
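The two rules above can be written as one small helper: rates are births per 1,000 women, and rates based on fewer than 20 births are suppressed. For the 45_49 group, the numerator is births to women aged 45 and over, and the denominator is women aged 45-49. The helper birth_rate and the input counts are hypothetical and are not taken from stateFR.

```r
# Hypothetical helper: births per 1,000 women, suppressed (NA) when
# the rate would be based on fewer than 20 births
birth_rate <- function(births, women) {
  ifelse(births < 20, NA_real_, births / women * 1000)
}

# Illustrative counts for age_group 45_49:
# numerator = births to women aged 45 and over,
# denominator = women aged 45-49
births_45_plus <- 1000
women_45_49    <- 500000
rate_45_49 <- birth_rate(births_45_plus, women_45_49)

# A group with only 15 births is reported as missing
rate_small <- birth_rate(15, 40000)
```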
https://www.cdc.gov/nchs/nvss/births.htm
Hamilton, Brady E., et al. "Births: final data for 2014." National Vital Statistics Reports 64.12 (2015): 1-64.
summary method for class 'smsm_set'.
## S3 method for class 'smsm_set' summary(object, ...)
object |
An object of class |
... |
additional arguments affecting the summary produced. |
Add a new attribute to a micro_synthetic dataset using conditional relationships between the new attribute and existing attributes (e.g., wage rate conditioned on age and education level).
synthetic_new_attribute( df, prob_name = "p", attr_name = "variable", conditional_vars = NULL, sym_tbl = NULL )
df |
An R object of class "micro_synthetic". |
prob_name |
A string specifying the column name of the |
attr_name |
A string specifying the desired name of the new attribute to be added to the data. |
conditional_vars |
A character vector specifying the existing variables, if any, on which
the new attribute (variable) is to be conditioned. Variables must be specified in order.
Defaults to |
sym_tbl |
A |
A new micro_synthetic dataset with class "micro_synthetic".
New synthetic variables are introduced to the existing data via conditional probability. As with derive_synth_datasets, the goal of this function is to generate a joint probability distribution for an attribute vector and to create synthetic individuals from this distribution. No limit is placed on the number of conditioning variables, but in practice data rarely exist that allow more than two or three. Other variables are assumed to be independent of the new attribute.
There are four different types of conditional/marginal probability models that may be considered for a given new attribute:
(1) Independence: each of the variables is assumed to be independent of the others.
(2) Pairwise conditional independence: each attribute is assumed to be related to only one other attribute and independent of all others.
(3) Conditional independence: attributes can be dependent on some subset of other attributes and independent of the rest.
(4) In the most general case, all attributes are jointly interrelated.
Conditioning is implemented via symbol tables (sym_tbl) to ensure accurate matching between conditioning variables, new attribute levels, and new attribute probabilities. In each key-value pair of the symbol table, the key is a specific combination of values of the conditioning variables. This key is the first N columns of sym_tbl. A recursive approach is employed to conditionally partition sym_tbl. In this sense, the *order* in which the conditional variables are supplied matters.
The value is the final two columns of sym_tbl. These are (A) either counts or percentages, used to specify the probability for the new attribute, and (B) the level that the new attribute takes on.
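The base-R sketch below illustrates the conditioning step for a single conditioning variable. It rests on an assumption that this documentation does not confirm: each existing row is replicated once per level of the new attribute, and its probability is multiplied by the normalized conditional percentage for that level. The helper add_conditional_attr is hypothetical. It does not reproduce the package's recursive partitioning over several ordered conditioning variables. The sym_tbl layout mirrors the ST table in the examples for this function: the conditioning column, then attr_pct, then levels.

```r
# Hypothetical helper (single conditioning variable only).
# Assumption: each row of df is expanded into one row per level of the new
# attribute, and its probability is split by the conditional percentages.
add_conditional_attr <- function(df, prob_name, attr_name, cond_var, sym_tbl) {
  tbl <- sym_tbl
  # counts or percentages -> proportions within each conditioning value
  tbl$attr_pct <- ave(tbl$attr_pct, tbl[[cond_var]], FUN = function(x) x / sum(x))
  df$.row <- seq_len(nrow(df))
  # key = conditioning value; one output row per (existing row, new level)
  out <- merge(df, tbl, by = cond_var)
  out[[prob_name]] <- out[[prob_name]] * out$attr_pct
  out[[attr_name]] <- factor(out$levels)
  keep <- setdiff(names(out), c(".row", "attr_pct", "levels"))
  out <- out[order(out$.row), keep]
  rownames(out) <- NULL
  out
}

# Toy data with the same sym_tbl layout as the ST example
df <- data.frame(gender = factor(c("male", "female", "male")), p = c(0.5, 0.3, 0.2))
ST <- data.frame(gender   = c(rep("male", 3), rep("female", 3)),
                 attr_pct = c(0.1, 0.8, 0.1, 0.05, 0.7, 0.25),
                 levels   = rep(c("low", "middle", "high"), 2))
out <- add_conditional_attr(df, "p", "SES", "gender", ST)
tapply(out$p, out$SES, sum)  # marginal distribution of the new attribute
```

Normalizing attr_pct within each key lets the table hold either counts or percentages, and the total probability mass stays at 1.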
{
set.seed(567L)
df <- data.frame(gender= factor(sample(c("male", "female"), size= 100, replace= TRUE)),
                 edu= factor(sample(c("LT_college", "BA_degree"), size= 100, replace= TRUE)),
                 p= runif(100))
df$p <- df$p / sum(df$p)
class(df) <- c("data.frame", "micro_synthetic")
ST <- data.frame(gender= c(rep("male", 3), rep("female", 3)),
                 attr_pct= c(0.1, 0.8, 0.1, 0.05, 0.7, 0.25),
                 levels= rep(c("low", "middle", "high"), 2))
df2 <- synthetic_new_attribute(df, prob_name= "p", attr_name= "SES", conditional_vars= "gender", sym_tbl= ST)
ST2 <- data.frame(gender= c(rep("male", 3), rep("female", 6)),
                  edu= c(rep(NA, 3), rep(c("LT_college", "BA_degree"), each= 3)),
                  attr_pct= c(0.1, 0.8, 0.1, 10, 80, 10, 5, 70, 25),
                  levels= rep(c("low", "middle", "high"), 3))
df2 <- synthetic_new_attribute(df, prob_name= "p", attr_name= "SES", conditional_vars= c("gender", "edu"), sym_tbl= ST2)
}
A dataset containing total fertility rate data by race of the mother. Data for all races is provided for 1970-2014 and for individual races from 1989-2014.
TFR
A data.frame with 175 observations and 3 variables.
The year for which data were recorded.
The racial group of the mothers. One of: all (all races); white (non-Hispanic whites); black_aa (black / African-American); nat_amer (American Indian or Native Alaskan); asian_isl (Asian or Pacific Islander); hisp_lat (Hispanic or Latin American).
The Total Fertility Rate. See Details.
The Total Fertility Rate is defined as the sum of the birth rates for the 5-year age groups found in BR2014, multiplied by 5.
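As a worked example of this definition, the TFR below is computed from a vector of age-specific birth rates (births per 1,000 women). The seven age groups and their rates are hypothetical and are not the contents of BR2014. Because the inputs are per 1,000 women, the result is also expressed per 1,000 women.

```r
# Hypothetical age-specific birth rates (births per 1,000 women);
# illustrative values only, not taken from BR2014
asfr <- c(`15_19` = 20, `20_24` = 80, `25_29` = 100, `30_34` = 100,
          `35_39` = 50, `40_44` = 10, `45_49` = 1)

# TFR = 5 * sum of the 5-year age-group rates (per 1,000 women)
tfr <- 5 * sum(asfr)
tfr / 1000  # the same quantity expressed as births per woman
```

The factor of 5 converts each single-year average rate into the births expected over a 5-year age span.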
https://www.cdc.gov/nchs/nvss/births.htm
Hamilton, Brady E., et al. "Births: final data for 2014." National Vital Statistics Reports 64.12 (2015): 1-64.