Title: | Imputation Methods for Multivariate Multinomial Data |
---|---|
Description: | Implements imputation methods using EM and Data Augmentation for multinomial data following the work of Schafer 1997 <ISBN: 978-0-412-04061-0>. |
Authors: | Alex Whitworth [aut, cre] |
Maintainer: | Alex Whitworth <[email protected]> |
License: | GPL-3 |
Version: | 0.8.4 |
Built: | 2024-10-31 04:20:19 UTC |
Source: | https://github.com/alexwhitworth/imputemulti |
Creates a data-dependent prior for p-dimensional multinomial distributions
using a conjugate prior (e.g., Dirichlet) based on 20
data_dep_prior_multi(dat)
dat |
A |
A data.frame
containing identifiers for all possible and
the associated prior-counts,
Darnieder, William Francis. Bayesian methods for data-dependent priors. Dissertation. The Ohio State University, 2011.
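The conjugacy this function relies on can be sketched in base R: with a Dirichlet prior on the cell probabilities, adding the prior counts to the observed cell counts gives the posterior Dirichlet parameters. The sketch below builds prior counts from a random subsample of a toy dataset. The helper `toy_prior_counts`, the toy variables, and the subsample fraction are assumptions made for illustration; this is not the package's implementation.

```r
set.seed(1)
# Toy complete data: two categorical variables (hypothetical)
toy <- data.frame(
  a = factor(sample(c("x", "y"), 200, replace = TRUE), levels = c("x", "y")),
  b = factor(sample(c("u", "v", "w"), 200, replace = TRUE),
             levels = c("u", "v", "w"))
)

# Hypothetical helper: count cells in a random subsample to use as prior counts
toy_prior_counts <- function(dat, frac) {
  idx <- sample(nrow(dat), size = floor(frac * nrow(dat)))
  as.data.frame(table(dat[idx, , drop = FALSE]), responseName = "alpha")
}

prior <- toy_prior_counts(toy, frac = 0.2)  # one row per possible cell
obs   <- as.data.frame(table(toy), responseName = "n")

# Conjugate update: posterior Dirichlet parameters = prior counts + observed counts
post <- merge(prior, obs, by = c("a", "b"))
post$alpha_post <- post$alpha + post$n
post
```

Each row of `post` identifies one possible cell together with its prior count, which mirrors the shape of the return value described above.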
A multivariate multinomial model imputed by EM or Data Augmentation is represented as a mod_imputeMulti object. A complete dataset and model is represented as an imputeMulti object.
Inherits from mod_imputeMulti. Additional slots are supplied for (1) the call to multinomial_impute; (2) the missing and imputed data; and (3) the number of observations with missing values.
## S4 method for signature 'imputeMulti'
show(object)

get_imputations(object)

## S4 method for signature 'imputeMulti'
get_imputations(object)

n_miss(object)
object |
an object of class "imputeMulti" |
Gcall: the call to multinomial_impute
method: the modeling method
mle_call: the call to the estimation function
mle_iter: the number of iterations in estimation
mle_log_lik: the final log-likelihood
mle_cp: the conjugate prior if any
mle_x_y: the MLE estimate of the sufficient statistics and parameters
data: a list of the missing and imputed data
nmiss: the number of observations with missing data
Objects are created by calls to multinomial_impute, multinomial_em, or multinomial_data_aug.
multinomial_impute, multinomial_em, multinomial_data_aug
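The inheritance relationship described above, where the complete-data class extends the model class and adds slots, can be sketched with base R's S4 system. The class names `toyModel` and `toyImputed` and the accessor `toy_iterations` are hypothetical stand-ins chosen for this illustration; they are not the package's definitions.

```r
library(methods)

# Hypothetical toy classes mirroring the "model" / "model + data" split
setClass("toyModel",
         representation(method = "character", mle_iter = "integer"))
setClass("toyImputed",
         contains = "toyModel",
         representation(data = "list", nmiss = "integer"))

# Hypothetical accessor generic, with a method defined on the parent class
setGeneric("toy_iterations", function(object) standardGeneric("toy_iterations"))
setMethod("toy_iterations", "toyModel", function(object) object@mle_iter)

obj <- new("toyImputed", method = "EM", mle_iter = 12L,
           data = list(), nmiss = 3L)

is(obj, "toyImputed")  # TRUE
is(obj, "toyModel")    # TRUE: the child class inherits from the parent
toy_iterations(obj)    # 12, dispatched through the inherited method
```

Defining accessors on the parent class means they work unchanged on objects of the child class, which is presumably why the model-level accessors are documented once for the model class.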
Function that checks if the target object is an imputeMulti object.
is.imputeMulti(x)
x |
any R object. |
Returns TRUE if its argument has class "imputeMulti" among its classes and FALSE otherwise.
Function that checks if the target object is a mod_imputeMulti object.
is.mod_imputeMulti(x)
x |
any R object. |
Returns TRUE if its argument has class "mod_imputeMulti" among its classes and FALSE otherwise.
Merge the imputed dataset from an imputeMulti object with the original dataset.
Merging is done by rownames, since imputeMulti maintains row-order during imputation.
merge_imputed(impute_obj, y, ...)
impute_obj |
An object of class "imputeMulti". |
y |
The dataset from which the missing data was imputed. |
... |
Arguments to be passed to other methods |
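Merging by row names, as described above, can be illustrated with base R's `merge()`. The data frames `orig` and `imputed` below are toy objects invented for this sketch, and the column name `g_imputed` is an assumption; the package's own merge may name and order columns differently.

```r
# Original data with missing values (toy example)
orig <- data.frame(id = 1:4,
                   g  = factor(c("a", NA, "b", NA), levels = c("a", "b")),
                   row.names = paste0("r", 1:4))

# Hypothetical imputed version of the incomplete column, same row names
imputed <- data.frame(g_imputed = factor(c("a", "b", "b", "a"),
                                         levels = c("a", "b")),
                      row.names = rownames(orig))

# by = "row.names" joins on row names and returns them in a Row.names column
merged <- merge(orig, imputed, by = "row.names")
merged
```

Because the join key is the row name, the approach only works when row order and row names are preserved between the original and imputed data.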
A multivariate multinomial model imputed by EM or Data Augmentation is represented as a mod_imputeMulti object. A complete dataset and model is represented as an imputeMulti object.
Slots for mod_imputeMulti objects include: (1) the modeling method; (2) the call to the estimation function; (3) the number of iterations in estimation; (4) the final log-likelihood; (5) the conjugate prior if any; (6) the MLE estimate of the sufficient statistics and parameters.
## S4 method for signature 'mod_imputeMulti'
show(object)

get_parameters(object)

## S4 method for signature 'mod_imputeMulti'
get_parameters(object)

get_prior(object)

## S4 method for signature 'mod_imputeMulti'
get_prior(object)

get_iterations(object)

## S4 method for signature 'mod_imputeMulti'
get_iterations(object)

get_logLik(object)

## S4 method for signature 'mod_imputeMulti'
get_logLik(object)

get_method(object)

## S4 method for signature 'mod_imputeMulti'
get_method(object)

## S4 method for signature 'imputeMulti'
n_miss(object)
object |
an object of class "mod_imputeMulti" |
method: the modeling method
mle_call: the call to the estimation function
mle_iter: the number of iterations in estimation
mle_log_lik: the final log-likelihood
mle_cp: the conjugate prior if any
mle_x_y: the MLE estimate of the sufficient statistics and parameters
Objects are created by calls to multinomial_impute, multinomial_em, or multinomial_data_aug.
multinomial_impute, multinomial_em, multinomial_data_aug
Implements the Data Augmentation algorithm for multivariate multinomial data given
observed counts of complete and missing data. Allows for specification
of a Dirichlet conjugate prior.
multinomial_data_aug(
  x_y,
  z_Os_y,
  enum_comp,
  conj_prior = c("none", "data.dep", "flat.prior", "non.informative"),
  alpha = NULL,
  burnin = 100,
  post_draws = 1000,
  verbose = FALSE
)
x_y |
A |
z_Os_y |
A |
enum_comp |
A |
conj_prior |
A string specifying the conjugate prior. One of "none", "data.dep", "flat.prior", or "non.informative". |
alpha |
The vector of counts |
burnin |
A scalar specifying the number of iterations to use as a burnin. Defaults to 100. |
post_draws |
An integer specifying the number of draws from the posterior distribution. Defaults to 1000. |
verbose |
Logical. If |
An object of class mod_imputeMulti-class.
multinomial_em, multinomial_impute
## Not run:
data(tract2221)
x_y <- multinomial_stats(tract2221[,1:4], output= "x_y")
z_Os_y <- multinomial_stats(tract2221[,1:4], output= "z_Os_y")
x_possible <- multinomial_stats(tract2221[,1:4], output= "possible.obs")
# n_obs is not an argument of multinomial_data_aug (see the usage above)
imputeDA_mle <- multinomial_data_aug(x_y, z_Os_y, x_possible,
                                     conj_prior= "none", verbose= TRUE)
## End(Not run)
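To make the alternation behind Data Augmentation concrete, the following base-R sketch runs it on a toy 2x2 table with partially classified counts. Each iteration imputes the missing classifications given the current parameters (I-step) and then draws new parameters from the complete-data Dirichlet posterior (P-step). The toy counts, the flat prior, and the helper `rdirichlet1` are assumptions for illustration only; this is not the package's implementation.

```r
set.seed(42)
x  <- matrix(c(30, 10, 20, 40), nrow = 2)  # fully observed 2x2 counts (A x B)
zA <- c(15, 5)                             # A observed, B missing
zB <- c(8, 12)                             # B observed, A missing
alpha <- matrix(1, 2, 2)                   # flat Dirichlet prior (assumed)

# Hypothetical helper: one Dirichlet draw via normalized Gamma variates
rdirichlet1 <- function(a) {
  g <- rgamma(length(a), shape = a)
  g / sum(g)
}

burnin <- 100
post_draws <- 1000
theta <- matrix(0.25, 2, 2)                # starting values
draws <- matrix(NA_real_, nrow = post_draws, ncol = 4)

for (it in seq_len(burnin + post_draws)) {
  # I-step: allocate partially observed counts to cells given theta
  y <- x
  for (i in 1:2) y[i, ] <- y[i, ] + rmultinom(1, zA[i], theta[i, ])[, 1]
  for (j in 1:2) y[, j] <- y[, j] + rmultinom(1, zB[j], theta[, j])[, 1]
  # P-step: draw theta from the complete-data posterior Dirichlet(alpha + y)
  theta <- matrix(rdirichlet1(as.vector(alpha + y)), 2, 2)
  # Keep only the draws made after the burn-in period
  if (it > burnin) draws[it - burnin, ] <- as.vector(theta)
}

colMeans(draws)  # posterior means of the four cell probabilities
```

The `burnin` and `post_draws` values match the documented defaults; the retained draws can be summarized or used to generate imputations.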
Implements the EM algorithm for multivariate multinomial data given
observed counts of complete and missing data. Allows for
specification of a Dirichlet conjugate prior.
multinomial_em(
  x_y,
  z_Os_y,
  enum_comp,
  n_obs,
  conj_prior = c("none", "data.dep", "flat.prior", "non.informative"),
  alpha = NULL,
  tol = 5e-07,
  max_iter = 10000,
  verbose = FALSE
)
x_y |
A |
z_Os_y |
A |
enum_comp |
A |
n_obs |
An integer specifying the number of observations in the original data. |
conj_prior |
A string specifying the conjugate prior. One of "none", "data.dep", "flat.prior", or "non.informative". |
alpha |
The vector of counts |
tol |
A scalar specifying the convergence criterion. Defaults to 5e-07. |
max_iter |
An integer specifying the maximum number of allowable iterations. Defaults to 10000. |
verbose |
Logical. If |
An object of class mod_imputeMulti-class.
multinomial_data_aug, multinomial_impute
## Not run:
data(tract2221)
x_y <- multinomial_stats(tract2221[,1:4], output= "x_y")
z_Os_y <- multinomial_stats(tract2221[,1:4], output= "z_Os_y")
x_possible <- multinomial_stats(tract2221[,1:4], output= "possible.obs")
imputeEM_mle <- multinomial_em(x_y, z_Os_y, x_possible, n_obs= nrow(tract2221),
                               conj_prior= "none", verbose= TRUE)
## End(Not run)
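For readers who want to see the EM iteration spelled out, here is a minimal base-R sketch on a toy 2x2 table with partially classified counts and no prior. The E-step spreads each partially observed count across the cells consistent with it, in proportion to the current cell probabilities; the M-step renormalizes the expected counts. The toy data and variable names are assumptions for illustration, and the stopping rule (largest absolute change in the parameters) is one plausible choice rather than a statement of what the package does.

```r
# Fully observed 2x2 counts (rows = A, cols = B); toy numbers
x  <- matrix(c(30, 10, 20, 40), nrow = 2,
             dimnames = list(A = c("a1", "a2"), B = c("b1", "b2")))
zA <- c(15, 5)   # A observed, B missing
zB <- c(8, 12)   # B observed, A missing
n  <- sum(x) + sum(zA) + sum(zB)

theta <- matrix(0.25, 2, 2, dimnames = dimnames(x))  # starting values
tol <- 5e-07
max_iter <- 10000

for (iter in seq_len(max_iter)) {
  # E-step: expected complete-data counts given the current theta
  p_b_given_a <- sweep(theta, 1, rowSums(theta), "/")  # P(B | A)
  p_a_given_b <- sweep(theta, 2, colSums(theta), "/")  # P(A | B)
  e_counts <- x +
    p_b_given_a * zA +                    # row i is scaled by zA[i]
    sweep(p_a_given_b, 2, zB, "*")        # column j is scaled by zB[j]
  # M-step: maximum likelihood estimate from the expected counts
  theta_new <- e_counts / n
  delta <- max(abs(theta_new - theta))
  theta <- theta_new
  if (delta < tol) break
}

round(theta, 4)
iter
```

The `tol` and `max_iter` values match the documented defaults. With a Dirichlet prior, the M-step would add the prior counts before normalizing.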
Impute values for multivariate multinomial data using either EM or Data Augmentation.
multinomial_impute(
  dat,
  method = c("EM", "DA"),
  conj_prior = c("none", "data.dep", "flat.prior", "non.informative"),
  alpha = NULL,
  verbose = FALSE,
  ...
)
dat |
A |
method |
A string specifying the imputation method. One of "EM" or "DA". |
conj_prior |
A string specifying the conjugate prior. One of "none", "data.dep", "flat.prior", or "non.informative". |
alpha |
The vector of counts |
verbose |
Logical. If |
... |
Arguments to be passed to other methods |
An object of class imputeMulti-class
Schafer, Joseph L. Analysis of incomplete multivariate data. Chapter 7. CRC press, 1997.
data_dep_prior_multi, multinomial_em
## Not run:
data(tract2221)
imputeEM <- multinomial_impute(tract2221[,1:4], method= "EM",
                               conj_prior = "none", verbose= TRUE)
imputeDA <- multinomial_impute(tract2221[,1:4], method= "DA",
                               conj_prior = "non.informative", verbose= TRUE)
## End(Not run)
Calculate observed-data sufficient statistics, marginally-observed summary statistics or enumerate all possible observed patterns from a multivariate multinomial dataset.
multinomial_stats(dat, output = c("x_y", "z_Os_y", "possible.obs"))
dat |
A |
output |
A string specifying the desired output. One of "x_y", "z_Os_y", or "possible.obs". |
A data.frame containing either sufficient statistics or possible observed patterns.
## Not run:
data(tract2221)
obs_suff_stats <- multinomial_stats(tract2221, output= "x_y")
marg_obs_suff_stats <- multinomial_stats(tract2221, output= "z_Os_y")
## End(Not run)
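The three kinds of output described above can be approximated in base R to show what they represent: the enumeration of all possible complete patterns, the counts of fully observed patterns, and the counts of patterns with some values missing. The toy data and object names are assumptions for this sketch, and the package's actual column names and layout may differ.

```r
toy <- data.frame(
  a = factor(c("x", "y", NA, "x", "y", "x"), levels = c("x", "y")),
  b = factor(c("u", "v", "u", NA, "v", "u"), levels = c("u", "v"))
)

# All possible complete patterns (in the spirit of output = "possible.obs")
possible <- expand.grid(lapply(toy, levels))

# Counts of fully observed patterns
complete <- toy[complete.cases(toy), ]
counts <- as.data.frame(table(complete), responseName = "counts")

# Counts of partially observed patterns, keeping NA as its own category
partial <- toy[!complete.cases(toy), ]
partial_counts <- as.data.frame(table(partial, useNA = "ifany"),
                                responseName = "counts")
partial_counts <- partial_counts[partial_counts$counts > 0, ]

possible
counts
partial_counts
```

Working with pattern counts rather than individual rows keeps the estimation step proportional to the number of distinct patterns instead of the number of observations.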
summary method for class "imputeMulti"
## S4 method for signature 'imputeMulti'
summary(object, ...)
object |
an object of class "imputeMulti" |
... |
further arguments passed to or from other methods. |
summary method for class "mod_imputeMulti"
## S4 method for signature 'mod_imputeMulti'
summary(object, ...)
object |
an object of class "mod_imputeMulti" |
... |
further arguments passed to or from other methods. |
Computes the supremum of the L1 distance between x and y.
supDistC(x, y)
x |
A numeric |
y |
A numeric |
a numeric scalar.
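One reading of this description is the largest absolute element-wise difference between two numeric vectors, which is a common way to measure how much a parameter vector changed between iterations. The function `sup_dist` below is a hypothetical base-R stand-in written under that assumption; the package's own supDistC may differ in detail.

```r
# Hypothetical base-R stand-in, assuming "sup" means the largest
# absolute element-wise difference between x and y
sup_dist <- function(x, y) {
  stopifnot(is.numeric(x), is.numeric(y), length(x) == length(y))
  max(abs(x - y))
}

old <- c(0.2, 0.3, 0.5)
new <- c(0.25, 0.25, 0.5)
sup_dist(old, new)
```

A result below a tolerance such as the `tol` argument of multinomial_em would indicate that successive estimates have stopped changing.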
A dataset containing attributes of 3974 individuals living in census tract 2221 in Los Angeles County, CA. Data comes from the 5-year American Community Survey with end year 2014. Missing values have been inserted.
tract2221
A data.frame with 3974 rows and 10 variables. All variables are of class factor:
The individual's age, coded in roughly 5-year age buckets.
The individual's gender: Male, Female
The individual's marital status. Takes one of 5 levels: never_mar (never married); married (married); mar_apart (married but living apart); divorced (divorced); and widowed (widowed)
The individual's educational attainment. Takes one of 7 levels: lt_hs (less than high school); some_hs (completed some high school but did not graduate); hs_grad (high school graduate); some_col (completed some college but did not graduate); assoc_dec (completed an associate's degree); ba_deg (obtained a bachelor's degree); grad_deg (obtained a graduate or professional degree)
The individual's employment status. Takes one of 3 levels: employed (individual is in the labor force and employed); unemployed (individual is in the labor force and unemployed); not_in_labor_force (individual is not in the labor force)
The individual's nativity status. Takes one of 4 values: born_state_residence (born in the state of residence); born_other_state (born in another US state); born_out_us (a US citizen born outside the US); foreigner (foreign born)
The individual's poverty status in the past year. Takes one of 2 levels: below_pov_level (below the poverty level); at_above_pov_level (at or above the poverty level)
The individual's geographic mobility in the last year. Takes one of 5 values: same house (lived in the same house); same county (moved within the same county); same state (moved from a different county within the same state); diff state (moved from a different state); moved from abroad (moved from another country)
The individual's annual income. Takes one of 9 levels: no_income (no income); 1_lt10k (income <$10,000); 10k_lt15k ($10000-$14999); 15k_lt25k ($15000-$24999); 25k_lt35k ($25000-$34999); 35k_lt50k ($35000-$49999); 50k_lt65k ($50000-$64999); 65k_lt75k ($65000-$74999); gt75k ($75000+)
The individual's ethnicity.