

Please note that this package requires the installation of R version 3.3 or higher. If you do not already have an updated version of R installed on your computer, you can find instructions on how to download the latest version here. Also note that this package only includes the bit-vector implementation of FLAME, so it cannot be applied to database management systems.

Installation

FLAME should be installed from the author's GitHub repository using the following command:

devtools::install_github('https://github.com/vittorioorlandi/FLAME')

Input Data Format

Input data must be stored in a data frame, which must contain covariates and treatment, and may contain an outcome column. Covariates are assumed to be categorical and will be coerced to factors, though they may be passed as either factors or numerics. If you wish to use continuous covariates for matching, they should be binned prior to being passed to FLAME. Treatment must be denoted by a logical or binary numeric column. The outcome column, if supplied, will be treated as continuous if numeric, as binary if a two-level factor or numeric with two unique values, and as multi-class if a factor with more than two levels. If no outcome column is provided, matching will still be done, but CATEs will not be estimated. They will also not be estimated if the outcome is passed as a factor. Below is a sample dataset satisfying the format requirements:
x_1   x_2   ...   x_m   treated   outcome
3     0     ...   4     1         7.76
0     2     ...   1     0         5
...   ...   ...   ...   ...       ...
0     6     ...   2     1         4.4
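As a rough illustration of these requirements, the base R sketch below builds a small data frame in this shape and bins a continuous covariate with cut() before it would be passed to FLAME. The column names (age, age_bin), the bin boundaries, and the values are hypothetical, chosen only for the example.

```r
# Hypothetical raw data: one categorical covariate and one continuous covariate.
raw <- data.frame(
  x_1     = c(3, 0, 0, 2),
  age     = c(23.5, 41.0, 67.2, 35.8),  # continuous, so it should be binned
  treated = c(1, 0, 1, 0),              # binary numeric treatment indicator
  outcome = c(7.76, 5, 4.4, 6.1)        # numeric outcome
)

# Bin the continuous covariate into categories (boundaries are illustrative).
raw$age_bin <- cut(raw$age, breaks = c(0, 30, 50, Inf),
                   labels = c("0-30", "30-50", "50+"))

# Keep only categorical covariates, the treatment column, and the outcome column.
flame_input <- raw[, c("x_1", "age_bin", "treated", "outcome")]
flame_input$x_1 <- factor(flame_input$x_1)
```

The resulting flame_input has only categorical covariates, so no further coercion of the continuous column is needed.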

Usage

To generate sample data for exploring FLAME's functionality, use the function gen_data as shown below. Remember to load the 'FLAME' package, as in the first line of the example, before calling any of the functions discussed in this section. This example generates a data frame with n = 250 units and p = 5 covariates:
library('FLAME')

data <- gen_data(n = 250, p = 5)
To run the algorithm, use the FLAME function as shown in line 3. The required data parameter can be either a path to a .csv file or a data frame. In this example, a .csv file path is used:
library('FLAME')

FLAME_out <- FLAME(data = "data.csv", treated_column_name="treated", outcome_column_name="outcome")
print(FLAME_out$data)
The object FLAME_out is a list of six entries:
FLAME_out$data: a data frame containing the original data with an extra logical column denoting whether a unit was matched and an extra numeric column denoting how many times a unit was matched. The covariates that each unit was not matched on are denoted with asterisks.
FLAME_out$MGs: a list of every matched group formed by the algorithm.
FLAME_out$CATE: a vector containing the conditional average treatment effect (CATE) for every matched group formed.
FLAME_out$matched_on: a list corresponding to MGs that gives the covariates, and their values, on which units in each matched group were matched.
FLAME_out$matching_covs: a list containing the covariates that were used for matching on each iteration of the algorithm.
FLAME_out$dropped: a vector of the covariate dropped at each iteration.
To find the matched groups of particular units after running FLAME, use the function MG as shown below. In this example, the function would return the matched groups of units 1 and 2:
MG(c(1,2), FLAME_out)
To find the CATEs of particular units, use the function CATE as shown below. In this example, the function would return the CATEs of units 1 and 2:
CATE(c(1,2), FLAME_out)
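For intuition, the CATE of a matched group can be thought of as the difference between the mean outcome of its treated units and the mean outcome of its control units. The base R sketch below computes that difference for a toy matched group. It assumes this difference-in-means estimator and is not taken from the package source; the helper name cate_of_group and the data are hypothetical.

```r
# Hypothetical helper: difference in mean outcomes within one matched group.
cate_of_group <- function(group, treated_col = "treated", outcome_col = "outcome") {
  is_treated <- group[[treated_col]] == 1
  mean(group[[outcome_col]][is_treated]) - mean(group[[outcome_col]][!is_treated])
}

# Toy matched group: two treated units and two control units.
mg <- data.frame(treated = c(1, 1, 0, 0), outcome = c(7, 9, 4, 6))

cate_of_group(mg)  # treated mean 8 minus control mean 5, i.e. 3
```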
To find the average treatment effect (ATE) or average treatment effect on the treated (ATT), use the functions ATE and ATT, respectively, as shown below:
ATE(FLAME_out = FLAME_out)
ATT(FLAME_out = FLAME_out)
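One plausible way to aggregate group-level CATEs into a single estimate is a weighted average: weighting by the number of units in each matched group for an ATE-style estimate, and by the number of treated units for an ATT-style estimate. The sketch below illustrates that arithmetic only. The weighting scheme is an assumption, the package may aggregate differently, and all values are made up.

```r
# Hypothetical per-group quantities (not output from FLAME).
cates   <- c(3.0, 1.0, -0.5)  # CATE of each matched group
n_units <- c(4, 10, 6)        # number of units in each group
n_trt   <- c(2, 3, 5)         # number of treated units in each group

# Assumed aggregation: weight each CATE by group size (ATE) or treated count (ATT).
ate_sketch <- weighted.mean(cates, w = n_units)
att_sketch <- weighted.mean(cates, w = n_trt)
```

With these toy numbers the two estimates differ because the groups with many treated units have different CATEs from the large groups.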

FLAME - Parameters and Defaults

FLAME(data, holdout = 0.1, C = 0.1, treated_column_name = "treated", outcome_column_name = "outcome", PE_method = "ridge", user_PE_fit = NULL, user_PE_fit_params = NULL, user_PE_predict = NULL, user_PE_predict_params = NULL, replace = FALSE, verbose = 2, return_pe = FALSE, return_bf = FALSE, early_stop_iterations = Inf, early_stop_epsilon = 0.25, early_stop_control = 0, early_stop_treated = 0, early_stop_pe = Inf, early_stop_bf = 0, missing_data = 0, missing_holdout = 0, missing_data_imputations = 5, missing_holdout_imputations = 5, impute_with_treatment = TRUE, impute_with_outcome = FALSE)

Key Parameters

data:
file, Dataframe, required
The data to be matched.
holdout:
numeric, file, Dataframe, optional (default = 0.1)
Holdout data used to compute predictive error. If a numeric scalar between 0 and 1 is provided, that proportion of data will be used. Otherwise, if a file path or dataframe is provided, that dataset will serve as the holdout data.
C:
numeric, optional (default = 0.1)
Tradeoff parameter between predictive error and balancing factor. A greater value prioritizes more matches while a lower value prioritizes not dropping important covariates. Must be a positive scalar.
treated_column_name:
string, optional (default = 'treated')
The name of the column which specifies whether a unit is treated or control.
outcome_column_name:
string, optional (default = 'outcome')
The name of the column which specifies each unit's outcome.
PE_method:
string, optional (default = 'ridge')
The method used to compute PE.
If 'ridge', perform cross-validation using glmnet::cv.glmnet with default parameters.
If 'xgb', perform cross-validation using xgboost::xgb.cv with a wide range of parameter values and determine the best values with respect to root-mean-square error (RMSE) for continuous outcomes or misclassification rate for binary or multi-class outcomes.
user_PE_fit:
function, optional (default = NULL)
Optional function to be used instead of those provided for in PE_method to fit unit outcomes from the covariates. Must take in a matrix of covariates as its first argument and a vector of outcomes as its second argument.
user_PE_fit_params:
list, optional (default = NULL)
A named list of optional parameters to be used by user_PE_fit.
user_PE_predict:
function, optional (default = NULL)
Optional function to be used instead of the default predict method for generating predictions from the output of user_PE_fit. Must take in an object of the type returned by user_PE_fit as its first argument and a matrix of values for which to generate predictions as its second argument.
user_PE_predict_params:
list, optional (default = NULL)
A named list of optional parameters to be used by user_PE_predict.
replace:
logical scalar, optional (default = FALSE)
Specifies whether the same unit can be matched multiple times on different sets of covariates. If TRUE, the balancing factor is computed by dividing by the total number of treatment/control units instead of the number of unmatched treatment/control units.
verbose:
integer, optional (default = 2)
Controls how progress is displayed while the algorithm is running.
If 0, prints nothing.
If 1, prints stopping condition.
If 2, prints the iteration number, the number of units left to match on every 5th iteration, and the stopping condition.
return_pe:
logical scalar, optional (default = FALSE)
If TRUE, the predictive error at each iteration will be returned.
return_bf:
logical scalar, optional (default = FALSE)
If TRUE, the balancing factor at each iteration will be returned.
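To make the contract for user_PE_fit and user_PE_predict concrete, here is a minimal sketch of a custom pair based on ordinary least squares. The function names my_PE_fit and my_PE_predict and the toy data are hypothetical. The only constraints taken from the descriptions above are the argument order (covariate matrix then outcome vector for the fit function, fitted object then covariate matrix for the predict function).

```r
# Hypothetical fit function: covariate matrix first, outcome vector second.
my_PE_fit <- function(X, y, ...) {
  # Ordinary least squares with an explicit intercept column.
  stats::lm.fit(x = cbind(1, X), y = y)
}

# Hypothetical predict function: fitted object first, new covariate matrix second.
my_PE_predict <- function(fit, X_new, ...) {
  drop(cbind(1, X_new) %*% fit$coefficients)
}

# Quick check on toy data, independent of FLAME.
X <- cbind(c(0, 1, 2, 3, 4), c(1, 0, 1, 0, 0))
y <- 2 + 3 * X[, 1] - X[, 2]
fit <- my_PE_fit(X, y)
preds <- my_PE_predict(fit, X)

# The pair could then be supplied to FLAME, for example:
# FLAME(data = data, user_PE_fit = my_PE_fit, user_PE_predict = my_PE_predict)
```

Checking the pair on toy data first, as above, helps catch argument-order mistakes before a full matching run.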

Early Stopping Parameters

early_stop_iterations:
integer, optional (default = Inf)
Specifies the number of iterations after which to hard-stop the algorithm. If 0, one round of exact matching is performed before stopping.
early_stop_epsilon:
numeric, optional (default = 0.25)
Nonnegative numeric denoting the maximum acceptable percent change in predictive error relative to that computed using all covariates before the algorithm will stop iterating. Default corresponds to 25%.
early_stop_control:
numeric, optional (default = 0)
If the proportion of control units that are unmatched falls below this value, the algorithm stops iterating. Must be between 0 and 1.
early_stop_treated:
numeric, optional (default = 0)
If the proportion of treatment units that are unmatched falls below this value, the algorithm stops iterating. Must be between 0 and 1.
early_stop_pe:
numeric, optional (default = Inf)
Maximum acceptable predictive error. If FLAME attempts to drop a covariate which would increase PE above this threshold, it will stop iterating.
early_stop_bf:
numeric, optional (default = 0)
Minimum acceptable balancing factor. If FLAME attempts to drop a covariate which would decrease BF below this threshold, it will stop iterating.
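The early_stop_epsilon rule can be read as a check on the relative change in predictive error. The sketch below shows that arithmetic under the assumption that the change is measured as a signed increase over the baseline PE computed with all covariates. The helper name exceeds_epsilon is hypothetical and this is not the package's internal code.

```r
# Hypothetical helper: TRUE when PE has risen by more than epsilon relative to
# the baseline PE computed using all covariates.
exceeds_epsilon <- function(pe_current, pe_baseline, epsilon = 0.25) {
  (pe_current - pe_baseline) / pe_baseline > epsilon
}

exceeds_epsilon(1.2, 1.0)  # a 20% increase stays under the 25% default
exceeds_epsilon(1.3, 1.0)  # a 30% increase exceeds the 25% default
```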

Missing Data Parameters

missing_data:
integer, optional (default = 0)
If 0, assume no missingness in matching data.
If 1, drop units with missingness from matching data.
If 2, impute missing values using mice::mice on the matching dataset for the number of imputations specified by missing_data_imputations.
If 3, do not match units on covariates that are missing.
missing_holdout:
integer, optional (default = 0)
If 0, assume no missingness in holdout data.
If 1, drop units with missingness from holdout data.
If 2, impute missing values using mice::mice on the holdout dataset for the number of imputations specified by missing_holdout_imputations.
missing_data_imputations:
integer, optional (default = 5)
If missing_data=2, specifies the number of imputations on the matching set.
missing_holdout_imputations:
integer, optional (default = 5)
If missing_holdout=2, specifies the number of imputations on the holdout set.
impute_with_treatment:
logical scalar, optional (default = TRUE)
If TRUE, treatment assignment is used to impute covariates when missing_data=2 or missing_holdout=2.
impute_with_outcome:
logical scalar, optional (default = FALSE)
If TRUE, outcome information is used to impute covariates when missing_data=2 or missing_holdout=2.
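Before choosing among these options, it can help to check how much missingness the data actually contains. The base R sketch below counts missing values per covariate and then drops incomplete units, which mirrors what missing_data = 1 is described as doing. It is an illustration on hypothetical data, not the package's own implementation.

```r
# Hypothetical matching data with missing covariate values.
dat <- data.frame(
  x_1     = factor(c(1, NA, 0, 2)),
  x_2     = factor(c(0, 1, NA, 1)),
  treated = c(1, 0, 1, 0),
  outcome = c(7.76, 5, 4.4, 6.1)
)
covs <- c("x_1", "x_2")

# Number of missing values in each covariate.
n_missing <- colSums(is.na(dat[, covs]))

# Keep only units with no missing covariate values.
complete <- dat[complete.cases(dat[, covs]), ]
```

If dropping units would discard too much of the data, imputation (option 2) or matching only on observed covariates (option 3) may be preferable.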

Additional Functions - Parameters and Defaults

# generates toy data
gen_data(n = 250, p = 5, write = FALSE, path = getwd(), filename = "FLAME.csv")
# returns matched groups for specified units
MG(units, FLAME_out, multiple = FALSE, index_only = FALSE)
# returns CATEs for specified units
CATE(units, FLAME_out, multiple = FALSE)
# returns ATE for matched dataset
ATE(FLAME_out)
# returns ATT for matched dataset
ATT(FLAME_out)

Key Parameters

n:
integer, optional (default = 250)
Number of units desired in the dataset created by gen_data.
p:
integer, optional (default = 5)
Number of covariates desired in the dataset created by gen_data. Must be greater than 2.
write:
logical scalar, optional (default = FALSE)
Specifies whether the output of gen_data is stored as a .csv file according to the parameters path and filename.
path:
string, optional (default = getwd())
If write is TRUE, specifies the location path of the file created by gen_data.
filename:
string, optional (default = 'FLAME.csv')
If write is TRUE, specifies the name of the file created by gen_data.
units:
numeric vector, required
Vector of indices for the units of interest for the functions MG and CATE.
FLAME_out:
list, required
The output of a call to FLAME.
multiple:
logical scalar, optional (default = FALSE)
If FALSE (default), then the functions MG and CATE will only return objects pertaining to a unit's main matched group.
If TRUE, the aforementioned functions will return objects pertaining to every matched group containing a specified unit (only relevant if replace=TRUE).
index_only:
logical scalar, optional (default = FALSE)
If TRUE, the function MG will return only the indices of the units in each matched group.