Please note that this package requires Python 3 or higher.
If you do not already have an up-to-date version of Python installed on your computer, you can download the latest version here.
This package also depends on Pandas, Scikit-learn, and NumPy. If your Python installation does not have these packages, you can install them here.
We recommend installing Anaconda, which comes with the Python libraries this package depends on pre-installed.
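
The prerequisites above can be checked from Python before installing the package. The following is a minimal sketch using only the standard library; the helper name `missing_dependencies` is hypothetical and is not part of dame-flame.

```python
import importlib.util
import sys

# Import names of the required libraries (Scikit-learn is imported as "sklearn").
REQUIRED = ["pandas", "sklearn", "numpy"]


def missing_dependencies(names=REQUIRED):
    """Return the import names from `names` that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# True when the running interpreter is Python 3 or higher.
python_ok = sys.version_info >= (3, 0)
missing = missing_dependencies()

print("Python 3 or higher:", python_ok)
print("Missing libraries:", missing if missing else "none")
```

Any name reported as missing can then be installed with pip or obtained through Anaconda.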

DAME is currently available in Python for download on the almost-matching-exactly GitHub or for installation via PyPI (recommended). The package includes both the DAME and FLAME algorithms and can be installed with the following command:

`pip install dame-flame`


To begin using the DAME algorithm, first ensure that your dataset meets the necessary requirements.
The data can be stored as a CSV/Excel file or as a **Python Pandas Data Frame**.
Remember, all covariates should be categorical and expressed as *integer* data types. If there are continuous covariates, please consider regrouping them into categories. In addition to the covariate columns,
your dataset should include a column of binary *integer* data types which specify whether a unit is treated (1) or control (0)
and a column of *integer* or *float* data types which specify unit outcomes. There are no requirements for covariate column names or the ordering of the columns.
Below is a sample dataset in the required format:

x_1 (integer) | x_2 (integer) | ... | x_m (integer) | treated (binary) | outcome (numeric) |
---|---|---|---|---|---|
1 | 2 | ... | 4 | 1 | 4.5 |
2 | 3 | ... | 7 | 0 | 3.33 |
... | ... | ... | ... | ... | ... |
2 | 6 | ... | 1 | 1 | 6 |
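
The format requirements above can be checked programmatically before running the algorithm. The sketch below is an illustration only: the helper `check_dame_input`, the `age` column, and the bin edges are hypothetical and not part of dame-flame. It also shows one way to regroup a continuous covariate into integer categories with `pd.cut`.

```python
import pandas as pd
from pandas.api.types import is_integer_dtype, is_numeric_dtype


def check_dame_input(df, treatment_col="treated", outcome_col="outcome"):
    """Return a list of problems found; an empty list means the frame looks usable."""
    problems = []
    for col in df.columns:
        if col in (treatment_col, outcome_col):
            continue
        # Every covariate must be stored as an integer column.
        if not is_integer_dtype(df[col]):
            problems.append(f"covariate '{col}' is not an integer column")
    # The treatment column must be integer and contain only 0 and 1.
    if not is_integer_dtype(df[treatment_col]) or not df[treatment_col].isin([0, 1]).all():
        problems.append(f"'{treatment_col}' is not a binary integer column")
    # The outcome column may be integer or float.
    if not is_numeric_dtype(df[outcome_col]):
        problems.append(f"'{outcome_col}' is not numeric")
    return problems


raw = pd.DataFrame({
    "x_1": [1, 2, 2],
    "age": [23.5, 41.0, 67.2],  # continuous covariate (hypothetical)
    "treated": [1, 0, 1],
    "outcome": [4.5, 3.33, 6.0],
})
problems_before = check_dame_input(raw)
print(problems_before)

# Regroup the continuous covariate into three integer-coded bins (assumed edges).
binned = raw.copy()
binned["age"] = pd.cut(raw["age"], bins=[0, 30, 60, 120], labels=False).astype(int)
problems_after = check_dame_input(binned)
print(problems_after)
```

The choice of bin edges is a modeling decision; the values used here are placeholders.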

To run the algorithm, use the DAME function as shown in the example below. Although the DAME function can
operate on a file or a dataframe, the additional functions included in the package require the latter format. As such, it is recommended to
first convert your data to a Pandas DataFrame, as the `pd.read_csv` call in the example does:

```python
import pandas as pd
import dame_flame

# Load the dataset into a Pandas DataFrame
df = pd.read_csv("data.csv")

# Run DAME; result[0] is the dataframe of matched units
result = dame_flame.DAME_FLAME.DAME(input_data=df, treatment_column_name="treated", outcome_column_name="outcome")
print(result[0])
```

The object result is by default a list of one entry, the first and only element of which is a dataframe containing all of the units that were matched and the covariates that they were matched on.
The covariates that each unit was not matched on are denoted with asterisks. The output of the algorithm may contain additional values based on optional parameters which are detailed in the next section.

To find the main matched group of a particular unit after running DAME, use the function mmg_of_unit as shown below:

```python
# Main matched group of the unit with index 0
mmg = dame_flame.DAME_FLAME.mmg_of_unit(return_df=result[0], unit_id=0, input_data=df)
print(mmg)
```

To find the estimated treatment effect on a particular unit, use the function te_of_unit as shown below:

```python
# Estimated treatment effect on the unit with index 0
te = dame_flame.DAME_FLAME.te_of_unit(return_df=result[0], unit_id=0, input_data=df, treatment_column_name="treated", outcome_column_name="outcome")
print(te)
```
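
As an illustration of what a unit-level treatment effect can represent, one common estimate is the difference between the mean outcome of the treated units and the mean outcome of the control units in the unit's matched group. The sketch below assumes that definition; it is not taken from the package source, and the function name `group_treatment_effect` and the sample data are hypothetical.

```python
import pandas as pd


def group_treatment_effect(group, treatment_col="treated", outcome_col="outcome"):
    """Mean treated outcome minus mean control outcome within one matched group."""
    treated = group.loc[group[treatment_col] == 1, outcome_col]
    control = group.loc[group[treatment_col] == 0, outcome_col]
    if treated.empty or control.empty:
        raise ValueError("a matched group needs at least one treated and one control unit")
    return treated.mean() - control.mean()


# A hypothetical matched group: all units share the same covariate value.
matched_group = pd.DataFrame({
    "x_1": [1, 1, 1, 1],
    "treated": [1, 1, 0, 0],
    "outcome": [5.0, 7.0, 3.0, 4.0],
})

effect = group_treatment_effect(matched_group)
print(effect)  # 6.0 - 3.5 = 2.5
```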

* denotes DAME-specific parameters

Parameter | Description |
---|---|
input_data: file, Dataframe, required | The data to be matched. |
treatment_column_name: string, optional (default = 'treated') | The name of the column which specifies whether a unit is treated or control. |
*weight_array, optional (default = False) | Array of weights for all covariates in input_data. Only needed if adaptive_weights = False. |
outcome_column_name: string, optional (default = 'outcome') | The name of the column which specifies each unit outcome. |
adaptive_weights: string, optional (default = 'ridge') | The weight dropping method to be used. If False, implements no weight dropping method. If 'ridge', implements ridge regression. If 'ridgeCV', implements ridge regression with cross-validation. If 'decision tree', implements decision tree regression. |
alpha: float, optional (default = 0.1) | The alpha for ridge regression. Ridge regression is performed with scikit-learn, so this is the "regularization strength". Larger values specify stronger regularization. Must be a positive float. |
holdout_file, Dataframe, optional (default = False) | The data used in the holdout training set. Only required if doing an adaptive_weights version. If False, 10% of the input data will be randomly selected for the training set. |
repeats: bool, optional (default = True) | Specifies whether values for which a main matched group has been found can be used again and placed in an auxiliary matched group. |
verbose: integer, optional (default = 2) | Controls how progress is displayed while the algorithm is running. If 0, prints nothing. If 1, prints iteration number. If 2, prints the iteration number and number of units left to match on every 10th iteration. If 3, prints iteration number and number of units left to match on every iteration. |
want_bool, optional (default = False) | Specifies whether the output will include the predictive error of the covariate sets matched on in each iteration. |
want_bool, optional (default = False) | Specifies whether the output will include the balancing factor of each iteration. |
early_integer, optional (default = False) | If provided, specifies the number of iterations after which to hard-stop the algorithm. |
stop_bools, optional (default = False) | Specifies whether the algorithm stops when there are no control units remaining to match. |
stop_bools, optional (default = False) | Specifies whether the algorithm stops when there are no treatment units remaining to match. |
early_float, optional (default = 0.1) | Minimum acceptable proportion of unmatched treatment units after which the algorithm will stop iterating. Must be between 0 and 1. |
early_float, optional (default = 0.1) | Minimum acceptable proportion of unmatched treatment units after which the algorithm will stop iterating. Must be between 0 and 1. |
early_bool, optional (default = False) | If True, then the algorithm will hard-stop once the covariate set chosen for matching has a predictive error greater than the threshold provided in early_ |
early_bool, optional (default = False) | If True, then the algorithm will hard-stop once the covariate set chosen for matching has a balancing factor lower than the threshold provided in early_ |
early_float, optional (default = 0.01) | If early_ is True, then the algorithm will hard-stop once the covariate set chosen to match on has a predictive error greater than this value. |
early_float, optional (default = 0.01) | If early_ is True, then the algorithm will hard-stop once the covariate set chosen to match on has a balancing factor lower than this value. |
missing_string, integer, np.nan, optional (default = np.nan) | This is the indicator for missing data in the dataset. For example, if missing values are denoted with "NA", set this parameter equal to "NA". |
missing_integer, optional (default = 0) | If 0, assume no missingness in holdout data. If 1, drop all units with missingness from holdout dataset. If 2, impute missing values using MICE on holdout dataset for the number of imputations specified by missing_ |
missing_integer, optional (default = 0) | If 0, assume no missingness in matching data. If 1, drop all units with missingness from matching dataset. If 2, do not match units on covariates that are missing. If 3, impute missing values using MICE on matching dataset for the number of imputations specified by missing_. |
missing_integer, optional (default = 10) | If missing_=2, specifies the number of imputations on the holdout set. |
missing_integer, optional (default = 1) | If missing_=3, specifies the number of imputations on the matching set. |
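
The holdout default described above (10% of the input data selected at random) can be reproduced by hand when an explicit holdout set is preferred. The following is a minimal sketch with pandas; the function name `split_holdout`, the seed, and the sample data are hypothetical. Keeping the two sets disjoint is an assumption of this sketch, and the package's internal procedure may differ.

```python
import pandas as pd


def split_holdout(df, holdout_frac=0.1, seed=0):
    """Randomly set aside a fraction of the rows as a holdout training set."""
    holdout = df.sample(frac=holdout_frac, random_state=seed)
    # Assumption: rows used for the holdout set are removed from the matching data.
    matching = df.drop(holdout.index)
    return matching, holdout


# Hypothetical dataset with 50 units in the required format.
df = pd.DataFrame({
    "x_1": [i % 3 for i in range(50)],
    "treated": [i % 2 for i in range(50)],
    "outcome": [float(i) for i in range(50)],
})

matching, holdout = split_holdout(df)
print(len(matching), len(holdout))  # 45 5
```

Fixing the seed makes the split reproducible across runs.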

#returns main matched group of specified unit
**mmg_of_unit**(return_df, unit_id, input_data, output_style=1)
#returns treatment effect on specified unit
**te_of_unit**(return_df, unit_id, input_data, treatment_column_name, outcome_column_name)
#returns main matched group and treatment effect for specified unit and includes fancy version of above functions
**mmg_and_te_of_unit**(return_df, unit_id, input_data, treatment_column_name,
outcome_column_name, return_vals=1)

Parameter | Description |
---|---|
return_df: Dataframe, required | The dataframe containing all of the matches, obtained from the first element in the output of the algorithm. |
unit_id: integer, required | The index identification for the unit of interest. |
input_data: Dataframe, required | Dataframe containing the original dataset which has already been matched using DAME. |
treatment_column_name: string, required | The name of the column in input_data which specifies whether a unit is treated or control. |
outcome_column_name: string, required | The name of the column in input_data which specifies each unit outcome. |
output_style: integer, optional (default = 1) | If 1, include only the covariates matched on in the output. If 2, include all attributes in the output. |
return_vals: integer, optional (default = 1) | If 0, print the main matched group and treatment effect for the specified unit (fancy version). Otherwise, return the main matched group and treatment effect for the specified unit with no print statement (default version). |