Title: | Efficiency Analysis Trees |
---|---|
Description: | Functions are provided to determine production frontiers and technical efficiency measures through non-parametric techniques based upon regression trees. The package includes code for estimating radial input, output, directional and additive measures, plotting graphical representations of the scores and the production frontiers by means of trees, and determining rankings of importance of input variables in the analysis. Additionally, an adaptation of Random Forest by a set of individual Efficiency Analysis Trees for estimating technical efficiency is also included. More details in: <doi:10.1016/j.eswa.2020.113783>. |
Authors: | Miriam Esteve [cre, aut] , Víctor España [aut] , Juan Aparicio [aut] , Xavier Barber [aut] |
Maintainer: | Miriam Esteve <[email protected]> |
License: | GPL-3 |
Version: | 0.1.2 |
Built: | 2024-10-27 06:29:33 UTC |
Source: | https://github.com/miriamesteve/eat |
This function gets the minimum alpha for each subtree evaluated during the pruning procedure of the Efficiency Analysis Trees technique.
alpha(tree)
tree |
A list containing the EAT nodes. |
Numeric value corresponding to the minimum alpha associated with a suitable node to be pruned.
Bootstrap aggregating for data.
bagging(data, x, y)
data |
Dataframe containing the variables in the model. |
x |
Column input indexes in data. |
y |
Column output indexes in data. |
List containing the training dataframe and a list with a binary response: 1 if an observation has been selected for training and 0 in any other case.
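The sampling step behind bagging can be sketched in a few lines of R. This is an illustrative sketch only; the function name bagging_sketch and its return format are hypothetical, not the package's internal code:

# Illustrative bootstrap-aggregating sampling step (hypothetical sketch,
# not the package's internal implementation).
bagging_sketch <- function(data) {
  n <- nrow(data)
  idx <- sample(seq_len(n), n, replace = TRUE)  # bootstrap draw
  list(
    training = data[idx, , drop = FALSE],       # bootstrap training set
    selected = as.integer(seq_len(n) %in% idx)  # 1 if drawn for training, 0 otherwise
  )
}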
This function generates a barplot with the importance of each predictor.
barplot_importance(m, threshold)
m |
Dataframe with the importance of each predictor. |
threshold |
Importance score value at which a horizontal line is drawn. |
Barplot representing each variable on the x-axis and its importance on the y-axis.
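As an illustration of the expected inputs, the following sketch draws a comparable plot with base graphics. The data frame m below is made up for illustration; the package builds its plot internally:

# Hypothetical importance data frame: one row per predictor.
m <- data.frame(Importance = c(100, 65, 40),
                row.names = c("x1", "x2", "x3"))

# Bar per variable, importance on the y-axis, dashed line at the threshold.
barplot(m$Importance, names.arg = rownames(m),
        ylab = "Importance", ylim = c(0, 100))
abline(h = 70, lty = 2)  # threshold line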
This function computes the root mean squared error (RMSE) for a set of Efficiency Analysis Trees models built with a grid of given hyperparameters.
bestEAT( training, test, x, y, numStop = 5, fold = 5, max.depth = NULL, max.leaves = NULL, na.rm = TRUE )
training |
Training dataframe. |
test |
Test dataframe. |
x |
Column input indexes in training. |
y |
Column output indexes in training. |
numStop |
Minimum number of observations in a node for a split to be attempted. |
fold |
Number of folds into which the dataset is divided to apply cross-validation during the pruning. |
max.depth |
Maximum depth of the tree. |
max.leaves |
Maximum number of leaf nodes. |
na.rm |
Logical. If TRUE, NA rows are omitted. |
A data.frame
with the sets of hyperparameters and the root mean squared error (RMSE) associated for each model.
data("PISAindex")
n <- nrow(PISAindex)               # Observations in the dataset
selected <- sample(1:n, n * 0.7)   # Training indexes
training <- PISAindex[selected, ]  # Training set
test <- PISAindex[- selected, ]    # Test set
bestEAT(training = training, test = test,
        x = 6:9, y = 3,
        numStop = c(3, 5, 7), fold = c(5, 7, 10))
This function computes the root mean squared error (RMSE) for a set of Random Forest + Efficiency Analysis Trees models built with a grid of given hyperparameters.
bestRFEAT( training, test, x, y, numStop = 5, m = 50, s_mtry = c("5", "BRM"), na.rm = TRUE )
training |
Training dataframe. |
test |
Test dataframe. |
x |
Column input indexes in training. |
y |
Column output indexes in training. |
numStop |
Minimum number of observations in a node for a split to be attempted. |
m |
Number of trees to be built. |
s_mtry |
Number of variables randomly sampled as candidates at each split (one or more options to be tested). |
na.rm |
Logical. If TRUE, NA rows are omitted. |
A data.frame
with the sets of hyperparameters and the root mean squared error (RMSE) associated for each model.
data("PISAindex")
n <- nrow(PISAindex)               # Observations in the dataset
selected <- sample(1:n, n * 0.7)   # Training indexes
training <- PISAindex[selected, ]  # Training set
test <- PISAindex[- selected, ]    # Test set
bestRFEAT(training = training, test = test,
          x = 6:9, y = 3,
          numStop = c(3, 5), m = c(20, 30), s_mtry = c("1", "BRM"))
Banker, Charnes and Cooper programming model with input orientation for a Convexified Efficiency Analysis Trees model.
CEAT_BCC_in(j, scores, x_k, y_k, atreeTk, ytreeTk, nX, nY, N_leaves)
j |
Number of DMUs. |
scores |
Matrix in which the efficiency scores are saved. |
x_k |
Matrix of inputs of the DMUs. |
y_k |
Matrix of outputs of the DMUs. |
atreeTk |
Matrix with the lower bounds (a) of the leaf nodes. |
ytreeTk |
Matrix with the output predictions of the leaf nodes. |
nX |
Number of inputs. |
nY |
Number of outputs. |
N_leaves |
Number of leaf nodes. |
A numerical vector with scores.
Banker, Charnes and Cooper programming model with output orientation for a Convexified Efficiency Analysis Trees model.
CEAT_BCC_out(j, scores, x_k, y_k, atreeTk, ytreeTk, nX, nY, N_leaves)
j |
Number of DMUs. |
scores |
Matrix in which the efficiency scores are saved. |
x_k |
Matrix of inputs of the DMUs. |
y_k |
Matrix of outputs of the DMUs. |
atreeTk |
Matrix with the lower bounds (a) of the leaf nodes. |
ytreeTk |
Matrix with the output predictions of the leaf nodes. |
nX |
Number of inputs. |
nY |
Number of outputs. |
N_leaves |
Number of leaf nodes. |
A numerical vector with efficiency scores.
Directional Distance Function for a Convexified Efficiency Analysis Trees model.
CEAT_DDF(j, scores, x_k, y_k, atreeTk, ytreeTk, nX, nY, N_leaves)
j |
Number of DMUs. |
scores |
Matrix in which the efficiency scores are saved. |
x_k |
Matrix of inputs of the DMUs. |
y_k |
Matrix of outputs of the DMUs. |
atreeTk |
Matrix with the lower bounds (a) of the leaf nodes. |
ytreeTk |
Matrix with the output predictions of the leaf nodes. |
nX |
Number of inputs. |
nY |
Number of outputs. |
N_leaves |
Number of leaf nodes. |
A numerical vector with scores.
Russell Model with input orientation for a Convexified Efficiency Analysis Trees model.
CEAT_RSL_in(j, scores, x_k, y_k, atreeTk, ytreeTk, nX, nY, N_leaves)
j |
Number of DMUs. |
scores |
Matrix in which the efficiency scores are saved. |
x_k |
Matrix of inputs of the DMUs. |
y_k |
Matrix of outputs of the DMUs. |
atreeTk |
Matrix with the lower bounds (a) of the leaf nodes. |
ytreeTk |
Matrix with the output predictions of the leaf nodes. |
nX |
Number of inputs. |
nY |
Number of outputs. |
N_leaves |
Number of leaf nodes. |
A numerical vector with scores.
Russell Model with output orientation for a Convexified Efficiency Analysis Trees model.
CEAT_RSL_out(j, scores, x_k, y_k, atreeTk, ytreeTk, nX, nY, N_leaves)
j |
Number of DMUs. |
scores |
Matrix in which the efficiency scores are saved. |
x_k |
Matrix of inputs of the DMUs. |
y_k |
Matrix of outputs of the DMUs. |
atreeTk |
Matrix with the lower bounds (a) of the leaf nodes. |
ytreeTk |
Matrix with the output predictions of the leaf nodes. |
nX |
Number of inputs. |
nY |
Number of outputs. |
N_leaves |
Number of leaf nodes. |
A numerical vector with scores.
Weighted Additive Model for a Convexified Efficiency Analysis Trees model.
CEAT_WAM(j, scores, x_k, y_k, atreeTk, ytreeTk, nX, nY, N_leaves, weights)
j |
Number of DMUs. |
scores |
Matrix in which the efficiency scores are saved. |
x_k |
Matrix of inputs of the DMUs. |
y_k |
Matrix of outputs of the DMUs. |
atreeTk |
Matrix with the lower bounds (a) of the leaf nodes. |
ytreeTk |
Matrix with the output predictions of the leaf nodes. |
nX |
Number of inputs. |
nY |
Number of outputs. |
N_leaves |
Number of leaf nodes. |
weights |
Character. Weights for the weighted additive model. |
A numerical vector with scores.
This function verifies whether a specific tree satisfies the Pareto-dominance properties.
checkEAT(tree)
tree |
A list containing the EAT nodes. |
Message indicating if the tree is acceptable or warning in case of breaking any Pareto-dominance relationship.
This function indicates whether one node dominates another or whether there is no Pareto-dominance relationship between them.
comparePareto(t1, t2)
t1 |
A first node. |
t2 |
A second node. |
-1 if t1 dominates t2, 1 if t2 dominates t1 and 0 if there are no Pareto-dominance relationships.
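The dominance rule can be sketched as follows, representing each node (hypothetically) by an input vector x and an output vector y: a node dominates another if it uses no more of every input while producing no less of every output. This sketch is illustrative, not the package's internal code:

# Hypothetical sketch of a Pareto-dominance comparison between two nodes,
# each given as list(x = inputs, y = outputs).
compare_pareto_sketch <- function(t1, t2) {
  if (all(t1$x <= t2$x) && all(t1$y >= t2$y)) {
    -1                                  # t1 dominates t2
  } else if (all(t2$x <= t1$x) && all(t2$y >= t1$y)) {
    1                                   # t2 dominates t1
  } else {
    0                                   # no Pareto-dominance relationship
  }
}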
This function creates a deep Efficiency Analysis Tree and a set of possible prunings by the weakest-link pruning procedure.
deepEAT(data, x, y, numStop = 5, max.depth = NULL, max.leaves = NULL)
data |
Dataframe containing the variables in the model. |
x |
Column input indexes in data. |
y |
Column output indexes in data. |
numStop |
Minimum number of observations in a node for a split to be attempted. |
max.depth |
Maximum depth of the tree. |
max.leaves |
Maximum number of leaf nodes. |
A list
containing each possible pruning for the deep tree and its associated alpha value.
This function estimates a stepped production frontier through regression trees.
EAT( data, x, y, numStop = 5, fold = 5, max.depth = NULL, max.leaves = NULL, na.rm = TRUE )
data |
Dataframe containing the variables in the model. |
x |
Column input indexes in data. |
y |
Column output indexes in data. |
numStop |
Minimum number of observations in a node for a split to be attempted. |
fold |
Number of folds into which the dataset is divided to apply cross-validation during the pruning. |
max.depth |
Maximum depth of the tree. |
max.leaves |
Maximum number of leaf nodes. |
na.rm |
Logical. If TRUE, NA rows are omitted. |
The EAT function generates a regression tree model based on CART (Breiman et al. 1984) under a new approach that guarantees obtaining a stepped production frontier that fulfills the property of free disposability. This frontier shares the aforementioned aspects with the FDH frontier (Deprins and Simar 1984) but enhances some of its disadvantages such as the overfitting problem or the underestimation of technical inefficiency. More details in Esteve et al. (2020).
An EAT
object containing:
data
df
: data frame containing the variables in the model.
x
: input indexes in data.
y
: output indexes in data.
input_names
: input variable names.
output_names
: output variable names.
row_names
: rownames in data.
control
fold
: fold hyperparameter value.
numStop
: numStop hyperparameter value.
max.leaves
: max.leaves hyperparameter value.
max.depth
: max.depth hyperparameter value.
na.rm
: na.rm hyperparameter value.
tree
: list structure containing the EAT nodes.
nodes_df
: data frame containing the following information for each node.
id
: node index.
SL
: left child node index.
SR
: right child node index.
N
: number of observations at the node.
Proportion
: proportion of observations at the node.
y
: the output predictions.
R
: the error at the node.
index
: observation indexes at the node.
model
nodes
: total number of nodes at the tree.
leaf_nodes
: number of leaf nodes at the tree.
a
: lower bound of the nodes.
y
: output predictions.
Breiman L, Friedman JH, Olshen RA, Stone CJ (1984).
Classification and Regression Trees.
CRC Press.
Deprins D, Simar L (1984).
“Measuring labor efficiency in post offices.” In Marchand M, Pestieau P, Tulkens H (eds.), The Performance of Public Enterprises: Concepts and Measurements.
Esteve M, Aparicio J, Rabasa A, Rodriguez-Sala JJ (2020).
“Efficiency analysis trees: A new methodology for estimating production frontiers through decision trees.”
Expert Systems with Applications, 162, 113783.
# ====================== #
# Single output scenario #
# ====================== #
simulated <- Y1.sim(N = 50, nX = 3)
EAT(data = simulated, x = c(1, 2, 3), y = 4,
    numStop = 10, fold = 5, max.leaves = 6)

# ===================== #
# Multi output scenario #
# ===================== #
simulated <- X2Y2.sim(N = 50, border = 0.1)
EAT(data = simulated, x = c(1, 2), y = c(3, 4),
    numStop = 10, fold = 7, max.depth = 7)
Banker, Charnes and Cooper programming model with input orientation for an Efficiency Analysis Trees model.
EAT_BCC_in(j, scores, x_k, y_k, atreeTk, ytreeTk, nX, nY, N_leaves)
j |
Number of DMUs. |
scores |
Matrix in which the efficiency scores are saved. |
x_k |
Matrix of inputs of the DMUs. |
y_k |
Matrix of outputs of the DMUs. |
atreeTk |
Matrix with the lower bounds (a) of the leaf nodes. |
ytreeTk |
Matrix with the output predictions of the leaf nodes. |
nX |
Number of inputs. |
nY |
Number of outputs. |
N_leaves |
Number of leaf nodes. |
A numerical vector with efficiency scores.
Banker, Charnes and Cooper programming model with output orientation for an Efficiency Analysis Trees model.
EAT_BCC_out(j, scores, x_k, y_k, atreeTk, ytreeTk, nX, nY, N_leaves)
j |
Number of DMUs. |
scores |
Matrix in which the efficiency scores are saved. |
x_k |
Matrix of inputs of the DMUs. |
y_k |
Matrix of outputs of the DMUs. |
atreeTk |
Matrix with the lower bounds (a) of the leaf nodes. |
ytreeTk |
Matrix with the output predictions of the leaf nodes. |
nX |
Number of inputs. |
nY |
Number of outputs. |
N_leaves |
Number of leaf nodes. |
A numerical vector with efficiency scores.
Directional Distance Function for an Efficiency Analysis Trees model.
EAT_DDF(j, scores, x_k, y_k, atreeTk, ytreeTk, nX, nY, N_leaves)
j |
Number of DMUs. |
scores |
Matrix in which the efficiency scores are saved. |
x_k |
Matrix of inputs of the DMUs. |
y_k |
Matrix of outputs of the DMUs. |
atreeTk |
Matrix with the lower bounds (a) of the leaf nodes. |
ytreeTk |
Matrix with the output predictions of the leaf nodes. |
nX |
Number of inputs. |
nY |
Number of outputs. |
N_leaves |
Number of leaf nodes. |
A numerical vector with efficiency scores.
This function returns the frontier output levels for an Efficiency Analysis Trees model.
EAT_frontier_levels(object)
object |
An EAT object. |
A data.frame
with the frontier output levels at the leaf nodes of the Efficiency Analysis Trees model introduced.
simulated <- Y1.sim(N = 50, nX = 3)
EAT_model <- EAT(data = simulated, x = c(1, 2, 3), y = 4,
                 numStop = 10, fold = 5)
EAT_frontier_levels(EAT_model)
This function returns a descriptive summary statistics table for each output variable calculated from the leaf nodes observations of an Efficiency Analysis Trees model. Specifically, it computes the number of observations, the proportion of observations, the mean, the variance, the standard deviation, the minimum, the first quartile, the median, the third quartile, the maximum and the root mean squared error.
EAT_leaf_stats(object)
object |
An EAT object. |
A list
or a data.frame
(for 1 output scenario) with the following summary statistics:
N
: number of observations.
Proportion
: proportion of observations.
mean
: mean.
var
: variance.
sd
: standard deviation.
min
: minimum.
Q1
: first quartile.
median
: median.
Q3
: third quartile.
max
: maximum.
RMSE
: root mean squared error.
simulated <- Y1.sim(N = 50, nX = 3)
EAT_model <- EAT(data = simulated, x = c(1, 2, 3), y = 4,
                 numStop = 10, fold = 5)
EAT_leaf_stats(EAT_model)
This function saves information about the Efficiency Analysis Trees model.
EAT_object( data, x, y, rownames, numStop, fold, max.depth, max.leaves, na.rm, tree )
data |
Dataframe containing the variables in the model. |
x |
Column input indexes in data. |
y |
Column output indexes in data. |
rownames |
Rownames in data. |
numStop |
Minimum number of observations in a node for a split to be attempted. |
fold |
Number of folds into which the dataset is divided to apply cross-validation during the pruning. |
max.depth |
Maximum depth of the tree. |
max.leaves |
Maximum number of leaf nodes. |
na.rm |
Logical. If TRUE, NA rows are omitted. |
tree |
A list containing the EAT nodes. |
An EAT
object.
Russell Model with input orientation for an Efficiency Analysis Trees model.
EAT_RSL_in(j, scores, x_k, y_k, atreeTk, ytreeTk, nX, nY, N_leaves)
j |
Number of DMUs. |
scores |
Matrix in which the efficiency scores are saved. |
x_k |
Matrix of inputs of the DMUs. |
y_k |
Matrix of outputs of the DMUs. |
atreeTk |
Matrix with the lower bounds (a) of the leaf nodes. |
ytreeTk |
Matrix with the output predictions of the leaf nodes. |
nX |
Number of inputs. |
nY |
Number of outputs. |
N_leaves |
Number of leaf nodes. |
A numerical vector with efficiency scores.
Russell Model with output orientation for an Efficiency Analysis Trees model.
EAT_RSL_out(j, scores, x_k, y_k, atreeTk, ytreeTk, nX, nY, N_leaves)
j |
Number of DMUs. |
scores |
Matrix in which the efficiency scores are saved. |
x_k |
Matrix of inputs of the DMUs. |
y_k |
Matrix of outputs of the DMUs. |
atreeTk |
Matrix with the lower bounds (a) of the leaf nodes. |
ytreeTk |
Matrix with the output predictions of the leaf nodes. |
nX |
Number of inputs. |
nY |
Number of outputs. |
N_leaves |
Number of leaf nodes. |
A numerical vector with efficiency scores.
This function returns the number of leaf nodes for an Efficiency Analysis Trees model.
EAT_size(object)
object |
An EAT object. |
Number of leaf nodes of the Efficiency Analysis Trees model introduced.
simulated <- Y1.sim(N = 50, nX = 3)
EAT_model <- EAT(data = simulated, x = c(1, 2, 3), y = 4,
                 numStop = 10, fold = 5)
EAT_size(EAT_model)
Weighted Additive Model for an Efficiency Analysis Trees model.
EAT_WAM(j, scores, x_k, y_k, atreeTk, ytreeTk, nX, nY, N_leaves, weights)
j |
Number of DMUs. |
scores |
Matrix in which the efficiency scores are saved. |
x_k |
Matrix of inputs of the DMUs. |
y_k |
Matrix of outputs of the DMUs. |
atreeTk |
Matrix with the lower bounds (a) of the leaf nodes. |
ytreeTk |
Matrix with the output predictions of the leaf nodes. |
nX |
Number of inputs. |
nY |
Number of outputs. |
N_leaves |
Number of leaf nodes. |
weights |
Character. Weights for the weighted additive model. |
A numerical vector with efficiency scores.
This function computes the efficiency scores for each DMU through a Convexified Efficiency Analysis Trees model.
efficiencyCEAT( data, x, y, object, scores_model, digits = 3, DEA = TRUE, print.table = FALSE, na.rm = TRUE )
data |
Dataframe containing the variables in the model. |
x |
Column input indexes in data. |
y |
Column output indexes in data. |
object |
An EAT object. |
scores_model |
Mathematical programming model to calculate scores.
|
digits |
Decimal units for scores. |
DEA |
Logical. If TRUE, DEA scores are also computed. |
print.table |
Logical. If TRUE, a summary descriptive table of the efficiency scores is displayed. |
na.rm |
Logical. If TRUE, NA rows are omitted. |
A data.frame
with the efficiency scores computed through a Convexified Efficiency Analysis Trees model. Optionally, a summary descriptive table of the efficiency scores can be displayed.
simulated <- X2Y2.sim(N = 50, border = 0.2)
EAT_model <- EAT(data = simulated, x = c(1, 2), y = c(3, 4))
efficiencyCEAT(data = simulated, x = c(1, 2), y = c(3, 4),
               object = EAT_model, scores_model = "BCC.OUT",
               digits = 2, DEA = TRUE, print.table = TRUE, na.rm = TRUE)
Density plot for efficiency scores.
efficiencyDensity(df_scores, model = c("EAT", "FDH"))
df_scores |
Dataframe with the efficiency scores. |
model |
Model or models whose scores are plotted. |
Density plot for efficiency scores.
simulated <- X2Y2.sim(N = 50, border = 0.2)
EAT_model <- EAT(data = simulated, x = c(1, 2), y = c(3, 4))
scores <- efficiencyEAT(data = simulated, x = c(1, 2), y = c(3, 4),
                        object = EAT_model, scores_model = "BCC.OUT",
                        digits = 2, FDH = TRUE, na.rm = TRUE)
efficiencyDensity(df_scores = scores, model = c("EAT", "FDH"))
This function computes the efficiency scores for each DMU through an Efficiency Analysis Trees model.
efficiencyEAT( data, x, y, object, scores_model, digits = 3, FDH = TRUE, print.table = FALSE, na.rm = TRUE )
data |
Dataframe containing the variables in the model. |
x |
Column input indexes in data. |
y |
Column output indexes in data. |
object |
An EAT object. |
scores_model |
Mathematical programming model to calculate scores.
|
digits |
Decimal units for scores. |
FDH |
Logical. If TRUE, FDH scores are also computed. |
print.table |
Logical. If TRUE, a summary descriptive table of the efficiency scores is displayed. |
na.rm |
Logical. If TRUE, NA rows are omitted. |
A data.frame
with the efficiency scores computed through an Efficiency Analysis Trees model. Optionally, a summary descriptive table of the efficiency scores can be displayed.
simulated <- X2Y2.sim(N = 50, border = 0.2)
EAT_model <- EAT(data = simulated, x = c(1, 2), y = c(3, 4))
efficiencyEAT(data = simulated, x = c(1, 2), y = c(3, 4),
              object = EAT_model, scores_model = "BCC.OUT",
              digits = 2, FDH = TRUE, print.table = TRUE, na.rm = TRUE)
This function returns a jitter plot from ggplot2
. This graphic shows how DMUs are grouped into leaf nodes in a model built using the EAT
function. Each leaf node groups DMUs with the same level of resources. The dot and the black line represent, respectively, the mean value and the standard deviation of the scores of its node. Additionally, efficient DMU labels are always displayed based on the model entered in the scores_model
argument. Finally, the user can specify an upper bound upb
and a lower bound lwb
in order to show, in addition, the labels whose efficiency score lies between them.
efficiencyJitter(object, df_scores, scores_model, upb = NULL, lwb = NULL)
object |
An EAT object. |
df_scores |
Dataframe with the efficiency scores. |
scores_model |
Mathematical programming model to calculate scores.
|
upb |
Numeric. Upper bound for labeling. |
lwb |
Numeric. Lower bound for labeling. |
Jitter plot with DMUs and scores.
simulated <- X2Y2.sim(N = 50, border = 0.2)
EAT_model <- EAT(data = simulated, x = c(1, 2), y = c(3, 4))
EAT_scores <- efficiencyEAT(data = simulated, x = c(1, 2), y = c(3, 4),
                            object = EAT_model, scores_model = "BCC.OUT",
                            digits = 2, na.rm = TRUE)
efficiencyJitter(object = EAT_model, df_scores = EAT_scores,
                 scores_model = "BCC.OUT")
This function computes the efficiency scores for each DMU through a Random Forest + Efficiency Analysis Trees model and the Banker, Charnes and Cooper mathematical programming model with output orientation. The efficiency level is set at 1.
efficiencyRFEAT( data, x, y, object, digits = 3, FDH = TRUE, print.table = FALSE, na.rm = TRUE )
data |
Dataframe containing the variables in the model. |
x |
Column input indexes in data. |
y |
Column output indexes in data. |
object |
A RFEAT object. |
digits |
Decimal units for scores. |
FDH |
Logical. If TRUE, FDH scores are also computed. |
print.table |
Logical. If TRUE, a summary descriptive table of the efficiency scores is displayed. |
na.rm |
Logical. If TRUE, NA rows are omitted. |
A data.frame
with the efficiency scores computed through a Random Forest + Efficiency Analysis Trees model. Optionally, a summary descriptive table of the efficiency scores can be displayed.
simulated <- X2Y2.sim(N = 50, border = 0.2)
RFEAT_model <- RFEAT(data = simulated, x = c(1, 2), y = c(3, 4))
efficiencyRFEAT(data = simulated, x = c(1, 2), y = c(3, 4),
                object = RFEAT_model, digits = 2, FDH = TRUE, na.rm = TRUE)
This function gets the estimation of the response variable and updates Pareto-coordinates and the observation index for both new nodes.
estimEAT(data, leaves, t, xi, s, y)
data |
Data to be used. |
leaves |
List structure with leaf nodes or pending expansion nodes. |
t |
Node which is being split. |
xi |
Variable index that produces the split. |
s |
Value of xi variable that produces the split. |
y |
Column output indexes in data. |
Left and right children nodes.
This function displays a plot with the frontier estimated by Efficiency Analysis Trees in a scenario of one input and one output.
frontier( object, FDH = FALSE, observed.data = FALSE, observed.color = "black", pch = 19, size = 1, rwn = FALSE, max.overlaps = 10 )
object |
An EAT object. |
FDH |
Logical. If TRUE, the FDH frontier is also displayed. |
observed.data |
Logical. If TRUE, the observed DMUs are displayed. |
observed.color |
String. Color for observed DMUs. |
pch |
Integer. Point shape. |
size |
Integer. Point size. |
rwn |
Logical. If TRUE, rownames are displayed. |
max.overlaps |
Exclude text labels that overlap too many things. |
Plot with the estimated production frontier.
simulated <- Y1.sim(N = 50, nX = 1)
model <- EAT(data = simulated, x = 1, y = 2)
frontier <- frontier(object = model, FDH = TRUE,
                     observed.data = TRUE, rwn = TRUE)
plot(frontier)
This function splits the original data in two new data sets: a train set and a test set.
generateLv(data, fold)
data |
Data to be split into train and test subsets. |
fold |
Parts in which the original set is divided, to perform Cross-Validation. |
A list
structure with the train and the test set.
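A minimal sketch of such a split (illustrative only; generate_folds_sketch is a hypothetical name and the real function's return format may differ): shuffle the row indexes, cut them into fold roughly equal groups, and let each group serve once as the test set.

# Hypothetical sketch of a fold-based train/test split for cross-validation.
generate_folds_sketch <- function(data, fold) {
  n <- nrow(data)
  groups <- split(sample(seq_len(n)), rep(seq_len(fold), length.out = n))
  lapply(groups, function(test_idx) {
    list(train = data[-test_idx, , drop = FALSE],
         test  = data[test_idx, , drop = FALSE])
  })
}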
This function recalculates all the possible splits, with the exception of the one being used, and for each node and variable gets the best split based on their degree of importance.
imp_var_EAT(data, tree, x, y, digits)
data |
Data from EAT object. |
tree |
Tree from EAT object. |
x |
Column input indexes in data. |
y |
Column output indexes in data. |
digits |
Decimal units. |
A dataframe with the best split for each node and its variable importance.
Variable Importance through Random Forest + Efficiency Analysis Trees.
imp_var_RFEAT(object, digits = 2)
object |
A RFEAT object. |
digits |
Decimal units. |
Vector of input importance scores.
This function evaluates a node and checks if it fulfills the conditions to be a final node.
isFinalNode(obs, data, numStop)
obs |
Observation in the evaluated node. |
data |
Data with predictive variable. |
numStop |
Minimum number of observations in a node to be split. |
True if the node is a final node and false in any other case.
This function modifies the coordinates of the nodes in the plotEAT function to overcome overlapping.
layout(py)
py |
A party object. |
Dataframe with suitable modifications of the node layout.
This function evaluates the importance of each predictor by the notion of surrogate splits.
M_Breiman(object, digits)
object |
An EAT object. |
digits |
Decimal units. |
Dataframe with one column and the importance of each variable in rows.
This function calculates the Mean Square Error between the predicted value and the observations in a given node.
mse(data, t, y)
data |
Data to be used. |
t |
A given node. |
y |
Column output indexes in data. |
Mean Square Error at a node.
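For a single output, the computation reduces to the following sketch (argument names are illustrative, not the function's actual signature):

# Mean squared error between a node's observed outputs and its prediction.
mse_sketch <- function(y_obs, y_pred) {
  mean((y_obs - y_pred)^2)
}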
This function randomly selects the variables that are evaluated to divide a node and removes those that do not present variability.
mtry_inputSelection(data, x, t, mtry)
data |
Dataframe containing the variables in the model. |
x |
Column input indexes in data. |
t |
Node which is being split. |
mtry |
Number of inputs selected for a node to be split. |
Index of the variables by which the node is divided.
A dataset containing the PISA score in mathematics, reading and science and 13 variables related to the social index by country for 2018.
PISAindex
A data frame with 72 rows and 18 variables:
Country name
Country continent
PISA score in Science
PISA score in Reading
PISA score in Mathematics
Nutritional and Basic Medical Care
Water and Sanitation
Shelter
Personal Safety
Access to Basic Knowledge
Access to Information and Communication
Health and Wellness
Environmental Quality
Personal Rights
Personal Freedom and Choice
Inclusiveness
Access to Advanced Education
Gross Domestic Product per capita adjusted by purchasing power parity
https://www.socialprogress.org/
https://www.oecd.org/pisa/Combined_Executive_Summaries_PISA_2018.pdf
Plot a tree-structure for an Efficiency Analysis Trees model.
plotEAT(object)
object |
An EAT object. |
Plot object with the following elements for each node:
id: node index.
R: error at the node.
n(t): number of observations at the node.
an input name: splitting variable.
y: output prediction.
simulated <- X2Y2.sim(N = 50, border = 0.2)
EAT_model <- EAT(data = simulated, x = c(1, 2), y = c(3, 4))
plotEAT(EAT_model)
Plot a graph with the Out-of-Bag error for a forest consisting of m trees.
plotRFEAT(object)
object |
A RFEAT object. |
Line plot with the OOB error and the number of trees in the forest.
simulated <- Y1.sim(N = 150, nX = 6)
RFmodel <- RFEAT(data = simulated, x = 1:6, y = 7, numStop = 10,
                 m = 50, s_mtry = "BRM", na.rm = TRUE)
plotRFEAT(RFmodel)
This function finds the node where a register is located.
posIdNode(tree, idNode)
tree |
A list containing EAT nodes. |
idNode |
Id of a specific node. |
Position of the node or -1 if it is not found.
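Conceptually this is a linear search over the node list, sketched below with a hypothetical node layout in which each node is a list carrying an id field:

# Return the position of the node with the given id, or -1 if absent.
pos_id_sketch <- function(tree, idNode) {
  for (i in seq_along(tree)) {
    if (tree[[i]]$id == idNode) return(i)
  }
  -1
}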
This function predicts the expected output by an EAT
object.
## S3 method for class 'EAT' predict(object, newdata, x, ...)
object |
An EAT object. |
newdata |
Dataframe with the new observations to be predicted. |
x |
Inputs index. |
... |
further arguments passed to or from other methods. |
data.frame
with the original data and the predicted values.
simulated <- X2Y2.sim(N = 50, border = 0.2)
EAT_model <- EAT(data = simulated, x = c(1, 2), y = c(3, 4))
predict(object = EAT_model, newdata = simulated, x = c(1, 2))
This function predicts the expected output by a RFEAT
object.
## S3 method for class 'RFEAT' predict(object, newdata, x, ...)
object |
A RFEAT object. |
newdata |
Dataframe with the new observations to be predicted. |
x |
Inputs index. |
... |
further arguments passed to or from other methods. |
data.frame
with the original data and the predicted values.
simulated <- X2Y2.sim(N = 50, border = 0.2)
RFEAT_model <- RFEAT(data = simulated, x = c(1, 2), y = c(3, 4))
predict(object = RFEAT_model, newdata = simulated, x = c(1, 2))
This function predicts the expected output by a Free Disposal Hull model.
predictFDH(data, x, y)
data |
Dataframe or matrix containing the variables in the model. |
x |
Vector. Column input indexes in data. |
y |
Vector. Column output indexes in data. |
Data frame with the original data and the predicted values through a Free Disposal Hull model.
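For a single output, the FDH prediction rule can be sketched as: the frontier estimate for each unit is the largest observed output among units that use no more of every input (free disposability). The code below is an illustration of that rule, not the package's implementation:

# X: input matrix (one row per DMU); y: single-output vector.
predict_fdh_sketch <- function(X, y) {
  sapply(seq_len(nrow(X)), function(k) {
    # units whose inputs are all <= unit k's inputs
    dominated <- apply(X, 1, function(xi) all(xi <= X[k, ]))
    max(y[dominated])
  })
}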
This function predicts the expected value based on a set of inputs.
predictor(tree, register)
tree |
A list containing the EAT nodes. |
register |
Set of independent values. |
The expected value of the dependent variable based on the given register.
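The traversal can be sketched as follows, assuming a hypothetical node layout in which each node stores an id, a splitting variable xi, a threshold s, child ids SL and SR, a prediction y, and SL == -1 marks a leaf (the package's actual node fields may differ):

# Descend from the root to a leaf following the splits, then return
# the leaf prediction (hypothetical sketch).
predictor_sketch <- function(tree, register) {
  node <- tree[[1]]
  while (node$SL != -1) {                         # descend until a leaf
    next_id <- if (register[node$xi] < node$s) node$SL else node$SR
    pos <- which(sapply(tree, function(t) t$id) == next_id)
    node <- tree[[pos]]
  }
  node$y                                          # leaf prediction
}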
This function arranges the data in the required format and displays error messages.
preProcess( data, x, y, numStop = 5, fold = 5, max.depth = NULL, max.leaves = NULL, na.rm = TRUE )
data |
Dataframe containing the variables in the model. |
x |
Column input indexes in data. |
y |
Column output indexes in data. |
numStop |
Minimum number of observations in a node for a split to be attempted. |
fold |
Number of folds into which the dataset is divided to apply cross-validation during the pruning. |
max.depth |
Maximum depth of the tree. |
max.leaves |
Maximum number of leaf nodes. |
na.rm |
Logical. If TRUE, NA rows are omitted. |
It returns a data.frame
in the required format.
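Although preProcess is mostly invoked internally by the model-fitting functions, a minimal sketch of a direct call could look as follows (assuming the eat package is loaded):

```r
library(eat)

# Arrange simulated data in the required format; NA rows are dropped when na.rm = TRUE
simulated <- X2Y2.sim(N = 50, border = 0.2)
clean_data <- preProcess(data = simulated, x = c(1, 2), y = c(3, 4), na.rm = TRUE)
```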
This function builds an individual tree for Random Forest.
RandomEAT(data, x, y, numStop, s_mtry)
data |
Dataframe containing the training set. |
x |
Vector. Column input indexes in data. |
y |
Vector. Column output indexes in data. |
numStop |
Integer. Minimum number of observations in a node for a split to be attempted. |
s_mtry |
Number of variables randomly sampled as candidates at each split. The available options are: |
List of m trees in forest and the error that will be used in the ranking of the importance of the variables.
This function computes the variable importance through an Efficiency Analysis Trees model.
rankingEAT(object, barplot = TRUE, threshold = 70, digits = 2)
object |
An EAT object. |
barplot |
Logical. If TRUE, a barplot of the importance scores is displayed. |
threshold |
Importance score value in which a line is graphed. |
digits |
Decimal units. |
data.frame with the importance scores and a barplot representing the variable importance if barplot = TRUE.
simulated <- X2Y2.sim(N = 50, border = 0.2)
EAT_model <- EAT(data = simulated, x = c(1, 2), y = c(3, 4))
rankingEAT(object = EAT_model, barplot = TRUE, threshold = 70, digits = 2)
This function calculates variable importance through a Random Forest + Efficiency Analysis Trees model.
rankingRFEAT(object, barplot = TRUE, digits = 2)
object |
An RFEAT object. |
barplot |
Logical. If TRUE, a barplot of the importance scores is displayed. |
digits |
Decimal units. |
data.frame with the importance scores and a barplot representing the variable importance if barplot = TRUE.
simulated <- X2Y2.sim(N = 50, border = 0.2)
RFEAT_model <- RFEAT(data = simulated, x = c(1, 2), y = c(3, 4))
rankingRFEAT(object = RFEAT_model, barplot = TRUE, digits = 2)
This function computes the error of a branch as the sum of the errors of its child nodes.
RBranch(t, tree)
t |
Node to be pruned. |
tree |
A list structure containing the tree nodes. |
A list
containing (1) the sum of the errors of the child nodes of the pruned node and (2) the total number of leaf nodes that come from it.
RCV
RCV(N, Lv, y, alphaIprim, fold, TAiv)
N |
Number of rows in data. |
Lv |
Test set. |
y |
Column output indexes in data. |
alphaIprim |
Alpha obtained as the square root of the product of two consecutive alpha values in tree_alpha list. It is used to find the best pruning tree. |
fold |
Parts into which the original data is divided to perform Cross-Validation. |
TAiv |
List with each possible pruning for the deep tree generated with the train set and its associated alpha values. |
Set of best pruning and the associated error calculated with test sets.
This function predicts the expected value based on a set of inputs.
RF_predictor(forest, xn)
forest |
List containing the individual EAT models in the forest. |
xn |
Row indexes in data. |
Vector of predictions.
This function builds m
individual Efficiency Analysis Trees in a forest structure.
RFEAT(data, x, y, numStop = 5, m = 50, s_mtry = "BRM", na.rm = TRUE)
data |
Dataframe or matrix containing the variables in the model. |
x |
Column input indexes in data. |
y |
Column output indexes in data. |
numStop |
Minimum number of observations in a node for a split to be attempted. |
m |
Number of trees to be built. |
s_mtry |
Number of variables randomly sampled as candidates at each split. The available options are:
|
na.rm |
Logical. If TRUE, NA rows are omitted. |
An RFEAT object containing:

- data: a list with:
  - df: data frame containing the variables in the model.
  - x: input indexes in data.
  - y: output indexes in data.
  - input_names: input variable names.
  - output_names: output variable names.
  - row_names: rownames in data.
- control: a list with:
  - numStop: numStop hyperparameter value.
  - m: m hyperparameter value.
  - s_mtry: s_mtry hyperparameter value.
  - na.rm: na.rm hyperparameter value.
- forest: list structure containing the individual EAT models.
- error: Out-of-Bag error of the forest.
- OOB: list containing the Out-of-Bag set for each tree.
simulated <- X2Y2.sim(N = 50, border = 0.1)
RFmodel <- RFEAT(data = simulated, x = c(1, 2), y = c(3, 4),
                 numStop = 5, m = 50, s_mtry = "BRM", na.rm = TRUE)
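The components of the returned object can be inspected with the usual $ operator; a brief sketch:

```r
library(eat)

simulated <- X2Y2.sim(N = 50, border = 0.1)
RFmodel <- RFEAT(data = simulated, x = c(1, 2), y = c(3, 4),
                 numStop = 5, m = 50, s_mtry = "BRM", na.rm = TRUE)

RFmodel$error           # Out-of-Bag error of the forest
length(RFmodel$forest)  # number of individual EAT trees (m)
```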
This function saves information about the Random Forest for Efficiency Analysis Trees model.
RFEAT_object(data, x, y, rownames, numStop, m, s_mtry, na.rm, forest, error, OOB)
data |
Dataframe or matrix containing the variables in the model. |
x |
Column input indexes in data. |
y |
Column output indexes in data. |
rownames |
Rownames in data. |
numStop |
Minimum number of observations in a node for a split to be attempted. |
m |
Number of trees to be built. |
s_mtry |
Select number of inputs in each split.
|
na.rm |
Logical. If TRUE, NA rows are omitted. |
forest |
List containing the individual EAT models. |
error |
Error in Random Forest for Efficiency Analysis Trees. |
OOB |
List containing the Out-of-Bag set for each tree. |
An RFEAT object.
This function calculates the score for each pruning of tree_alpha_list.
scores(N, Lv_notLv, x, y, fold, numStop, Tk, tree_alpha_list)
N |
Number of rows in data. |
Lv_notLv |
List with train and test sets. |
x |
Column input indexes in data. |
y |
Column output indexes in data. |
fold |
Parts into which the original data set is divided to perform Cross-Validation. |
numStop |
Minimum number of observations on a node to be split. |
Tk |
Best pruned tree. |
tree_alpha_list |
List with all the possible pruning and its associated alpha. |
List with the best pruning for each fold, the pruning with the lowest score, and tree_alpha_list with its scores updated.
This function selects the number of inputs for a split in Random Forest.
select_mtry(s_mtry, t, nX, nY)
s_mtry |
Select number of inputs. It could be: |
t |
Node which is being split. |
nX |
Number of inputs in data. |
nY |
Number of outputs in data. |
Number of inputs selected according to the specified rule.
This function tries to find a new pruned tree with a shorter length and a score within the range generated by SE.
selectTk(Tk, tree_alpha_list, SE)
Tk |
Best pruned tree. |
tree_alpha_list |
List with all the possible pruning and its associated alpha and scores. |
SE |
Value used to obtain a range in which new prunings are found. |
The same best tree or a new suitable one.
Based on Validation tests over BestTivs, a new range of scores is obtained to find new pruned trees.
SERules(N, Lv, y, fold, Tk_score, BestTivs)
N |
Number of rows in data. |
Lv |
Test set. |
y |
Column output indexes in data. |
fold |
Parts into which the original data set is divided to perform Cross-Validation. |
Tk_score |
Best pruned tree score. |
BestTivs |
List of best pruned trees for each training set. |
Value used to obtain a range in which new prunings are found.
This function gets the variable and split value to be used in estimEAT, selects the best split and updates VarInfo, node indexes and leaves list.
split(data, tree, leaves, t, x, y, numStop)
data |
Data to be used. |
tree |
List structure with the tree nodes. |
leaves |
List with leaf nodes or pending expansion nodes. |
t |
Node which is being split. |
x |
Column input indexes in data. |
y |
Column output indexes in data. |
numStop |
Minimum number of observations in a node to be split. |
Leaves and tree lists updated with the new child nodes.
This function gets the variable and split value to be used in estimEAT, selects the best split, node indexes and leaf list.
split_forest(data, tree, leaves, t, x, y, numStop, arrayK)
data |
Data to be used. |
tree |
List structure with the tree nodes. |
leaves |
List with leaf nodes or pending expansion nodes. |
t |
Node which is being split. |
x |
Column input indexes in data. |
y |
Column output indexes in data. |
numStop |
Minimum number of observations on a node to be split. |
arrayK |
Column input indexes in data selected by s_mtry. |
Leaves and tree lists updated with the new child nodes.
This function generates a deep EAT and all its prunings for each train set.
treesForRCV(notLv, x, y, fold, numStop)
notLv |
Train set. |
x |
Column input indexes in data. |
y |
Column output indexes in data. |
fold |
Parts into which the original set is divided to perform Cross-Validation. |
numStop |
Minimum number of observations in a node to be split. |
List with each possible pruning for the deep tree generated with the train set and its associated alpha values.
This function is used to simulate the data in a scenario with 2 inputs and 2 outputs.
X2Y2.sim(N, border, noise = NULL)
N |
Sample size. |
border |
Percentage of DMUs in the frontier. |
noise |
Random noise. |
data.frame
with simulated data.
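A minimal sketch of generating data under this scenario (assuming the eat package is loaded):

```r
library(eat)

# 50 DMUs with 2 inputs and 2 outputs; 20% of them on the frontier
simulated <- X2Y2.sim(N = 50, border = 0.2)
head(simulated)
```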
This function is used to simulate the data in a single output scenario.
Y1.sim(N, nX)
N |
Sample size. |
nX |
Number of inputs. |
data.frame
with simulated data.
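A minimal sketch for the single-output scenario (assuming the eat package is loaded):

```r
library(eat)

# 100 DMUs in a single-output scenario with 3 inputs
simulated <- Y1.sim(N = 100, nX = 3)
head(simulated)
```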