Reduced Error Logistic Regression (RELR) is a general machine learning method specifically designed to address error and dimensionality problems in logistic regression. RELR differs from standard logistic regression in that both error and observational probabilities are estimated. Although RELR was in development for many years, the final form of the RELR formulation and the resulting machine learning method were described and validated on very high-dimensional data in Rice (2008)[1]. Independent research by Ball (2011) also reported RELR achieving 2-3% better classification accuracy than standard regression methods.

RELR has some similarities to formulations that attempt to regularize logistic regression. For instance, the RELR log likelihood or maximum entropy objective function resembles that of methods such as Penalized Logistic Regression or Lasso Logistic Regression, in the sense that it combines the standard logistic regression objective with a penalty function. The key difference in RELR is that this penalty function is composed of error probabilities across independent variables. Thus, the most likely RELR solution is the one that maximizes the joint probability across events that include both error and observational events.
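
As a rough schematic only (the exact RELR formulation, including its weighting and constraints, is given in Rice, 2008), an objective of this general shape can be written as

  \max_{\beta,\, w} \;\; \sum_{i=1}^{n} \Big[ y_i \log p_i(\beta) + (1 - y_i) \log\big(1 - p_i(\beta)\big) \Big] \;+\; \sum_{j=1}^{m} S\big(w_j^{+},\, w_j^{-}\big)

where p_i(\beta) is the usual logistic probability for observation i and S is an entropy-like term over the probabilities w_j^{+} and w_j^{-} of positive and negative error on the j-th independent variable; the symbols S, w_j^{+} and w_j^{-} are introduced here purely for illustration.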

Because error probabilities are included in the objective function, RELR has similarities to the formulation of Golan et al. (1996)[2] for modeling error in logistic regression. The major differences from the Golan error model are:

  1. The RELR error model penalizes each independent variable by an amount proportional to 1/t, where t is the t ratio that reflects the effect of the independent variable on the dependent variable.
  2. The RELR error model imposes symmetrical constraints that force an equal probability of positive and negative error across variables, where these constraints are weighted so that their effect is in direct proportion to the number of variables in a model.

The 1/t penalty is equivalent to an expectation that error is extremely large when small t values are present; this is assumed to be consistent with logit error that is Extreme Value Type I distributed, as derived by Luce and Suppes (1965)[3] and McFadden (1974).[4]
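
The logit connection referenced above is the standard random utility result: when the unobserved error terms are independent and Extreme Value Type I (Gumbel) distributed, the implied choice probabilities take the familiar logistic form

  P(y_i = 1 \mid x_i) \;=\; \frac{\exp(x_i^{\top}\beta)}{1 + \exp(x_i^{\top}\beta)},

so an Extreme Value Type I error assumption is precisely the one under which the logit model arises.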

Applications

When large numbers of variables with relatively large t values are present, as in high-dimensional data, RELR's standardized logit coefficients are proportional to the corresponding t values across independent variables. This helps avoid the curse of dimensionality, because variables can be screened based upon t-value magnitudes and only a shorter list of the most important variables is entered into the model. Additionally, because parsimony is a natural by-product of the optimal RELR model, there is no need to impose arbitrary parsimony restrictions on RELR, such as through AIC or BIC criteria. Thus, the variable selection method in RELR simply implements a graduated optimization method designed to find the maximum value of its log likelihood objective function. This built-in optimal variable selection can give very parsimonious but accurate solutions.
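
As an illustration of the screening idea only, and not of the RELR estimator itself (which additionally estimates error probabilities and performs its own selection), candidate variables can be ranked by the magnitude of a simple two-sample t ratio against a binary outcome and the resulting shortlist passed to a logistic model. The sketch below assumes Python with NumPy, SciPy, and scikit-learn; screen_by_t is a hypothetical helper name.

  import numpy as np
  from scipy import stats
  from sklearn.linear_model import LogisticRegression

  def screen_by_t(X, y, k):
      """Rank variables by |t| from a two-sample t test against binary y
      and return the indices of the k largest, plus all t values."""
      t_vals = np.array([
          stats.ttest_ind(X[y == 1, j], X[y == 0, j], equal_var=False).statistic
          for j in range(X.shape[1])
      ])
      return np.argsort(-np.abs(t_vals))[:k], t_vals

  # Synthetic example: 200 candidate variables, only the first two informative.
  rng = np.random.default_rng(0)
  X = rng.normal(size=(500, 200))
  y = (X[:, 0] - X[:, 1] + rng.normal(size=500) > 0).astype(int)
  keep, t_vals = screen_by_t(X, y, k=10)
  model = LogisticRegression(max_iter=1000).fit(X[:, keep], y)  # ordinary logistic fit on the shortlist

Only the t-based screening step mirrors the description above; the downstream fit shown here is standard logistic regression rather than RELR.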

Empirical results suggest that RELR's validation-sample predictive error can be significantly reduced compared to standard methods, especially with highly multicollinear datasets (Rice, 2008). These reduced-error effects are especially noticeable in the average squared error, or Brier score, compared to classification-error measures such as the Receiver Operating Characteristic Area Under the Curve (ROC AUC). The Brier score is a more sensitive predictive-modeling error measure than classification error, because the Brier score will only be relatively small when the predicted probabilities truly reflect the frequencies of the predicted event [5]. Thus, RELR's tendency to show its largest error-reduction effects in the Brier score is an advantage, because this error measure is a better reflection of the accuracy of the probabilities. Empirical results also suggest that logit coefficients have very small error in RELR compared to standard logistic regression methods (Rice, 2008). Due to this small error, RELR's variable selection is consistent across completely independent samples given minimal training sample sizes.
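
For concreteness, the two measures discussed here can be computed from predicted probabilities as follows; this is a generic illustration with made-up numbers, not RELR output, and assumes scikit-learn.

  import numpy as np
  from sklearn.metrics import brier_score_loss, roc_auc_score

  y_true = np.array([0, 0, 1, 1, 1])
  p_hat = np.array([0.10, 0.40, 0.35, 0.80, 0.90])  # predicted probabilities

  # Brier score: mean squared difference between predicted probability and outcome.
  print(brier_score_loss(y_true, p_hat))  # 0.1285
  # ROC AUC: probability that a random positive is ranked above a random negative.
  print(roc_auc_score(y_true, p_hat))     # about 0.833

The Brier score penalizes poorly calibrated probabilities even when the ranking of cases, and hence the ROC AUC, is unchanged, which is why it is described above as the more sensitive measure.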

When the same variables and observations are employed, RELR's largest reduced-error effects relative to standard methods are seen in datasets with a large number of variables, although newer evidence suggests that RELR also performs well with smaller numbers of variables. In any case, because of its reduced error, large training sample sizes are not required to build accurate RELR models. Even with very high-dimensional datasets composed of tens of thousands of variables, accurate RELR models may be found with training sample sizes of less than a few thousand observations.

References

  1. Rice, D.M. (2008). Generalized reduced error logistic regression machine. Section on Statistical Computing, JSM Proceedings 2008, pp. 3855-3862.
  2. Golan, A., Judge, G., and Perloff, J.M. (1996). A maximum entropy approach to recovering information from multinomial response data. Journal of the American Statistical Association, 91: 841-853.
  3. Luce, R.D. and Suppes, P. (1965). Preference, utility and subjective probability. In R.D. Luce, R.R. Bush and E. Galanter (eds.), Handbook of Mathematical Psychology, Vol. 3, Wiley and Sons, New York, NY, pp. 249-410.
  4. McFadden, D. (1974). Conditional logit analysis of qualitative choice behavior. In P. Zarembka (ed.), Frontiers in Econometrics, Academic Press, New York, pp. 105-142.
  5. Harrell, F. (2001). Regression Modeling Strategies. Springer-Verlag, New York.