What is RELR?
Reduced Error Logistic Regression (RELR) is used daily to profile likely customers and thus target media buying for a very large number of US consumer brands. RELR is a patented (US Patent 8,032,473, with further US and foreign patent rights pending on extensions and improvements in international patent application PCT/US14/46060) automatic machine learning method that is highly recommended by very senior analytics executives (see the endorsements on the Case Studies page). RELR has been implemented for years as MyRELR^{TM}, which we license as a SAS® language macro. We will also license other implementations of RELR, such as C code implementations built upon existing proprietary logistic regression implementations, to technology or software companies that wish to build proprietary products and services based upon RELR. RELR completely automates all tasks in machine learning, including missing value handling, feature reduction, feature selection, interaction and nonlinear effect building, and the coding of nominal independent variables, all without arbitrary or subjective user choices: RELR simply provides automatic most probable solutions. In addition, RELR allows causal hypothesis testing through very high dimension matched control tests in observational data, without conducting randomized controlled experiments, along with sequential machine learning that handles the time dependency issues that plague standard regression methods. Simple probability theory and easy-to-follow toy Excel example models demonstrating the advantages of RELR are the subject of a new book, Calculus of Thought by Daniel M. Rice, published in November 2013 by the Academic Press imprint of Elsevier.
The biggest advantage of RELR is that error is accurately modeled and substantially removed as a component of the regression model, so RELR regression coefficients carry a small fraction of the error observed in traditional regression and machine learning methods. This low error yields stable, parsimonious, accurate, and interpretable RELR variable selection models that avoid the significant risk and instability of Stepwise and other automatic variable selection methods, which Breiman famously called the "quiet scandal of statistics". RELR also allows more complex, diffuse "ensemble-like" models that likewise have very low error and are generated automatically without human involvement; this is a major advantage over typical predictive ensemble methods in machine learning, which require large manual effort, with associated human bias, in the selection and tuning of models. In all cases, the key to RELR's low error and stable models is this accurate modeling and removal of error.
What are some of RELR's other advantages?
Imagine a standalone regression algorithm that delivers the benefits of the ensemble modeling used in the Jeopardy and Netflix competitions, giving very accurate models with relatively low prediction error, but without the weeks or months of laborious implementation time, the constant model tuning, and the large team of modelers building separate elementary models. RELR automatically constructs highly accurate models without separate elementary models, and unlike such ensembles, which cannot update automatically, RELR models can update automatically through RELR's sequential learning capabilities.
Imagine a regression algorithm that has no arbitrary parameters and automatically gives the most probable parsimonious variable selection solution, so all modelers will generate this same most probable model (even across independent representative samples of observations), unlike the wide variability of stepwise and other standard variable selection methods, especially across independent data samples and modelers with different biases. Hence, RELR models replicate given a minimal sample size of data, whereas standard variable selection methods do not.
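The instability of standard stepwise-style selection across resamples can be seen in a small sketch. This is an illustration only, not RELR: the greedy correlation-based selector, the simulated data, and every name below are our own assumptions, standing in for the kind of stepwise procedure the paragraph above criticizes. Two nearly collinear signal features compete, and which one a greedy selector picks first can vary from one bootstrap resample to the next.

```python
import random

def corr(a, b):
    """Pearson correlation of two equal-length lists."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / (va * vb) ** 0.5

def forward_select(X, y, k=1):
    """Toy greedy stand-in for stepwise selection: repeatedly add the
    not-yet-selected feature most correlated (in absolute value) with y."""
    p = len(X[0])
    cols = [[row[j] for row in X] for j in range(p)]
    selected = []
    for _ in range(k):
        best = max((j for j in range(p) if j not in selected),
                   key=lambda j: abs(corr(cols[j], y)))
        selected.append(best)
    return selected

random.seed(0)
n = 200
z = [random.gauss(0, 1) for _ in range(n)]            # latent signal
X = [[z[i] + random.gauss(0, 0.1),                    # feature 0: noisy copy of z
      z[i] + random.gauss(0, 0.1),                    # feature 1: collinear with 0
      random.gauss(0, 1), random.gauss(0, 1)]         # features 2-3: pure noise
     for i in range(n)]
y = [z[i] + random.gauss(0, 0.5) for i in range(n)]

# Which feature is picked first on each of 20 bootstrap resamples?
first_picks = []
for _ in range(20):
    idx = [random.randrange(n) for _ in range(n)]
    first_picks.append(forward_select([X[i] for i in idx],
                                      [y[i] for i in idx])[0])
print(first_picks)  # compare how often each of the two collinear features wins
```

The selector reliably prefers one of the two collinear signal features over the noise features, but the winner among those two depends on the particular resample, which is the replication failure the paragraph above attributes to stepwise methods.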
Imagine an easy-to-understand regression algorithm that gives you an accurate model with a tiny fraction of the training sample observations that standard regression algorithms would require.
Imagine that the parsimonious selected model has causal plausibility because it is accurate and interpretable with correct regression coefficient signs, and because roughly the same model would be generated by an independent training sample. Yet imagine that it also yields a related matched sample causal methodology for testing these putative causal hypotheses, one that avoids the bias and high dimension problems of propensity score matching methods.
Imagine never having to worry about significant overfitting and multicollinearity problems, because the regression algorithm imposes no limit on the number of variables and often becomes much more accurate as more candidate variables are entered, even though the final parsimonious selection model may have fewer than 5-10 automatically selected variables.
Imagine never having to worry about time-consuming cross validations that are ambiguous and sample dependent, because RELR models do not overfit and do not rely on cross-validation tuning.
Imagine usually getting a lift in classification accuracy compared to your current models, with this lift often becoming dramatically greater with high dimension candidate features. For example, an increase of as much as 25 KS statistic points was reported at a SAS User conference by an independent beta user in 2009 (see the Case Studies page), in a parsimonious variable selection model built for a very high dimension problem involving 80,000 total variables/interactions.
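For readers unfamiliar with the metric, the KS statistic cited above is the maximum gap between the cumulative distributions of model scores for the two outcome classes, commonly quoted on a 0-100 "points" scale. A minimal sketch of the computation follows; the function name and the toy scores are our own illustrative assumptions, not part of RELR or the cited case study.

```python
def ks_statistic(pos_scores, neg_scores):
    """Two-sample Kolmogorov-Smirnov statistic: the largest vertical gap
    between the empirical CDFs of model scores for the two classes.
    Multiply by 100 to get the 'KS points' scale used in the text."""
    thresholds = sorted(set(pos_scores) | set(neg_scores))
    gap = 0.0
    for t in thresholds:
        cdf_pos = sum(s <= t for s in pos_scores) / len(pos_scores)
        cdf_neg = sum(s <= t for s in neg_scores) / len(neg_scores)
        gap = max(gap, abs(cdf_pos - cdf_neg))
    return gap

# A model that separates the classes perfectly scores KS = 1.0 (100 points);
# a model whose scores are identical for both classes scores KS = 0.0.
print(ks_statistic([0.8, 0.9, 0.7], [0.1, 0.2, 0.3]))  # 1.0
print(ks_statistic([0.5, 0.5], [0.5, 0.5]))            # 0.0
```

On this scale, a 25-point improvement means the best score cutoff separates responders from non-responders by an additional 25 percentage points of cumulative distribution.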
Imagine that more complicated interaction and nonlinear features enter the final variable selection only if they are stable and relatively independent of simple linear features, so you avoid the uninterpretable complexity issues of other variable selection methods.
These imagined scenarios are just a subset of the actual advantages of RELR and why our customers strongly endorse RELR modeling.
SAS and Enterprise Miner are trademarks of SAS Institute. MyRELR^{TM} and SkyRELR^{TM} are trademarks of Rice Analytics.
