Rice Analytics

Automated Reduced Error Predictive Analytics


What is RELR?

Reduced Error Logistic Regression (RELR) is currently used daily to help profile likely customers, and thus to target media buying, for a very large number of US consumer brands. RELR is a patented automatic machine learning method (US patent 8,032,473, with new pending US and foreign patent rights on extensions and improvements in international patent application PCT/US14/46060) that is highly recommended by very senior analytic executives (see endorsements on Case Studies page). RELR has been implemented for years as MyRELR™, which we license as a SAS® language macro. We will also license other implementations of RELR, such as C code implementations built upon existing proprietary logistic regression implementations, to technology or software companies who wish to build proprietary products and services based upon RELR.

RELR completely automates all tasks in machine learning, including missing value handling, feature reduction, feature selection, interaction and nonlinear effect building, and the coding of nominal independent variables, all without arbitrary or subjective user choices. RELR simply provides automatic most probable solutions. In addition, RELR allows causal hypothesis testing through very high dimension matched control tests in observational data without conducting randomized controlled experiments, along with sequential machine learning that handles the time dependency issues that plague standard regression methods.

Simple probability theory and easy-to-follow toy Excel example models that demonstrate the advantages of RELR are the subject of a new book, Calculus of Thought by Daniel M. Rice, published in November of 2013 by the Academic Press imprint of Elsevier.

The biggest advantage of RELR is that error is accurately modeled and substantially removed as a component of the regression model, so RELR regression coefficients have a small fraction of the error observed in traditional regression and machine learning methods. This low error gives stable, parsimonious, accurate and interpretable RELR variable selection models that avoid the significant risk and instability of Stepwise and other automatic variable selection methods, which Breiman famously called the "quiet scandal of statistics". RELR also allows more complex, diffuse "ensemble-like" models that likewise have very low error and are generated automatically without human involvement. This is a major advantage over typical predictive ensemble methods in machine learning, which require substantial manual effort, with the associated human bias, in the selection and tuning of models. The key to RELR's low error and stable models in all cases is this accurate error modeling and removal.

What are some of RELR's other advantages?

Imagine a stand-alone regression algorithm that has the benefits of the Ensemble Modeling used in the Jeopardy and Netflix competitions, giving very accurate models with relatively low prediction error, but without the weeks or months of laborious implementation time, the constant model tuning, and the large team of modelers who build separate elementary models. RELR automatically constructs highly accurate models without requiring separate elementary models. Such elementary models cannot update automatically, whereas RELR models can through RELR's sequential learning capabilities.

Imagine a regression algorithm that does not have arbitrary parameters and automatically gives the most probable parsimonious variable selection solution, so all modelers will generate this same most probable model (even across independent representative samples of observations) unlike the wide variability in stepwise and other standard variable selection methods - especially across independent data samples and modelers with different biases.  Hence, RELR models replicate given a minimal sample size of data, whereas standard variable selection methods do not.
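One way a reader might check how well any variable selection method replicates across independent samples is to compare the sets of selected variables directly. The sketch below is not part of RELR; it uses the Jaccard overlap between selected-variable sets as an assumed measure of replication, and the function names and variable names in it are hypothetical.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard overlap of two collections of selected variable names."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0  # two empty selections are treated as identical
    return len(a & b) / len(a | b)


def mean_pairwise_jaccard(selections):
    """Average Jaccard overlap over all pairs of selections.

    `selections` holds one collection of variable names per independent
    training sample (or per modeler). A value of 1.0 means every sample
    produced exactly the same selected model.
    """
    pairs = list(combinations(selections, 2))
    if not pairs:
        raise ValueError("need at least two selections to compare")
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)


# Hypothetical selections from three independent samples (made-up names).
unstable = [{"age", "income"}, {"income", "tenure"}, {"region", "tenure"}]
stable = [{"age", "income"}, {"age", "income"}, {"age", "income"}]

print(mean_pairwise_jaccard(unstable))
print(mean_pairwise_jaccard(stable))
```

A method whose selections replicate would score at or near 1.0 on such a check, while an unstable method would score well below it.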

Imagine an easy-to-understand regression algorithm that allows you to get an accurate model with a tiny percentage of the training sample observations that standard regression algorithms would require.

Imagine that the parsimonious selected model has causal plausibility because it is accurate and interpretable with correct regression coefficient signs and because roughly the same model would be generated by an independent training sample. Yet, imagine that it also yields a related matched sample causal methodology for testing the putative causal hypotheses, one that avoids the bias and high dimension problems of propensity score matching methods.

Imagine never having to worry about significant overfitting and multicollinearity problems because the regression algorithm does not impose limits on the number of variables and often gets much more accurate when more variables are entered as candidate variables, even though the final parsimonious selection model may have fewer than 5-10 variables that are automatically selected.

Imagine never having to worry about time-consuming cross-validations that are ambiguous and sample dependent because RELR models do not overfit and do not use cross-validation tunings.

Imagine usually getting a lift in classification accuracy compared to your current models, with this lift often getting dramatically greater with high dimension candidate features (an increase of as much as 25 KS statistic points was reported at a SAS User conference by an independent beta user in 2009 - see Case Studies page - in a parsimonious variable selection model that resulted from a very high dimension problem involving 80,000 total variables/interactions).
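For readers unfamiliar with the KS statistic mentioned above, the following is a minimal sketch of how it is commonly computed for a binary classifier: the largest gap between the cumulative score distributions of the two outcome classes. It assumes that "KS points" means the statistic multiplied by 100, a common convention that this page does not itself spell out. The function names and the toy scores are illustrative only and have nothing to do with RELR's internals or the reported case study.

```python
from bisect import bisect_right


def ks_statistic(scores, labels):
    """Two-sample Kolmogorov-Smirnov statistic for a binary classifier.

    Returns the maximum absolute difference between the empirical CDF of
    scores for the positive class (label 1) and the negative class (label 0).
    """
    pos = sorted(s for s, y in zip(scores, labels) if y == 1)
    neg = sorted(s for s, y in zip(scores, labels) if y == 0)
    if not pos or not neg:
        raise ValueError("need at least one observation in each class")
    ks = 0.0
    for t in sorted(set(scores)):
        # Fraction of each class with a score at or below threshold t.
        cdf_pos = bisect_right(pos, t) / len(pos)
        cdf_neg = bisect_right(neg, t) / len(neg)
        ks = max(ks, abs(cdf_pos - cdf_neg))
    return ks


def ks_points(scores, labels):
    """KS expressed in points, assuming points = 100 * KS."""
    return 100.0 * ks_statistic(scores, labels)


# Toy data: the same four outcomes scored by two hypothetical models.
labels = [0, 1, 0, 1]
baseline_scores = [0.2, 0.4, 0.6, 0.8]   # classes interleaved
candidate_scores = [0.1, 0.8, 0.2, 0.9]  # classes fully separated

lift = ks_points(candidate_scores, labels) - ks_points(baseline_scores, labels)
print(f"KS lift in points: {lift}")
```

A lift of 25 KS points under this convention would mean the maximum separation between the two cumulative distributions grew by 0.25.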

Imagine that more complicated interaction and nonlinear features are only selected in the final variable selection if they are stable and relatively independent from simple linear features, so you avoid the uninterpretable complexity issues in other variable selection methods.

These imagined scenarios are just a subset of the actual advantages of RELR and why our customers strongly endorse RELR modeling.

SAS and Enterprise Miner are trademarks of SAS Institute. MyRELR™ and SkyRELR™ are trademarks of Rice Analytics.

Are there other ROI benefits to RELR besides the obvious error reduction with high dimension problems and the much better real world generalization due to RELR's stability?  

RELR's benefits are not limited to problems with high dimension candidate features. The Calculus of Thought book also presents a number of case examples showing benefits in low dimension data.

Another nontrivial source of ROI is automation. RELR completely automates the entire predictive analytics process in a way that handles the error and bias problems that arise in other methods from arbitrarily chosen parameters, multicollinearity, and sampling error. Thus, the manual labor cost savings can be substantial compared to other predictive analytics/machine learning methods.

RELR opens a new era of cognitive machines: completely automated computing machines that mimic many important aspects of learning, memory, and causal reasoning. Leibniz first proposed in the 17th century that the goal of Calculus should be to realize such automatic cognitive machines that completely avoid human biases and error in a Calculus of Thought. RELR is such a Calculus of Thought. A complete theory of cognitive neural computation related to RELR is also presented in the new book Calculus of Thought. Although RELR is an imperfect engineered model of neural computation, just as airplanes are an imperfect engineered model of how birds fly, it shows enough superficial similarity in aspects such as automation and the ability to handle high dimension, multicollinear data to allow this completely new era of cognitive machines.

How is RELR licensed?

We have a new cloud GUI and API product called SkyRELR™, written in Python, that is available for worldwide distribution through a cloud server hosted on Amazon Web Services. Please visit SkyRELR.com.

How can we get more information about licensing SkyRELR?

Please email us at info@riceanalytics.com or call 314-968-8175.
