Rice Analytics

Automated Reduced Error Predictive Analytics

Rice Analytics Issued Fundamental Patent on RELR Method

This Patent Covers RELR Error Modeling and Related Dimension Reduction

St. Louis, MO (USA), October 4, 2011 – Rice Analytics, the pioneer in automated reduced error regression, announced today that the US Patent Office has issued it a patent covering fundamental aspects of its Reduced Error Logistic Regression (RELR) technology. This patent covers important error modeling and dimension reduction aspects of RELR. Dan Rice, the inventor of RELR and President of Rice Analytics, described the significance of this RELR patent as follows:

“While large numbers of patents are important in many technology applications, it is also clear that just one fundamental patent can lead to the breakthrough commercialization of an entire industry.  The MRI patent in the early 1970’s had such an effect and by the 1990’s had resulted in billions of dollars in licensing fees and enormous practical applications in medicine.  We believe that this RELR patent could have a similar effect in the field of Big Data analytics because RELR completely avoids the problematic and risky issues related to error and arbitrary model building choices that plague all other Big Data high dimensional regression algorithms.  RELR finally allows Big Data machine learning to be completely automated and interpretable. Just as the MRI allowed the physician to work at a much higher level and avoid arbitrary diagnostic choices where two physicians would come to completely different and inaccurate diagnoses, RELR allows analytic professionals to work at a much higher level and completely avoid arbitrary guesses in model building.  Thus, different modelers will no longer either build completely different models with the very same data or have to rely upon pre-filled parameters that are the arbitrary choices of others. Most modelers would spend significant time testing arbitrary parameters because they are worried about the large risk associated with such parameters, but then it is very hard for them to find the time to be creative. The complete automation that is the basis of RELR frees analytic professionals to work at a much higher and creative level, so they can pose better modeling problems and develop insightful model interpretations. Most importantly, unlike parsimonious variable selection in all other algorithms, RELR’s Parsed variable selection models actually can be interpreted because these models are not built with arbitrary choices and because they are consistent with maximum probability statistical theory.”

This US patent, number 8,032,473, describes a method of modeling and reducing error in logistic regression that can be applied quite generally in machine learning applications. Logistic regression is one of the more general advanced analytics methods because it can be used to model the probability of outcomes in all classic regression problems without regard to the form of the dependent variable. The most common application of logistic regression is in modeling categorical outcomes, such as binary or ordinal outcomes. Yet any continuous dependent variable can be categorized into intervals and also modeled with logistic regression, as in forecasting and survival analysis problems. Logistic regression remains one of the most widely used advanced analytics methods in business, government, medicine, and science applications. The reason for its popularity is that it can offer insight into the key putative drivers of the predicted regression outcome, but problems related to error and dimensionality are major limiting factors and prevent such insight with non-experimental data. This patented RELR method overcomes these problems.
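
As a rough illustration of how a continuous dependent variable can be categorized into intervals before logistic modeling, the following is a minimal sketch in Python. It shows generic preprocessing only, not the patented RELR method. The function names, the cut points, and the survival times are all hypothetical and chosen purely for illustration.

```python
from bisect import bisect_right


def categorize(values, cut_points):
    """Map each continuous value to the index of the interval it falls in.

    Interval k covers cut_points[k-1] <= value < cut_points[k], so a value
    below the first cut point is category 0 and a value at or above the
    last cut point is category len(cut_points).
    """
    cuts = sorted(cut_points)
    return [bisect_right(cuts, v) for v in values]


def cumulative_binary_targets(categories, n_categories):
    """Build one binary target per threshold k: 1 if category > k, else 0.

    Each target list could then be used as the outcome of an ordinary
    binary logistic regression (an assumption about workflow, not a
    description of RELR).
    """
    return [[1 if c > k else 0 for c in categories]
            for k in range(n_categories - 1)]


# Hypothetical survival times in months, with hypothetical interval cut points.
survival_months = [2.5, 14.0, 7.1, 30.2, 11.9]
cut_points = [6.0, 12.0, 24.0]

categories = categorize(survival_months, cut_points)
targets = cumulative_binary_targets(categories, len(cut_points) + 1)
```

Here each ordinal category corresponds to one interval of the continuous outcome, and each cumulative target asks whether the outcome exceeded a given cut point.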

Various regularization and variable selection approaches, such as Ridge, LARS, LASSO or Stepwise methods, have been proposed to handle the regression error or dimensionality problems or both. These methods 1) require assumptions that are often far from realistic, 2) either require an analyst to test and tune various arbitrary parameters manually or rely upon pre-filled arbitrary parameters, and 3) often require a preliminary stage to reduce the dimensionality using another algorithm, such as principal components analysis or decision trees, that is also fraught with arbitrary choices. A further problem is that multicollinearity and related overfitting error, due to highly correlated variables, often persist with high dimensional data even after these traditional methods to deal with error and dimensionality are applied. As a result, building a logistic regression model with high dimensional data is an extremely difficult and time-consuming challenge that often requires a large labor effort from analysts and almost always requires arbitrary choices that will differ across analysts and data samples. Because these traditional logistic regression methods require such arbitrary choices, their solutions can also differ widely across analysts. Such variability will have a large effect on the quality of the solution and its accuracy. Today’s era of “Big Data” is the era of high dimensional data, but traditional methods to deal with error and dimensionality in logistic regression do not overcome these problems. Logistic regression is not alone in having these problems, as all other potentially interpretable predictive analytics methods have significant problems with high dimensional data.
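
To make the multicollinearity point concrete, the following minimal Python sketch computes the variance inflation factor (VIF) for a pair of highly correlated predictors. With exactly two predictors, the VIF of each equals 1 / (1 - r^2), where r is their Pearson correlation. This is a standard textbook diagnostic rather than anything specific to RELR, and the data values and function names are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def vif_two_predictors(x1, x2):
    """VIF shared by both predictors in a two-predictor regression."""
    r = pearson(x1, x2)
    return 1.0 / (1.0 - r * r)


# Hypothetical predictors that measure nearly the same thing.
x1 = [1.0, 2.0, 3.0, 4.0, 5.0]
x2 = [1.1, 1.9, 3.2, 3.9, 5.1]

r = pearson(x1, x2)
vif = vif_two_predictors(x1, x2)
```

A VIF well above 10 is a common rule-of-thumb sign that coefficient estimates for the correlated variables will be unstable, which is the kind of instability the paragraph above describes.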

Because RELR substantially avoids error and dimensionality problems in logistic regression, large sample sizes are not necessary for an accurate and stable RELR model. However, RELR still clearly benefits from Big Data in terms of high dimensional data, as RELR can get dramatically more accurate with more possible variables even though its Parsed RELR variable selection usually selects only very few variables. The RELR error modeling and its related dimension reduction method are described in the patent. This patent also describes what is now called the Fullbest or “best Full RELR” method, which produces accurate but not parsimonious models. RELR’s highly effective parsimonious Parsed variable selection optimization method is not described in this patent, as it was discovered after the patent submission. The difference between Fullbest and Parsed RELR models is somewhat akin to the difference between a Full Regression model (after initial dimension reduction) and a Stepwise Regression model, although the Fullbest RELR model is usually substantially more accurate than an arbitrary Full model. The primary application for Fullbest RELR is purely predictive models with extremely small training samples, such as fewer than several hundred observations, where parsimonious Parsed variable selection would underfit the model. Unlike Parsed RELR, Fullbest RELR does not produce models that can be interpreted as consistent with causal explanations, due to the redundancy of its variable selection. All evidence now suggests that Parsed RELR performs substantially better in parsimony, accuracy, stability, and interpretability than widely used Stepwise Regression methods, along with other widely used methods (see white paper review of previous studies). The significance of Parsed RELR is that it arises as the solution that is the maximum joint probability of all observed dependent variable events and inferred error model events. With non-experimental data where all observations are assumed to be independent, this is readily interpretable as the maximum probability solution.

At present, the optimization procedure that is the basis of Parsed RELR’s maximum probability solution has only been partially disclosed publicly (see Rice, JSM Proceedings, 2008). Yet the dimensionality handling and error modeling methods described in this patent are fundamental to all effective RELR modeling applications, including parsimonious Parsed RELR variable selection. They would also apply to all ordinal and interval categorized dependent variables in Parsed RELR models. While the patent is general enough to cover multinomial dependent variables, implementations of both Parsed and Fullbest RELR have handled multinomial dependent variables by building separate binary models, as this avoids the arbitrary choice of a multinomial reference condition.
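
The idea of handling a multinomial dependent variable through separate binary models can be sketched as follows. This minimal Python example assumes a one-vs-rest encoding, which is one common way to build separate binary targets. The release does not state which encoding the RELR implementations use, so the encoding, the function name, and the labels below are assumptions for illustration only.

```python
def one_vs_rest_targets(labels):
    """Return one binary target list per class: 1 for that class, 0 otherwise.

    No class is singled out as a reference category; each class gets its
    own binary outcome that a separate binary model could be fit to.
    """
    classes = sorted(set(labels))
    return {c: [1 if y == c else 0 for y in labels] for c in classes}


# Hypothetical three-class outcome.
labels = ["churn", "stay", "upgrade", "stay", "churn"]
binary_targets = one_vs_rest_targets(labels)
```

Because every class receives its own binary target, no single class has to be designated as the baseline against which the others are compared.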

MyRELR is a trademark of Rice Analytics.  SAS® is a registered trademark of SAS Institute.

Copyright, 2011-2016 Rice Analytics, All rights reserved.

Postscript (July 10, 2014)  Both the original Fullbest and Parsed RELR methods have been substantially extended and improved in their handling of ordinal variables and in avoiding occasional but still annoying convergence issues. The improvements also address more prevalent bias issues in the original Parsed RELR trade secret optimization method, which had limited the applications of this parsimonious variable selection method.  These improved methods and extensions are now called Implicit and Explicit RELR and are fully detailed in the book Calculus of Thought by Daniel M. Rice, published by Elsevier-Academic Press in 2014.  These fully detailed improved methods are also now patent pending in the US and abroad (PCT/US14/46060), but they are still fundamentally based upon the error modeling and dimensionality handling methods that were originally patented in the 8,032,473 patent described in this press release.

Update (Feb 19, 2016)  Since the beginning of this year, we have applied for entry into the PCT National Phase of our PCT/US2014/046060 application, or its bypass equivalent, in countries across multiple continents.


