Understanding some of the most common terms related to professional-grade automated valuation model (AVM) testing or validation
What you’ll find in this post:
◦ Absolute error
◦ MAE (mean absolute error)
◦ MdAE (median absolute error)
◦ P10 (or PPE10)
◦ Outlier
◦ Hit rate
◦ Confidence score
◦ Coverage
◦ FSD (forecast standard deviation)
◦ Error
◦ Methodology
◦ Test sample
◦ True value
◦ A note on address standardization
Based on the positive response to my last post, A Lender’s Guide to the Top 3 AVM Testing Methods, I wanted to expand on the addendum of common terms and definitions I included, covering the jargon one is likely to come across in the world of AVM testing and validation. As AVMs become more prevalent in lending and securitization workflows, these are the terms we are most commonly asked to define.
We believe that AVMs hold a place in our industry, but in order to make use of them, we should all apply rigor to how we measure them and educate ourselves on how to validate an AVM.
With regard to individual AVMs, accuracy is the measure of the variance between the estimated value and the true value of the property (which can be the sale price or appraised value). Here are our thoughts on choosing the right benchmark. Accuracy is more commonly defined as how much error the AVM produces as compared to the benchmarks, in aggregate, across a large test sample.
The actual discussion of accuracy is much more complex. As I mentioned in my previous post, most AVM testers use a few common benchmarks to gauge accuracy and measure the error in different ways. They don’t all tell the same story, and are typically used in conjunction with each other. The best AVMs find a balance between hit rate and accuracy measures, so they can be tailored to the use case (sometimes really strong accuracy is preferred over hit rate, other times the opposite is desired). A good AVM provider can adjust their model to suit both needs depending on the use case.
Absolute error
The absolute value of the percent variance between the model valuation and the true value or benchmark. (See also: MAE, MdAE, and true value)
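As a quick sketch of the arithmetic (the function name is illustrative, not any vendor's API):

```python
def absolute_error(avm_value: float, benchmark: float) -> float:
    """Absolute percent variance between an AVM estimate and its benchmark."""
    return abs(avm_value - benchmark) / benchmark

# A $315,000 AVM estimate against a $300,000 sale price is a 5% absolute error.
print(absolute_error(315_000, 300_000))  # 0.05
```

Note that the direction of the miss is discarded: a $285,000 estimate against the same $300,000 benchmark produces the same 5 percent absolute error.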
MAE (mean absolute error)
Determined by calculating the percent variance between each AVM and benchmark, taking the absolute value of each, then averaging them over the whole test/sample set of benchmarks. A strong measure if using consistent benchmarks since it accounts for the times the AVM was very wrong and produces a large error. In other words, it accounts for the outlier predictions which are important when judging an AVM for day-to-day use.
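A minimal sketch of that calculation in Python (names are illustrative):

```python
def mean_absolute_error(avm_values, benchmarks):
    """Average of per-property absolute percent errors over a test sample."""
    errors = [abs(a - b) / b for a, b in zip(avm_values, benchmarks)]
    return sum(errors) / len(errors)

# One large miss (150 vs. 100) pulls the mean up noticeably.
print(mean_absolute_error([110, 95, 150], [100, 100, 100]))  # ~0.2167
```

Because every error contributes to the average, a single badly missed valuation visibly degrades the score, which is exactly the behavior you want when judging an AVM for day-to-day use.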
MdAE (median absolute error)
Similar to MAE, but since it uses the median (instead of the mean), it hides the times the AVM was really far off; i.e. the outliers. This is the most common measure for consumer-grade (not professional/lending-grade) AVMs since it generally makes an AVM appear more accurate than it is.
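The outlier-hiding effect is easy to demonstrate with a small illustrative sketch (same toy numbers as a mean-based calculation would use):

```python
import statistics

def median_absolute_error(avm_values, benchmarks):
    """Median of per-property absolute percent errors over a test sample."""
    errors = [abs(a - b) / b for a, b in zip(avm_values, benchmarks)]
    return statistics.median(errors)

# The 50% miss (150 vs. 100) vanishes entirely: the median is 0.10,
# while the mean of these same errors would be roughly 0.22.
print(median_absolute_error([110, 95, 150], [100, 100, 100]))  # 0.1
```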
P10 (or PPE10)
The percentage of time the AVM is within 10 percent of the benchmark. A standard measure for the AVM industry, but does not capture or consider the AVM predictions that are very far from the benchmark (the spread).
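In code form (a sketch; the threshold is parameterized so the same function covers P5, P10, P20, and so on):

```python
def ppe10(avm_values, benchmarks, tolerance=0.10):
    """Share of predictions within `tolerance` (default 10%) of the benchmark."""
    within = sum(
        1 for a, b in zip(avm_values, benchmarks) if abs(a - b) / b <= tolerance
    )
    return within / len(benchmarks)

# Two of four predictions land within 10% of their benchmarks.
print(ppe10([105, 120, 99, 70], [100, 100, 100, 100]))  # 0.5
```

Notice that the 20 percent miss and the 30 percent miss count identically; the metric says nothing about how far outside the band the misses fall.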
Outlier
In terms of data, an outlier is an observation with unusual values as compared to the norm, markedly differing from a measure of central tendency. In terms of AVMs, an outlier is a value prediction that is significantly off from the benchmark, i.e. one with an unusually large error.
Hit rate
The share of benchmark addresses for which the AVM was able to return a prediction. The hit rate depends on the AVM’s ability to locate the property address and on how much confidence the AVM has in its prediction of value for the property. Often, AVMs do not (and should not) produce results when they don’t have strong confidence in the valuation.
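A minimal sketch of the calculation (the data shapes here are assumptions for illustration, not any vendor's response format):

```python
def hit_rate(requested, returned):
    """Share of requested addresses for which the AVM returned a value.

    `returned` maps address -> valuation; a missing or None entry means
    the AVM declined to (or could not) produce a value.
    """
    hits = sum(1 for addr in requested if returned.get(addr) is not None)
    return hits / len(requested)

requests = ["101 Main St", "202 Oak Ave", "303 Pine Dr", "404 Elm Rd"]
values = {"101 Main St": 310_000, "303 Pine Dr": 255_000}
print(hit_rate(requests, values))  # 0.5
```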
Confidence score
The AVM confidence score is typically based on the standard deviation of the valuation prediction (see also: FSD), which indicates the degree to which multiple models “agree” with each other for a given property. Contact us to learn about the rigor behind making sure our confidence is highly aligned with our AVM’s accuracy.
Coverage
Typically defined at the national level, coverage is the ratio between the total number of valuations returned by the model and the total number of valuations requested. This metric can be further “decomposed” into coverage ratios for the various reasons valuations cannot be produced. A typical example is to decipher whether the AVM “cannot find the property” or “cannot produce a valuation that meets a desired confidence level.” (See also: hit rate)
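That decomposition can be sketched as a simple tally over per-request outcomes (the outcome labels here are illustrative, not an industry standard):

```python
from collections import Counter

def coverage_breakdown(outcomes):
    """Decompose requests into coverage ratios by outcome label."""
    counts = Counter(outcomes)
    total = len(outcomes)
    return {reason: n / total for reason, n in counts.items()}

sample = ["valued"] * 7 + ["no_property_match"] * 2 + ["below_confidence_threshold"]
print(coverage_breakdown(sample))
# {'valued': 0.7, 'no_property_match': 0.2, 'below_confidence_threshold': 0.1}
```

Splitting the 30 percent of non-hits this way tells you whether to work on address matching or on model confidence.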
FSD (forecast standard deviation)
Most AVM confidence scores are based on the calculated FSD produced along with the value prediction. FSD is a statistical measure that estimates the likelihood that the valuation is accurate, and it can be used to determine a highly probable range around the property’s value. The FSD is typically based on a measure of the spread, or deviation, in the possible estimates the model found while concluding a final value estimate. With this metric attached to each valuation, a modeling team can measure how correlated these standard deviations are with actual AVM errors and build a model to predict future AVM error. The output of this model is the FSD, which can be communicated in various ways, most commonly as a decimal value (e.g. 0.07) or as a percentage (e.g. 7 percent). The lower the FSD, the more accurate the AVM is expected to be. This percentage is sometimes turned into a confidence score by subtracting it from 100 percent (e.g. an FSD of 0.07, or 7 percent, yields a confidence score of 93 percent).
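The two conversions described above, FSD to confidence score and FSD to a value range, can be sketched as follows (the symmetric plus-or-minus band is a simplifying assumption; vendors construct ranges in different ways):

```python
def confidence_from_fsd(fsd: float) -> float:
    """The common convention: confidence = 100% minus the FSD percentage."""
    return 1.0 - fsd

def fsd_value_range(estimate: float, fsd: float):
    """A simple +/- one-FSD band around the point estimate."""
    return estimate * (1 - fsd), estimate * (1 + fsd)

print(confidence_from_fsd(0.07))  # 0.93
low, high = fsd_value_range(300_000, 0.07)
print(low, high)                  # 279000.0 321000.0
```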
Clear Capital has taken the science of predicting our AVM errors to the next level. Since we have found that prediction errors are not normally distributed, we forced ourselves to come up with a better way to predict our errors and provide an FSD metric along with every AVM that accurately reflects the validity of the valuation.
Since every AVM vendor typically builds a proprietary model to produce the FSD statistic, the communication of this measure can vary between AVMs. There is no standardized way to produce an FSD in the valuation or financial services industry. This means a 0.07 FSD from one AVM can have a very different measure of accuracy or confidence from another vendor. This makes it difficult to use multiple AVMs together, or to apply a framework on how to use AVMs in general. Our goal is to educate the lending and financial services industry on this nuance and to provide guidance on ways to solve that problem. More on this in a later post.
Error
Error is the term applied to the measured difference between the true value benchmark and the value estimate reported by the AVM. It is the fundamental measurement used to evaluate the likely performance of any AVM, and is typically reported as an absolute percentage difference.
Methodology
With regard to AVMs, methodology is the generic term for the variety of methods a provider may use in developing an AVM. Occasionally the word “technology” is used where methodology is intended. Some common approaches are index-based, hedonic, appraisal emulation, and tree-based.
Test sample
The test sample is the selection of properties/addresses for which the AVM tester asks the AVM providers to return an estimated value. It is the basis for AVM performance testing. The test sample should be sufficiently broad to assure a degree of statistical validity when measuring error nationwide. While it is possible to select a “generic” test sample, test samples typically reflect the market preferences of the AVM tester as to geography, type of property, and so on. A good sample of properties to be used as an AVM benchmark should be large, diverse, and recent to prevent bias or gaming.
True value
Or, “the benchmark.” The sale price or market value of a purchase transaction, or the appraised value on a refinance transaction.
A note on address standardization
A key part of obtaining an AVM on any given property is making sure we know the right property address, so we can locate the property. Address standardization is the process by which the different components of the address are parsed and checked to conform with the typical format of U.S. addresses. Clear Capital uses the Coding Accuracy Support System (CASS) from the United States Postal Service, along with third-party services, to parse and locate property addresses. Address standardization involves several steps: parsing, error correction, and standardization itself.
Address parsing is the process by which individual components of the address are broken up and stored separately to allow better management and quality control. Typically parsing looks at separating these components: street number, pre-direction (e.g., N, S, E, W), street name, post-direction (e.g. NW, SE), street suffix (e.g., street, drive, avenue), and so on.
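As a deliberately simplified illustration of parsing (CASS-certified software handles vastly more variation than this toy regex, which is an assumption of this post, not any production parser):

```python
import re

# Captures the components named above: street number, pre-direction,
# street name, street suffix, and an optional post-direction.
STREET_LINE = re.compile(
    r"^(?P<number>\d+)\s+"
    r"(?:(?P<predir>[NSEW]{1,2})\s+)?"
    r"(?P<name>.+?)\s+"
    r"(?P<suffix>St|Street|Ave|Avenue|Dr|Drive|Rd|Road|Blvd)\.?"
    r"(?:\s+(?P<postdir>[NSEW]{1,2}))?$",
    re.IGNORECASE,
)

def parse_street(line: str):
    match = STREET_LINE.match(line.strip())
    return match.groupdict() if match else None

print(parse_street("123 N Main St"))
# {'number': '123', 'predir': 'N', 'name': 'Main', 'suffix': 'St', 'postdir': None}
```

Storing the pieces separately is what makes the later steps (error correction and standardization) tractable.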
Error correction fixes problems with misspellings, incorrect city names, incorrect zip codes and so on.
Standardization is the process that ensures that the same name is reported the same way. For example, Florida may be represented by Florida, Fla., FL., and so on. Once standardized, it will always be represented by the “standard,” which is the two-character state code: FL (without a period at the end).
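In its simplest form, standardization is a variant-to-standard lookup (this tiny table is illustrative only; real standardization tables cover every state, street suffix, and directional, plus many more variants):

```python
STATE_STANDARD = {
    "florida": "FL",
    "fla": "FL",
    "fl": "FL",
}

def standardize_state(raw: str) -> str:
    """Map a state-name variant to the two-character standard code."""
    key = raw.strip().rstrip(".").lower()
    return STATE_STANDARD.get(key, raw.strip().upper())

print(standardize_state("Fla."))     # FL
print(standardize_state("Florida"))  # FL
```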
With regard to AVMs, why is address standardization so important? A substantial number of “misses” by AVMs are not the result of failures in the model itself, but simply a failure to properly identify the address within the AVM’s database. Addresses can be represented in many different ways, and even slight variations can confuse an automated system. Knowing which address standardization technique an AVM provider uses helps both the lender and the provider ensure the best possible performance. Since that knowledge isn’t always available, Clear Capital invests heavily in being able to find any property address entered into our system.
If you’re looking to add AVMs to your lending/securitization workflow, look for my next post: “How to Conduct a Bulletproof AVM Test.”