How to Conduct a Bulletproof AVM Test


7 steps to rigorously test an automated valuation model before use in lending or securitization workflows


What you’ll find in this post:

• Introduction
• Conducting a bulletproof test
◦ Select the AVM vendors you plan to test ahead of gathering any AVM values
◦ Prepare a large set of benchmarks (sample size matters)
◦ Ask the vendor to provide AVMs on the benchmark
◦ Calculate the error rates on the results
◦ Optional: Compare FSD/confidence score to the accuracy of the AVMs
◦ Compare the AVMs
◦ Choose your candidate vendors
• Conclusion

Introduction

In my first blog post in this three-part series, I covered the benefits and pitfalls of different types of AVM (automated valuation model) benchmarks, including sale price and appraised value. In my second post, I wrote a short glossary that breaks down some of the terms related to AVM testing or validation that we’re commonly asked to define. In this post, I’d like to cover a subject that is commonly overlooked: how to rigorously test an AVM before use in lending or securitization workflows. In other words, this post is about how we can measure accuracy so the AVM can’t “game” the test, or make itself look better than it actually is.

The best way to properly test an AVM is to control your benchmarks and understand what information an AVM might have access to that could skew the results. For example, if your benchmark properties have recent sale prices, those prices will most likely be available to any modern AVM that constantly ingests sale price information and uses it to train the valuation model. In such a case, the test may not show how the AVM will behave on properties that have not recently sold. Taking this further, even if the AVM does not yet know the sale price, it may be using listing prices sourced from MLS (multiple listing service) data to gain insight into how much the property could sell for.

Since AVMs are commonly used in situations where the property has not recently sold or is not currently listed on the market, there must be a way to test real-world, day-to-day use. Rather than relying on the AVM provider to perform an unbiased, out-of-sample test and trusting that it was conducted in a completely honest fashion, we suggest conducting your own test.

Here at Clear Capital, we test ClearAVM (our proprietary AVM, and currently the industry’s No. 1 AVM based on third-party tests) in several ways, and have found that the refinance appraisal valuation is the best measure of the quality of an AVM because it shows whether the AVM can handle tricky property scenarios. When multiple AVMs are measured against the same set of benchmarks, it’s easier to distinguish the more professional-grade AVMs from the more consumer-grade AVMs.

Conducting a bulletproof test

Without further ado, here are the seven steps we suggest to conduct a bulletproof AVM test.

1. Select the AVM vendors you plan to test ahead of gathering any AVM values.

This step ensures all the AVMs will use the same set of benchmarks and deliver results to you in the same time period.

2. Prepare a large set of benchmarks (sample size matters).

Again, the best benchmark for an AVM is a recently completed appraisal on a refinance transaction. Why refi? Check out my first post. Ideally there are more than 1,000 addresses in this set (tests on tens of thousands of properties are common), and they all have effective dates within 90 days. These property addresses should properly represent where you conduct business. If you conduct business nationally, the addresses should be geographically dispersed but include a proportionate number in urban and suburban areas (versus highly rural areas).
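As a rough illustration, a quick sanity check on a benchmark set could look like the sketch below. It assumes "within 90 days" means within 90 days before a reference date you choose (`as_of`), which is one reading of the guideline; the function and parameter names are hypothetical.

```python
from datetime import date, timedelta

def check_benchmarks(effective_dates, as_of, min_size=1000, max_age_days=90):
    """Check sample size and appraisal recency for a benchmark set.

    effective_dates: list of datetime.date, one per benchmark appraisal.
    as_of: reference date for the test (an assumption of this sketch).
    """
    cutoff = as_of - timedelta(days=max_age_days)
    # A benchmark is out of window if it is older than the cutoff or dated after as_of.
    out_of_window = [d for d in effective_dates if d < cutoff or d > as_of]
    return {
        "size": len(effective_dates),
        "size_ok": len(effective_dates) >= min_size,
        "out_of_window": len(out_of_window),
    }

# Tiny made-up example (a real set would have 1,000+ rows).
sample_dates = [date(2024, 1, 1), date(2024, 3, 1), date(2023, 9, 1)]
report = check_benchmarks(sample_dates, as_of=date(2024, 3, 15))
```

Geographic representativeness is not covered here; that still needs a comparison against where you actually do business.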

3. Ask the vendor to provide AVMs on the benchmark.

This can be done in a couple of ways. Ideally, you ask them to provide test access to their instant AVM API, so you can gather AVMs on your terms and limit the possibility of an AVM vendor tailoring or influencing the results. Since most valuation providers are honest, opting for a bulk spreadsheet match and append is not a bad alternative. Keep the benchmark values (the appraised values) to yourself, and send only the property addresses and a tracking ID for each row. Request that the results (at minimum the AVM values and an FSD) be returned as soon as possible. It’s reasonable to ask for a same-day turnaround with most modern AVM vendors.
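For the spreadsheet route, the split between what the vendor sees and what you keep can be sketched as follows. This is a minimal example with made-up addresses and hypothetical field names (`address`, `appraised_value`, `tracking_id`); the random tracking IDs are an illustrative choice so that the ID itself carries no hint about the property or its value.

```python
import csv
import io
import uuid

def split_benchmarks(rows):
    """Split benchmark rows into a vendor-facing request and a private answer key."""
    vendor_rows, answer_key = [], {}
    for row in rows:
        tracking_id = uuid.uuid4().hex  # random ID, unrelated to the value
        vendor_rows.append({"tracking_id": tracking_id, "address": row["address"]})
        answer_key[tracking_id] = row["appraised_value"]  # stays with you
    return vendor_rows, answer_key

def to_csv(vendor_rows):
    """Serialize only the two vendor-facing columns to CSV text."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=["tracking_id", "address"])
    writer.writeheader()
    writer.writerows(vendor_rows)
    return buf.getvalue()

# Made-up benchmark rows for illustration only.
benchmarks = [
    {"address": "123 Example St, Springfield, ST 00000", "appraised_value": 350000},
    {"address": "456 Sample Ave, Springfield, ST 00000", "appraised_value": 512000},
]
vendor_rows, answer_key = split_benchmarks(benchmarks)
vendor_csv = to_csv(vendor_rows)
```

When the results come back, the tracking ID is what lets you join each AVM value to its appraised value in `answer_key`.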

4. Calculate the error rates on the results.

Once you have the AVMs returned, pull them in next to your benchmark values and run some stats. We like to look at AVM accuracy from several views, but most commonly we measure hit rate, mean absolute error (MAE), and PPE10 (see this post for definitions). Count the benchmarks for which the AVM returned a value (a “hit”) and divide by the total number of benchmarks; this percentage is your hit rate. For each hit, calculate the percent variance between the AVM and the benchmark. Take the absolute value of each variance, average it across all the hits, and you have the MAE. To produce the PPE10, take the share of hits whose AVM value was within plus or minus 10 percent of the benchmark.
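The three metrics can be sketched in a few lines. The example below assumes percent variance is (AVM - benchmark) / benchmark, that a missing AVM value (`None`) means no hit, and that MAE and PPE10 are computed over hits only; the function name and sample numbers are made up for illustration.

```python
def avm_metrics(benchmarks, avm_values):
    """Compute hit rate, MAE, and PPE10 for one vendor.

    benchmarks: {tracking_id: benchmark_value}
    avm_values: {tracking_id: avm_value or None when the vendor returned nothing}
    """
    abs_errors = []
    for tracking_id, bench in benchmarks.items():
        avm = avm_values.get(tracking_id)
        if avm is None:
            continue  # no value returned: not a hit
        abs_errors.append(abs(avm - bench) / bench)  # absolute percent variance

    total, hits = len(benchmarks), len(abs_errors)
    return {
        "hit_rate": hits / total if total else 0.0,
        "mae": sum(abs_errors) / hits if hits else None,
        "ppe10": sum(e <= 0.10 for e in abs_errors) / hits if hits else None,
    }

# Made-up values: errors of 5%, 15%, 0%, and one miss.
bench = {"a": 200000, "b": 400000, "c": 500000, "d": 300000}
avms = {"a": 210000, "b": 340000, "c": 500000, "d": None}
metrics = avm_metrics(bench, avms)
```

With these four rows, three of four benchmarks are hits, and two of the three hits fall within 10 percent of the benchmark.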

5. Optional: Compare FSD/confidence score to the accuracy of the AVMs.

A good AVM has a confidence score that is highly correlated with the valuation’s variance from the benchmark value. You can use this comparison to decide whether to trust the AVM’s confidence score in day-to-day use.
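One simple way to check this relationship is a Pearson correlation between each hit's FSD and its absolute percent error. The sketch below is only one possible approach (bucketing by FSD and comparing MAE per bucket is another), and it assumes the FSD is reported so that a lower value means more confidence; the sample numbers are made up.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need two sequences of equal length >= 2")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)

# Made-up per-property values: vendor-reported FSD and observed absolute error.
fsd = [0.05, 0.08, 0.12, 0.20]
abs_err = [0.02, 0.06, 0.09, 0.18]
r = pearson(fsd, abs_err)
```

A clearly positive `r` suggests that larger FSDs do go with larger errors, which is the behavior you want before relying on the score.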

6. Compare the AVMs.

Now that you have error rates for all the AVM vendors, you can compare them. Plot the MAE versus hit rate for all AVM vendors on the same chart. In most cases, you should be looking for the AVM that has a good balance of high hit rate and low MAE (or high PPE10).
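Alongside the chart, a small filter can flag vendors that are clearly outperformed. The sketch below drops any vendor for which some other vendor has both a hit rate at least as high and an MAE at least as low (and is strictly better on one of the two). This dominance rule is an illustrative shortlisting aid, not a prescribed method, and the vendor names and numbers are made up.

```python
def non_dominated(vendors):
    """Return vendors not beaten on both hit rate and MAE by another vendor.

    vendors: {name: (hit_rate, mae)} where higher hit_rate and lower mae are better.
    """
    keep = []
    for name, (hit_rate, mae) in vendors.items():
        dominated = any(
            o_hit >= hit_rate and o_mae <= mae and (o_hit > hit_rate or o_mae < mae)
            for other, (o_hit, o_mae) in vendors.items()
            if other != name
        )
        if not dominated:
            keep.append(name)
    return sorted(keep)

# Made-up results for three hypothetical vendors.
results = {
    "Vendor A": (0.92, 0.065),  # most hits, middling error
    "Vendor B": (0.85, 0.048),  # fewer hits, lowest error
    "Vendor C": (0.80, 0.070),  # worse than Vendor A on both
}
shortlist = non_dominated(results)
```

Here Vendor C drops out, while the choice between Vendor A and Vendor B depends on whether you value hit rate or accuracy more.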

7. Choose your candidate vendors.

By this point, you can narrow down the results to a few leaders. This is when it is important to look at other factors, including product offering fit, model governance, and unbiased error rates against sale prices.

Conclusion

These steps represent a perfect scenario of data availability, and we realize not all consumers of AVMs have the benchmarks or capacity to conduct these tests. Variations of these tests can be conducted as long as it’s understood what the possible pitfalls may be and how the results could be skewed.

Fortunately, there are third-party AVM testers that conduct similar tests. While not all third-party AVM tests are perfect, they do provide a good representation of AVM accuracy when interpreted correctly. To better understand how these tests are conducted and how to get the best results from them, please contact us. We can also connect you with independent professionals to help you run a rigorous test. If you value higher accuracy but fewer hits (or vice versa), bring it up with your vendor. They should be able to accommodate results that fit your needs.
