3 things financial statistics taught me about data science

As someone who has worked as a statistician in a multinational financial institution and a data scientist at new-fangled tech companies, it’s been fascinating to compare the best practices in both industries. Here are a few key differences:

High-stakes models necessitate a culture of rigorous process monitoring. Some of my ex-colleagues spent up to 50% of their time monitoring credit and risk models. And if something ever went wrong on their watch, may god have mercy on their poor souls! On the other hand, from what I’ve seen, web companies tend to struggle with monitoring or documenting even their key analyses, preferring to invest the time into pushing the next thing out to production. This sort of makes sense so long as the impact of the project or model is not monetarily large (monitoring should probably scale in accordance with the stakes of the game), but generally the knob is at zero for consumer web.

Speedboating: a real phenomenon! Once every few months at the bank we enjoyed lessons-learned presentations by grouchy executives who had been unfortunate enough to run a line of business into the ground. One common denominator was speedboating: a sort of Ponzi-inspired ramp schedule for a product. The idea is, if you are selling a lemon that will lose money over the long run but earn a short-run profit for a limited time, you can make your business look great by ramping up faster than the losses come in. Of course, this is unsustainable, so after a few months or years of glory it will all fall apart. But in the meantime, you look amazing! In the web industry it’s even better, because you often only have one accountability moment, in the form of a 7-day A/B test. The cure is to carefully monitor performance over time by ramp group (for web) or vintage splits (for finance).
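The arithmetic is worth seeing. Here is a toy simulation (all numbers are made up for illustration) of a product where every vintage loses money over its lifetime, yet the aggregate monthly P&L looks terrific for as long as the ramp outruns the losses:

```python
# Hypothetical per-account P&L by month of age: +10 up front,
# then -3 for six months. Lifetime total: -8 per account. A lemon.
PROFIT_SCHEDULE = [10, -3, -3, -3, -3, -3, -3]

def monthly_pnl(vintage_sizes):
    """Total P&L booked in each calendar month, summed across all vintages."""
    horizon = len(vintage_sizes) + len(PROFIT_SCHEDULE)
    pnl = [0.0] * horizon
    for start, size in enumerate(vintage_sizes):
        for age, per_account in enumerate(PROFIT_SCHEDULE):
            pnl[start + age] += size * per_account
    return pnl

# Speedboat: double the size of each new vintage for a year.
sizes = [2 ** m for m in range(12)]
pnl = monthly_pnl(sizes)

# Aggregate view: every month of the ramp is profitable...
print([round(p) for p in pnl[:12]])   # all positive
# ...until the ramp stops, and the accumulated losses land at once.
print(round(pnl[12]))                 # sharply negative
# Vintage view: each cohort was a guaranteed lifetime loser all along.
print(sum(PROFIT_SCHEDULE))           # -8
```

Splitting the same numbers by vintage exposes the problem on day one; the aggregate view only admits it after the ramp ends.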

Grounding model validity (and guarding against gaming of the model) is crucial, but often ignored in the tech world. In big data, spurious correlation is a given, and yet black-box models are frequently popped into production systems with minimal statistical validation. As a result, you can get big problems when input data you’ve never seen before comes into the system and the model flips out. Monitoring systems can help, as can general awareness of the problem.
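One cheap version of that monitoring, sketched below with hypothetical feature names: record the range each feature took at training time, and flag any incoming row that falls outside it before the model ever scores it.

```python
def fit_ranges(rows):
    """Record the min/max observed for each feature during training."""
    ranges = {}
    for row in rows:
        for feature, value in row.items():
            lo, hi = ranges.get(feature, (value, value))
            ranges[feature] = (min(lo, value), max(hi, value))
    return ranges

def out_of_range(row, ranges, tolerance=0.0):
    """Return the features in a row that fall outside the training range."""
    flagged = []
    for feature, value in row.items():
        if feature not in ranges:
            flagged.append(feature)  # a feature training never saw at all
            continue
        lo, hi = ranges[feature]
        slack = tolerance * (hi - lo)
        if value < lo - slack or value > hi + slack:
            flagged.append(feature)
    return flagged

# Illustrative training data with made-up features.
training = [{"income": 40_000, "age": 25}, {"income": 90_000, "age": 61}]
ranges = fit_ranges(training)

print(out_of_range({"income": 65_000, "age": 40}, ranges))  # []
print(out_of_range({"income": -1, "age": 130}, ranges))     # ['income', 'age']
```

This won’t catch every kind of drift, but it turns “the model flips out silently” into “an alert fires,” which is most of the battle.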

Of course finance can learn a ton from web too. Might do a sequel post just to keep the spice levels equivalent on both sides, but for now, that’s it.
