If Concept Bottlenecks are the Question, are Foundation Models the Answer?

Apr 28, 2025

arXiv 2504.19774 — with Pietro Barbiero, Francesco Giannini, Andrea Passerini, Stefano Teso, Emanuele Marconato.

Background

Concept Bottleneck Models (CBMs) promise a rare combination in modern AI: high predictive performance and interpretability. Rather than being black boxes, they route predictions through a layer of human-understandable concepts. If a CBM classifies a bird species, you can see that it did so because it detected a red breast, a short beak, and a round body — not just because some opaque pattern matched.
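To make the "routing through concepts" idea concrete, here is a minimal sketch of a CBM forward pass in plain Python. It is not the architecture evaluated in the paper: the weights, the concept names, and the helper names (linear, cbm_forward) are hypothetical and chosen only for illustration. The point is structural: the class scores are computed from the concept activations alone, so every prediction can be read off the concept layer.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def linear(weights, bias, x):
    # One output per weight row: dot(row, x) + b
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]

def cbm_forward(features, concept_w, concept_b, label_w, label_b):
    """Input features -> concept probabilities -> class prediction."""
    # Bottleneck: each unit is meant to correspond to one named concept.
    concepts = [sigmoid(z) for z in linear(concept_w, concept_b, features)]
    # The label head sees only the concepts, never the raw features.
    scores = linear(label_w, label_b, concepts)
    predicted = max(range(len(scores)), key=scores.__getitem__)
    return concepts, predicted

# Hypothetical toy model: 3 input features, 2 concepts, 2 classes.
CONCEPT_NAMES = ["red breast", "short beak"]
concept_w = [[4.0, 0.0, 0.0], [0.0, 4.0, 0.0]]
concept_b = [-2.0, -2.0]
label_w = [[2.0, 2.0], [-2.0, -2.0]]
label_b = [-2.0, 2.0]

concepts, predicted = cbm_forward(
    [1.0, 1.0, 0.3], concept_w, concept_b, label_w, label_b
)
for name, p in zip(CONCEPT_NAMES, concepts):
    print(f"{name}: {p:.2f}")
print("predicted class:", predicted)
```

In a real CBM the concept layer sits on top of a learned image encoder and is trained against concept labels, which is exactly where the annotation cost comes from.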

The traditional bottleneck has been data: training a CBM properly requires expert-annotated concept labels, which are costly and time-consuming to collect.

The Foundation Model Hypothesis

With the rise of powerful Vision-Language Models (VLMs) like CLIP, a tempting shortcut emerged: what if we just asked the VLM to provide concept supervision? These models have seen enormous amounts of data and seem to "know" a lot about the visual world. Several recent architectures — which we call VLM-CBMs — have adopted this approach.
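One common way to obtain concept supervision from a CLIP-style model is to compare an image embedding with the text embedding of each concept description and treat the similarity as a soft or thresholded label. The sketch below illustrates that idea only. The embeddings are made-up three-dimensional vectors rather than real CLIP outputs, the threshold is an arbitrary assumption, and individual VLM-CBM architectures differ in how they turn similarities into supervision.

```python
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def vlm_concept_labels(image_emb, concept_embs, threshold=0.5):
    """Score each concept by image-text similarity, then binarize.

    `threshold` is a hypothetical choice, not a value from the paper.
    """
    scores = {name: cosine(image_emb, emb)
              for name, emb in concept_embs.items()}
    labels = {name: int(s >= threshold) for name, s in scores.items()}
    return scores, labels

# Placeholder embeddings; in practice these would come from the VLM's
# image encoder and text encoder respectively.
image_emb = [0.9, 0.1, 0.4]
concept_embs = {
    "red breast": [1.0, 0.0, 0.5],
    "long beak": [0.0, 1.0, 0.0],
}

scores, labels = vlm_concept_labels(image_emb, concept_embs)
for name in concept_embs:
    print(f"{name}: similarity={scores[name]:.2f}, label={labels[name]}")
```

No human annotator appears anywhere in this loop, which is both the appeal of the approach and the reason its labels need to be checked against expert annotations.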

The question we ask in this paper is simple but important: does VLM supervision actually work as a substitute for expert annotations?

What We Found

We conduct a broad empirical evaluation of state-of-the-art VLM-CBM architectures using multiple quality metrics. The findings are sobering:

  • VLM supervision can substantially diverge from expert annotations, and the degree of divergence is highly task-dependent. For some domains, VLMs are a reasonable proxy; for others, they are not.
  • Most strikingly: concept accuracy and concept quality are not strongly correlated. A model can score well on standard accuracy metrics while using concepts that are misleading, redundant, or simply wrong from a human perspective.

This means that the evaluation benchmarks currently used in the field, which lean heavily on accuracy, may be giving an overly optimistic picture of how interpretable these models actually are.
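As a rough illustration of what "diverging from expert annotations" can look like, the sketch below computes, for each concept, the fraction of samples on which VLM-derived labels match expert labels. This simple agreement rate is an assumption made for illustration and is not one of the quality metrics used in the paper. The label matrices are toy values, not experimental results.

```python
def per_concept_agreement(vlm_labels, expert_labels):
    """Fraction of samples where the two label sources agree, per concept.

    Both arguments are lists of rows (one row per sample, one binary
    entry per concept).
    """
    n_samples = len(expert_labels)
    n_concepts = len(expert_labels[0])
    return [
        sum(v[j] == e[j] for v, e in zip(vlm_labels, expert_labels)) / n_samples
        for j in range(n_concepts)
    ]

# Toy data: 4 samples, 2 concepts (hypothetical, for illustration only).
expert = [[1, 0], [1, 1], [0, 1], [0, 0]]
vlm = [[1, 1], [1, 1], [0, 0], [0, 1]]

agreement = per_concept_agreement(vlm, expert)
print(agreement)  # [1.0, 0.25]
```

Reporting agreement per concept rather than as one aggregate number makes it visible when a VLM is a good proxy for some concepts and a poor one for others.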

Why This Matters

The appeal of VLM-CBMs is real — they remove the bottleneck of expert annotation and make CBMs much easier to deploy. But interpretability is only useful if the concepts are actually interpretable and faithful. Our results suggest the field needs better evaluation protocols, and that blindly trusting VLM-generated concepts carries hidden risks.

This paper sets the stage for follow-up work (see VH-CBM) that introduces a principled way to correct VLM supervision with minimal human effort.

Nicola Debole