Since we are given only a small labelling budget $M$ ($< N$), we employ the technique presented in [Yilmaz et al., 2021] to sample the data points that contribute most to a given test statistic. The paper presents two sampling approaches; here we focus on the importance sampling approach. In short, importance sampling draws those points that contribute most to the squared error between the finite-$M$ estimator $\hat{F}$ and the infinite-$M$ limit $F^\prime$. Since we clearly do not have access to the true underlying posterior distribution, the paper suggests approximations to mitigate this.
In [Yilmaz et al., 2021], the authors consider metrics of the form
$$ F^\prime = \frac{\sum_{n=1}^{N} f(c^p_n, c^t_n)}{\sum_{n=1}^{N} g(c^p_n, c^t_n)} \,. $$
For example, for the accuracy metric we have $f(c^p, c^t) = \mathbb{I}[c^p = c^t]$ and $g(c^p, c^t) = 1$, where $\mathbb{I}[x = y]$ is the indicator function. The paper forms the estimator:
$$ \hat{F} = \frac{\hat{x}}{\hat{y}} $$
Here, for $M$ samples $n_1, \dots, n_M$ drawn from an importance distribution $q$, we define the importance-weighted estimates
$$ \hat{x} = \frac{1}{M} \sum_{m=1}^{M} \frac{f_{n_m}}{q_{n_m}} \,, \qquad \hat{y} = \frac{1}{M} \sum_{m=1}^{M} \frac{g_{n_m}}{q_{n_m}} \,, $$
with $f_n = f(c^p_n, c^t_n)$ and $g_n = g(c^p_n, c^t_n)$.
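To make this concrete, here is a minimal sketch (Python/NumPy; the synthetic data and the uniform choice of $q$ are illustrative stand-ins, not the paper's setup) of the importance-sampled ratio estimator for accuracy:

```python
import numpy as np

rng = np.random.default_rng(0)

N, M = 10_000, 100                    # pool size and labelling budget (hypothetical)
c_pred = rng.integers(0, 2, size=N)   # model predictions c^p_n (synthetic)
c_true = rng.integers(0, 2, size=N)   # true labels c^t_n (only M get revealed)

q = np.full(N, 1.0 / N)               # importance distribution q_n (uniform here)

# Draw M indices i.i.d. from q, then "pay" for only those labels.
idx = rng.choice(N, size=M, p=q)
f = (c_pred[idx] == c_true[idx]).astype(float)  # f_n = I[c^p_n = c^t_n]
g = np.ones(M)                                  # g_n = 1 for accuracy

# Importance-weighted estimates of sum_n f_n and sum_n g_n.
x_hat = np.mean(f / q[idx])
y_hat = np.mean(g / q[idx])

F_hat = x_hat / y_hat                 # ratio estimator of the accuracy
print(F_hat)
```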
The importance distribution is $q_n$, $n \in \{1, \dots, N\}$. The optimal sampling distribution is the one that minimizes the squared error between the finite-$M$ estimator $\hat{F}$ and the infinite-$M$ limit $F^\prime$, which satisfies
$$ \mathbb{E}\left[(\hat{F} - F^\prime)^2\right] \propto \sum_{n=1}^{N} \frac{h_n^2}{q_n} \,, $$
where
$$ h_n^2 \propto \left\langle \left(f_n - F^\prime g_n\right)^2 \right\rangle = p(c^t_n = 1 | x_n)\left(f(c^p_n, c^t_n = 1) - F^\prime g(c^p_n, c^t_n = 1)\right)^2 + p(c^t_n = 0 | x_n)\left(f(c^p_n, c^t_n = 0) - F^\prime g(c^p_n, c^t_n = 0)\right)^2 \,. $$
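This identifies the optimal sampler directly (a standard step, spelled out here for completeness): by the Cauchy–Schwarz inequality,
$$ \left(\sum_{n=1}^{N} |h_n|\right)^2 = \left(\sum_{n=1}^{N} \frac{|h_n|}{\sqrt{q_n}} \sqrt{q_n}\right)^2 \leq \left(\sum_{n=1}^{N} \frac{h_n^2}{q_n}\right)\left(\sum_{n=1}^{N} q_n\right) = \sum_{n=1}^{N} \frac{h_n^2}{q_n} \,, $$
with equality if and only if $q_n \propto |h_n|$, which is the optimal sampler used below.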
To calculate the optimal sampler, we therefore need to know the true class distribution $p(c^t_n = 1 | x_n)$. In other works the assumption $p(c^t_n = 1 | x_n) = p(c^p_n = 1 | x_n)$ is made. However, this places great faith in the model and can lead to overconfidence, particularly in models with very high or very low probabilities. To address this, the authors replace the unknown $p(c^t_n | x_n)$ with the estimate
$$ p_a(c^t_n = 1 | x_n) = \lambda\, p(c^p_n = 1 | x_n) + (1 - \lambda)\, 0.5 \,, $$
for some user-chosen $0 \leq \lambda \leq 1$.
We use this to form an approximation to $F^′$ by computing the expectation with respect to $p_a(c^t_n|x_n)$. We denote this
$$ F_a^\prime = \frac{\sum_{n=1}^{N} \mathbb{E}_{p_a} [f_n]}{\sum_{n=1}^{N} \mathbb{E}_{p_a} [g_n]} \,. $$
This approximation can be used in place of the true $F^\prime$, which lets us fully define $h_n$ and hence the optimal importance sampler $q_n \propto |h_n|$.
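Putting the pieces together, here is a minimal sketch of this construction for the accuracy metric, assuming binary labels and NumPy arrays (the function name and input layout are hypothetical, not from the paper):

```python
import numpy as np

def optimal_sampler(p_pred, c_pred, lam=0.8):
    """Approximate optimal importance sampler q_n ∝ |h_n| for accuracy.

    p_pred : (N,) model probabilities p(c^p_n = 1 | x_n)
    c_pred : (N,) hard predictions c^p_n ∈ {0, 1}
    lam    : smoothing weight λ in p_a = λ p + (1 - λ) 0.5
    """
    # Smoothed stand-in for the unknown true-class distribution.
    p_a = lam * p_pred + (1.0 - lam) * 0.5        # p_a(c^t_n = 1 | x_n)

    # For accuracy: f(c^p, c^t) = I[c^p = c^t], g = 1.
    f1 = (c_pred == 1).astype(float)              # f_n if c^t_n = 1
    f0 = (c_pred == 0).astype(float)              # f_n if c^t_n = 0

    # F'_a = sum_n E_{p_a}[f_n] / sum_n E_{p_a}[g_n]; E_{p_a}[g_n] = 1 here.
    Ef = p_a * f1 + (1.0 - p_a) * f0
    F_a = Ef.sum() / len(p_pred)

    # h_n^2 = p_a (f(.,1) - F'_a)^2 + (1 - p_a)(f(.,0) - F'_a)^2 since g = 1.
    h2 = p_a * (f1 - F_a) ** 2 + (1.0 - p_a) * (f0 - F_a) ** 2

    q = np.sqrt(h2)                               # q_n ∝ |h_n|
    return q / q.sum()
```

The returned distribution can be passed directly as the `p` argument of `rng.choice` in the earlier snippet in place of the uniform $q$.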
Another paper, [Kossen et al., 2021], aims to estimate a test statistic such as accuracy in a sample-efficient way in an active setting. The analysis used to derive the optimal sampling distribution follows the same approach as described above:
$$
q^*(i_m) \propto \mathbb{E}_{\pi(c^t | x_{i_m})} \left[ \mathcal{L}(f(x_{i_m}), c^t) \right] .
$$
Now, as in the scenario above, we do not have access to the true $p(c^t | x)$. Therefore, [Kossen et al., 2021] introduce a general framework for approximating $q^*(i_m)$ via a surrogate for $c^t | x$, which approximates the true $p(c^t | x)$ using the marginal distribution $\pi(c | x) = \mathbb{E}_{\pi(\theta)} [\pi(c | x, \theta)]$.
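As an illustration, here is a sketch of this acquisition for a classifier scored with cross-entropy loss (the function name, and the assumption that the surrogate outputs class probabilities, are ours rather than the paper's API):

```python
import numpy as np

def expected_loss_acquisition(model_probs, surrogate_probs, eps=1e-12):
    """Acquisition q*(i) ∝ E_{π(c|x_i)}[L(f(x_i), c)] with cross-entropy loss.

    model_probs     : (N, C) class probabilities of the model under test
    surrogate_probs : (N, C) surrogate approximation to p(c^t | x)
    """
    # Expected cross-entropy of the model's prediction under the surrogate:
    # -sum_c π(c|x) log p_model(c|x), computed per data point.
    exp_loss = -(surrogate_probs * np.log(model_probs + eps)).sum(axis=1)
    return exp_loss / exp_loss.sum()   # normalize into a distribution over i
```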
How is this active? The active component consists of using the previously sampled points to update the surrogate and then recomputing the sampling distribution $q(\cdot, \pi)$; see Algorithm 1.
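A hypothetical sketch of that loop, reusing `expected_loss_acquisition` from the previous snippet; `surrogate`, `oracle_label`, and the scikit-learn-style `fit`/`predict_proba` interface are illustrative assumptions, not the paper's implementation:

```python
def active_test_loop(X, model_probs, surrogate, oracle_label, M, rng):
    """Sketch of the active loop: label a point, update the surrogate,
    recompute the sampling distribution. All names are illustrative."""
    labelled_idx, labels = [], []
    for _ in range(M):
        # Recompute q(., π) with the current surrogate.
        q = expected_loss_acquisition(model_probs, surrogate.predict_proba(X))
        q[labelled_idx] = 0.0                 # do not re-sample labelled points
        q /= q.sum()
        i = int(rng.choice(len(q), p=q))      # acquire the next index i_m
        labelled_idx.append(i)
        labels.append(oracle_label(i))        # pay the labelling cost
        surrogate.fit(X[labelled_idx], labels)  # active update of the surrogate
    return labelled_idx, labels
```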