Black box Security
Measuring security via Machine Learning
Last updated: 19th August 2018
In the paper “Bayes, not Naïve: Security Bounds on Website Fingerprinting Defenses” (Cherubin, 2017) we introduced a method to quantify the security of defences against Website Fingerprinting (WF) attacks: security bounds for a WF defence can be derived, with respect to a set of state-of-the-art features an adversary uses, by computing an estimate of the Bayes error on a dataset of defended network traces; we showed that an estimate based on the Nearest Neighbour classifier works well. We also proposed a security parameter, which measures how much better than random guessing an adversary can do.
However, the method is applicable to a much more general problem: suppose we have a blackbox that takes some secret input and returns some output accordingly; also, assume that the output, given a secret, is distributed according to some (unknown) probability density function.
The method we introduced allows estimating the smallest error an adversary makes when trying to predict the secret given the output. Furthermore, the resulting security parameter conveys properties that generalise such bounds to adversaries with prior knowledge.
This page collects FAQs and future developments of this work. It does not go into depth with the specific application of the method to WF, whose details you may find in (Cherubin, 2017) and the respective talk, but it attempts to be a guide for someone looking to apply the same approach to other problems. It is also based on recent work with Catuscia Palamidessi and Kostas Chatzikokolakis, currently under submission (Cherubin, Chatzikokolakis, & Palamidessi, 2019).
I gave a talk summarising some of the contents of this page at the Turing Institute, which you may find here.
The FAQs are preceded by a short intro, which gives a high level idea of the threat model and the method, and guides the reading of the FAQs by providing links to the appropriate sections. The last question answered in the FAQs gives a Python example of how to compute the bounds for some blackbox in practice.
 Prelude
 FAQs
 What is the Bayes risk?
 What is the security parameter?
 Any assumptions on the underlying distribution?
 I don’t like asymptotic results, can I prove a rate of convergence?
 What if I (don’t) know the priors?
 What if the adversary has external knowledge?
 The NN classifier is defined for some metric: which one should I use?
 So, is NN, like, the best classifier ever?
 Other ways to estimate the Bayes risk?
 Other approaches?
 Features?
 Relabelling (e.g., Open World scenario in WF attacks)
 When should I use this method?
 Enough theory! How do I use this thing?
 Epilogue
Prelude
Threat model
We consider a finite secret space. The case of an infinite and uncountable secret space is currently outside the scope of this FAQ, although similar results to the ones shown here can be proven under those conditions.
We consider a blackbox that maps secrets from the secret space to observations in an observation space. The blackbox is a randomised algorithm which, for some input secret, returns an observation according to a probability density that depends on that secret. We assume that such density does not change over time (i.e., between queries to the blackbox), but we make no further assumptions on its nature. “To sample” the blackbox for some secret will mean that we draw an observation according to the density associated with that secret. Secrets are chosen according to some priors; again, no assumption over the priors’ distribution is required to formulate security guarantees, and uniform priors can be used instead; if for some application priors are known, one can specify them instead of using uniform ones.
In a training phase, an adversary is given oracle access to the blackbox, which he can sample a number of times for secrets (labels) of his choice to obtain the respective observations. In a test phase, a secret is chosen according to the priors, the adversary is given an observation obtained by sampling the blackbox for that secret, and he is asked to make a prediction for the secret. It is useful to remark that examples sampled in this way are independent, and they come from a joint distribution over secrets and observations that is defined by the priors and the blackbox’s probability densities.
The adversary “wins” if his prediction equals the chosen secret. In what follows, we will generally refer to the adversary’s probability of error; that is, the probability that his prediction differs from the secret in one run of this attack.
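The game above can be simulated on a toy example. The following is a minimal sketch, not part of the original work: the blackbox (a secret plus Gaussian noise), the three secrets, and all function names are hypothetical choices made for illustration, and the adversary is a plain Nearest Neighbour rule.

```python
import random

# Hypothetical toy blackbox: secrets are 0, 1, 2 and the observation is the
# secret plus Gaussian noise. Any randomised algorithm could be used instead.
SECRETS = [0, 1, 2]

def blackbox(secret, rng):
    """Return one observation drawn from the density associated with `secret`."""
    return rng.gauss(float(secret), 1.0)

def sample_dataset(n, priors, rng):
    """Draw n independent (observation, secret) examples; secrets follow `priors`."""
    secrets = rng.choices(SECRETS, weights=priors, k=n)
    return [(blackbox(s, rng), s) for s in secrets]

def nn_adversary(train, observation):
    """Predict the secret of the training example closest to `observation`."""
    _, secret = min(train, key=lambda ex: abs(ex[0] - observation))
    return secret

rng = random.Random(0)
priors = [1 / 3, 1 / 3, 1 / 3]
train = sample_dataset(500, priors, rng)  # training phase (oracle access)
test = sample_dataset(500, priors, rng)   # 500 independent runs of the test phase
errors = sum(nn_adversary(train, o) != s for o, s in test)
error_rate = errors / len(test)
print(f"empirical probability of error: {error_rate:.3f}")
```

The printed value is the empirical error of this particular adversary; no adversary can achieve an error below the Bayes error of the blackbox.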
Computing security bound and parameter
In the FAQs below, we will show that the probability of error of an adversary is lower-bounded by the Bayes error $R^*$, which we can estimate by using ML methods; we denote an estimate of $R^*$ by $\hat{R}^*$. Furthermore, we derive a security parameter, which depends on $\hat{R}^*$ and the prior probabilities, and which can be used to quantify the leakage of the blackbox. An intuition of the method follows.
To measure the smallest probability of error achievable by an adversary and the respective security parameter of a blackbox, one can proceed as follows:
 create a dataset of examples by sampling the blackbox repeatedly for various secrets; secrets should be chosen either: i) according to the real priors, if they are known, or ii) uniformly at random;
 estimate the Bayes error on such dataset (e.g., by using the NN bound by Cover and Hart);
 determine the security parameter from this estimate and the priors.
We then describe how secure (or private) a blackbox is by the value of its security parameter.
Depending on the blackbox, the estimated security bound may not converge quickly enough to the true Bayes error (e.g., because the observation space is high dimensional and difficult to separate). In this case, one may need to map the original observations into a different space before computing the estimate, by using a set of transformations called features; our application of the method to WF required doing this (Cherubin, 2017). If the estimate (and, consequently, the security parameter) is computed after extracting features, the resulting security (or privacy) statement holds with respect to those features.
FAQs
What is the Bayes risk?
The Bayes risk is the smallest error that any classifier (even if computationally unbounded) can achieve on a dataset (Cherubin, 2017); in our context, it is the smallest error that any adversary may commit at separating examples coming from the joint distribution over secrets and observations. In practice, it is generally not possible to know the real Bayes risk, as this would require knowledge of the underlying probability distributions.
We write $R^*$ for the Bayes risk and $\hat{R}^*$ for an estimate of it, which is thus an estimate of the smallest error that an adversary can commit.
In the original paper (Cherubin, 2017), we used an estimate that is computed as a function of the Nearest Neighbour (NN) classifier’s error $R_{NN}$ and the number of unique labels $L$ as follows:
$$\hat{R}^* = \frac{L-1}{L}\left(1 - \sqrt{1 - \frac{L}{L-1} R_{NN}}\right)$$
This estimate comes from a beautiful paper by Cover and Hart (1967), who show that, as the size of the dataset on which we compute the NN error grows, the following is true:
$$R^* \leq R_{NN} \leq R^*\left(2 - \frac{L}{L-1} R^*\right)$$
$\hat{R}^*$ computed this way is actually a lower bound of $R^*$ rather than a proper estimate; this was a conservative choice due to the application. Other ways exist to estimate $R^*$.
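As a minimal sketch, the lower bound can be computed by inverting Cover and Hart’s asymptotic upper bound on the NN error, $R_{NN} \leq R^*(2 - \frac{L}{L-1} R^*)$. The function name and the clamping of the square root’s argument are assumptions made for this illustration.

```python
import math

def nn_bayes_lower_bound(nn_error, n_labels):
    """Lower bound on the Bayes risk derived from the NN error (Cover and Hart, 1967)."""
    if n_labels < 2:
        raise ValueError("need at least two labels")
    ratio = n_labels / (n_labels - 1)
    # Clamp at 0: a finite-sample NN error may exceed (L - 1) / L,
    # which would make the argument of the square root negative.
    inside = max(0.0, 1.0 - ratio * nn_error)
    return (1.0 - math.sqrt(inside)) / ratio

for nn_error in (0.0, 0.32, 0.5):
    print(nn_error, nn_bayes_lower_bound(nn_error, n_labels=2))
```

For two labels, an NN error of 0.32 yields a bound of about 0.2, while an NN error of 0.5 (random guessing) yields 0.5.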
What is the security parameter?
The security parameter indicates the advantage that an adversary has w.r.t. random guessing, and it measures the blackbox’s leakage. It is defined as the ratio $\hat{R}^* / R_G$, where $\hat{R}^*$ is the estimate of the Bayes risk and $R_G$ is the random guessing error (i.e., the error committed when random guessing according to priors). The parameter takes value 1 for a perfectly private defence (i.e., one that forces an adversary into random guessing), and 0 where there may exist an attack achieving 0 error.
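A small sketch of this computation follows. It assumes that the security parameter is the ratio between the Bayes risk estimate and the random guessing error, as in (Cherubin, 2017), and that random guessing according to the priors means always predicting the most likely secret; the function names are hypothetical.

```python
def random_guessing_error(priors):
    """Error of an adversary who always predicts the most likely secret."""
    return 1.0 - max(priors)

def security_parameter(bayes_estimate, priors):
    """Ratio between the Bayes risk estimate and the random guessing error."""
    r_g = random_guessing_error(priors)
    if r_g == 0.0:
        raise ValueError("a secret with prior 1 leaves nothing to guess")
    return bayes_estimate / r_g

uniform = [0.5, 0.5]
print(security_parameter(0.5, uniform))   # adversary forced into random guessing
print(security_parameter(0.25, uniform))  # adversary does better than guessing
print(security_parameter(0.0, uniform))   # an attack with no error may exist
```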
The security parameter is also related to well-known security and privacy parameters, from which it inherits interesting properties.

It corresponds, in the binary case (two secrets) with uniform priors (hence, a random guessing error of $1/2$), to $1 - Adv$, where $Adv$ is the advantage as it is commonly used in Cryptography. In cryptographic proofs, $Adv$ is usually derived analytically from a security game, and then shown to be negligibly small. In our case, it is the result of a measurement.

It is also related to the multiplicative leakage, introduced in Quantitative Information Flow (QIF), which measures the leakage of a channel (Braun, Chatzikokolakis, & Palamidessi, 2009); differently from the security parameter, the multiplicative leakage is defined in terms of an adversary’s success probability rather than his error: $(1 - R^*) / (1 - R_G)$, where $R^*$ is the Bayes risk and $R_G$ the random guessing error. However, the multiplicative leakage has an interesting property which the security parameter does not satisfy in general: the multiplicative leakage computed for uniform priors is an upper bound on the multiplicative leakage computed for any other set of priors (note that the multiplicative leakage is larger the more the system leaks). This is called the “Bayes capacity theorem”.
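Both relations can be checked by hand in the binary case with uniform priors. The derivation below is a sketch under two assumptions: the security parameter is taken to be the ratio $R^* / R_G$ (with the true Bayes risk in place of its estimate), and the advantage is defined as $\mathrm{Adv} = 2\,|\Pr[\text{win}] - \tfrac{1}{2}|$.

```latex
\begin{align*}
R_G &= \tfrac{1}{2}, \qquad \Pr[\text{win}] = 1 - R^* \\
\mathrm{Adv} &= 2\left(\Pr[\text{win}] - \tfrac{1}{2}\right) = 1 - 2R^* \\
\frac{R^*}{R_G} &= 2R^* = 1 - \mathrm{Adv} \\
\frac{1 - R^*}{1 - R_G} &= 2\,(1 - R^*) = 1 + \mathrm{Adv}
\end{align*}
```

Under these assumptions the two quantities move in opposite directions: the more the blackbox leaks, the smaller the first and the larger the second.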
Any assumptions on the underlying distribution?
Nope. So long as examples are sampled i.i.d. (which is the case in the game we defined), the Bayes risk is a lower bound on the adversary’s error for any distribution over secrets and observations, and the guarantees we derive on the security parameter are valid.
Specific estimates may make very weak assumptions on the distribution or the observation space. For example, the NN estimate requires continuity of densities and a separable metric space; both are generally satisfied in real-world applications.
I don’t like asymptotic results, can I prove a rate of convergence?
Unfortunately, no.
Impossibility result. Antos et al. (1999) showed that, for any estimate of the Bayes risk, it is possible to find probability distributions for which such estimate converges arbitrarily slowly to the true Bayes risk as the number of examples increases.
Hence, estimating the security of a blackbox via measurements will never give convergence guarantees. This means we can never be completely sure of the performance of a Bayes risk estimate: under no further assumptions on the distribution, no such estimate can be proven to converge at a certain rate. In the paper, we used the following heuristics to determine convergence:
 visually inspect that the trend of the estimate becomes stable as the number of examples increases
 verify that the estimate is smaller than the error of other classifiers.
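The first heuristic can be sketched as follows: compute the estimate on growing portions of the dataset and inspect the trend. This is only an illustration under stated assumptions: the binary blackbox, the leave-one-out evaluation of the NN error, and all names are hypothetical, and the estimate used is the NN-based lower bound.

```python
import math
import random

def loo_nn_error(examples):
    """Leave-one-out error of the 1-NN classifier on (observation, label) pairs."""
    mistakes = 0
    for i, (obs, label) in enumerate(examples):
        others = (ex for j, ex in enumerate(examples) if j != i)
        nearest = min(others, key=lambda ex: abs(ex[0] - obs))
        mistakes += nearest[1] != label
    return mistakes / len(examples)

def nn_lower_bound(nn_error, n_labels):
    """Lower bound on the Bayes risk obtained from the NN error."""
    ratio = n_labels / (n_labels - 1)
    return (1.0 - math.sqrt(max(0.0, 1.0 - ratio * nn_error))) / ratio

# Hypothetical blackbox: binary secret, observation = secret + Gaussian noise.
rng = random.Random(1)
data = []
for _ in range(400):
    secret = rng.randrange(2)
    data.append((rng.gauss(secret, 1.0), secret))

trend = []
for n in (50, 100, 200, 400):
    estimate = nn_lower_bound(loo_nn_error(data[:n]), n_labels=2)
    trend.append((n, estimate))
    print(n, round(estimate, 3))
```

If the printed values keep changing substantially as the dataset grows, more examples (or better features) are likely needed.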
Clearly, if more information were available on the actual priors/distribution, this could be used to estimate the Bayes risk more precisely and to obtain convergence rate guarantees.
What if I (don’t) know the priors?
In general, you may want to use a leakage measure for which the “Bayes capacity theorem” (Braun, Chatzikokolakis, & Palamidessi, 2009) holds, which means the leakage measure is minimised (or maximised, depending on the formulation) when computed for uniform priors.
Multiplicative leakage is an example of this: if one computes it for uniform priors, then one obtains an upper bound on the blackbox’s leakage.
This means that, even if we don’t know the true priors over the secrets, we can use uniform priors to obtain strong security guarantees. However, when we actually know the real priors, we can use them instead to get a tighter security estimate.
The same result does not hold for the security parameter in the general case, although it does give these guarantees in a restricted case. This is ongoing research, and more will follow.
What if the adversary has external knowledge?
In some attacks, the adversary can exploit external knowledge. For instance, in some definitions of WF attacks, an adversary may link multiple test observations by assuming they correspond to web pages of the same website.
The security parameter is an indication of the leakage of the blackbox itself. While it is clearly impossible to determine security against any adversary with external knowledge (trivially, one such adversary is an adversary who already knows the secret), one can use blackbox measurements as a building block to prove the security of more complex systems.
The NN classifier is defined for some metric: which one should I use?
The lower bound guarantee of the estimate based on the NN classifier holds for any metric on the observation space, with the only constraint that the metric space is separable (Cherubin, Chatzikokolakis, & Palamidessi, 2019).
In the original paper (Cherubin, 2017) we experimented with a few metrics, and opted for Euclidean.
So, is NN, like, the best classifier ever?
No.
1) Note that we use a transformation of the actual NN classifier’s error as a security bound. The NN error itself is not guaranteed to converge to the Bayes risk.
2) BUT, for example, the error of the kNN classifier, with $k$ increasing with the number of training examples $n$ according to some requirements (Appendix in (Cherubin, 2017)), does converge asymptotically to the Bayes risk. Classifiers satisfying such property are called universally consistent. However, no classifier can give guarantees on its finite-sample performance without assumptions on the distribution (No Free Lunch theorem), and thus there is no such thing as the “best classifier ever”.
Other ways to estimate the Bayes risk?
Plenty. The error of any universally consistent classifier will converge to the Bayes risk asymptotically in the number of training examples.
Examples of universally consistent classifiers:
 kNN classifier with $k$ chosen as a function of the number of training examples $n$ such that, as $n \to \infty$: $k \to \infty$ and $k/n \to 0$
 SVM with appropriate choice of kernel.
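As an illustration of the first example, the sketch below uses a kNN classifier with $k = \lfloor\sqrt{n}\rfloor$, one common choice for which $k$ grows with $n$ while $k/n$ vanishes. The toy binary blackbox and all names are hypothetical, and the error is measured on a held-out test set.

```python
import math
import random
from collections import Counter

def knn_predict(train, obs, k):
    """Majority vote among the k training examples closest to `obs`."""
    neighbours = sorted(train, key=lambda ex: abs(ex[0] - obs))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]

def knn_error(train, test):
    """Test error of kNN with k = floor(sqrt(n)), n being the training set size."""
    k = max(1, int(math.sqrt(len(train))))
    mistakes = sum(knn_predict(train, obs, k) != label for obs, label in test)
    return mistakes / len(test)

def sample(n, rng):
    """Hypothetical blackbox: binary secret, observation = secret + Gaussian noise."""
    examples = []
    for _ in range(n):
        secret = rng.randrange(2)
        examples.append((rng.gauss(secret, 1.0), secret))
    return examples

rng = random.Random(2)
train, test = sample(400, rng), sample(400, rng)
error = knn_error(train, test)
print(f"kNN error with k = {int(math.sqrt(len(train)))}: {error:.3f}")
```

For this toy blackbox the Bayes risk is known (about 0.31), so the printed error can be compared against it; for a real blackbox no such reference is available.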
Other approaches?
Measuring the security of a blackbox means telling something about how hard it is to separate the probability distributions that the blackbox defines on the observation space.
Clearly, an estimate of the Bayes risk is not the only strategy. For instance, one may estimate the information leakage of the blackbox by using a statistical test on the distributions (e.g., the Kolmogorov-Smirnov test), or by estimating the distributions directly (e.g., using KDE).
The advantage of our approach (i.e., using a universally consistent classifier to estimate the Bayes risk) is that it outputs a probability of error, whose meaning we argue is more intuitive than a statistic on the distributions’ distance.
Also, note that by using the NN (or kNN) classifier, we are essentially approximating the underlying probability distributions (kNN methods are indeed based on the intuition that the posterior density at some point can be approximated by the posterior of its neighbours).
Features?
Because a Bayes risk estimate may not converge in practice, one may need to map observations into some new space called the feature space to improve the estimate’s convergence.
In fact, while asymptotically the estimate in the original space will never be worse than the estimate in the feature space (see 5.6 in (Cherubin, 2017)), this may be the case in finite sample conditions.
If this is the case, one will need to select good features, and the achieved security (which is relative to the chosen mapping from the observation space into the feature space) will only hold until a better mapping is found. In the context of WF (Cherubin, 2017), we worked in the feature space, and argued that finding better features is becoming harder.
Relabelling (e.g., Open World scenario in WF attacks)
Remapping the secrets (i.e., “relabelling”) may be useful to adapt a blackbox to different scenarios.
Consider a setting where there are several secrets, and then consider the following distinct attack scenarios:
 the adversary has to guess the exact secret
 the adversary only needs to say whether or not the secret is one particular target secret.
To model the first case, we will simply consider the original blackbox. To model the latter, we can remap the secrets into a binary secret space, so that the new secret indicates whether the original secret is the target one or not; then we define a blackbox over the remapped secrets, and measure its security as usual.
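In practice, relabelling is a one-line transformation of the dataset. A minimal sketch, where the function name, the labels 1 and 0, and the example secrets "a", "b", "c" are all hypothetical:

```python
def relabel(examples, target):
    """Map each secret to 1 if it equals `target`, and to 0 otherwise."""
    return [(obs, int(secret == target)) for obs, secret in examples]

# Examples sampled from the original blackbox: (observation, secret) pairs.
original = [(0.3, "a"), (1.7, "b"), (0.1, "a"), (2.9, "c")]
remapped = relabel(original, target="a")
print(remapped)  # [(0.3, 1), (1.7, 0), (0.1, 1), (2.9, 0)]
```

Note that relabelling also changes the priors over the (new) secrets, and hence the random guessing error used to compute the security parameter.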
The idea of relabelling allows defining various scenarios for attacks. In the context of WF, we used this to define the Open World scenario (where the adversary needs to predict whether a web page is among a set of monitored web pages or not), as opposed to the Closed World scenario (where the adversary needs to predict the exact web page). Variations of the Open World scenario can be defined similarly.
Remark on evaluating WF in the Open World scenario. Even though we can determine privacy in an Open World scenario, we recommend against it: such a metric should be computed under optimal conditions for an adversary (a Closed World scenario) (Cherubin, 2017).
When should I use this method?
There are two strategies for quantifying the security of a blackbox:
 if the internals of the blackbox are known (i.e., its densities, or its function if the blackbox is not randomised), one can try to derive a concrete proof of its security; this is to be preferred whenever possible (see below);
 if concrete proofs are not viable (e.g., think of side channel attacks), one can measure its security by using an estimate of the Bayes risk, and determine its security as it was presented here.
The latter is discouraged unless necessary, because it is not possible to prove its convergence under relaxed assumptions.
Note that both strategies have been widely explored in Cryptography and QIF. To the best of our knowledge, the extension to infinite spaces was not tackled until now, and we showed it is achieved by any universally consistent algorithm.
Enough theory! How do I use this thing?
Clone the GitHub repo and cd into code/.
Sample your blackbox many times to obtain a dataset of observations and their labels, then compute the estimate on it.
The estimate will be based on the NN bound. We plan to support other methods in the future.
We suggest plotting the value of eta as you increase the size of the dataset. This will give you an indication regarding the convergence of the estimate (see (Cherubin, 2017)).
Should you have non-uniform priors, set Rg = 1 - max(P(y)). However, consider what was discussed about priors.
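For readers without access to the repository, the whole procedure can be sketched in plain Python. This is not the repository’s code or API: the function estimate_security, its parameters, and the toy two-dimensional blackbox are hypothetical, the NN error is measured by leave-one-out with the Euclidean metric, and the security parameter is assumed to be the ratio between the NN-based bound and Rg.

```python
import math
import random

def estimate_security(observations, labels, priors=None):
    """Return (Bayes risk lower bound, security parameter) using the NN bound."""
    n = len(labels)
    n_labels = len(set(labels))
    if n_labels < 2:
        raise ValueError("need at least two distinct labels")
    # Leave-one-out 1-NN error with the Euclidean metric.
    mistakes = 0
    for i in range(n):
        nearest = min((j for j in range(n) if j != i),
                      key=lambda j: math.dist(observations[i], observations[j]))
        mistakes += labels[nearest] != labels[i]
    r_nn = mistakes / n
    ratio = n_labels / (n_labels - 1)
    bound = (1.0 - math.sqrt(max(0.0, 1.0 - ratio * r_nn))) / ratio
    if priors is None:  # default to uniform priors
        priors = [1.0 / n_labels] * n_labels
    r_g = 1.0 - max(priors)  # Rg = 1 - max(P(y))
    return bound, bound / r_g

# Hypothetical blackbox: the binary secret shifts the first coordinate only.
rng = random.Random(3)
labels = [rng.randrange(2) for _ in range(300)]
observations = [(rng.gauss(2.0 * s, 1.0), rng.gauss(0.0, 1.0)) for s in labels]

bound, security_param = estimate_security(observations, labels)
print(f"Bayes risk lower bound: {bound:.3f}, security parameter: {security_param:.3f}")
```

When the real priors are known, the dataset should be sampled according to them and they should be passed via the priors argument, so that Rg reflects them.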
Epilogue
Changes
As there will probably be mistakes/imprecisions, I welcome feedback and corrections (see the homepage for my contact details), which I will report here.
 19/08/2018 The “Bayes capacity theorem” (i.e., a leakage measure is minimised/maximised by uniform priors) does not hold in general for the security parameter, although it does hold for multiplicative leakage.
References

Braun, C., Chatzikokolakis, K., & Palamidessi, C. (2009). Quantitative notions of leakage for one-try attacks. Electronic Notes in Theoretical Computer Science.

Antos, A., Devroye, L., & Györfi, L. (1999). Lower bounds for Bayes error estimation. IEEE Transactions on Pattern Analysis and Machine Intelligence.

Cover, T., & Hart, P. (1967). Nearest neighbor pattern classification. IEEE Transactions on Information Theory.