Giovanni Cherubin
Senior Researcher in Machine Learning & Security at Microsoft (Cambridge)
Research interests:
- Information leakage estimation for security & privacy
- Theory, foundations, and privacy-security properties of Machine Learning
- Methods for distribution-free confident prediction in supervised learning and anomaly detection (e.g., Conformal Predictors)
- Applying the above (and anything else available) to secure LLMs.
I co-founded the CTF team TU6PM.
I am a (happy) OpenBSD and QubesOS user.
News
| Aug 4, 2025 | We introduced a design pattern that prevents (most) jailbreaks against LLM chatbots. It’s unusual to obtain by-design protection against this kind of attack. Paper and code. |
| Aug 4, 2025 | I wrote a couple of challenges for this year’s DEF CON qualifiers and finals. They have to do with LLM security; hope you enjoy them: vibe and hs. |
| May 1, 2025 | We concluded two rounds of the Adaptive Prompt Injection Competition (LLMail-inject) (first round winners announcement). With this, we released the (massive) dataset of successful/unsuccessful attacks, the code to rerun the challenge, as well as a paper describing our findings. |
| Aug 14, 2024 | One can get a closed-form approximation of the membership inference risk of DP-SGD, and we released an interactive tool that uses this idea to help tune DP-SGD’s parameters. We can also get data-dependent guarantees for the risk of attribute inference; code for this is available too. Based on our USENIX ‘24 work (a generic sketch of this kind of bound appears after the news items). |
| Aug 10, 2022 | Our work on evaluating website fingerprinting in the real world was awarded: i) the Internet Defense Prize (2nd place) sponsored by Meta, and ii) a Distinguished Paper Award (USENIX ‘22)! |
| May 25, 2022 | Our work on reconstruction attacks against ML models was accepted by IEEE S&P 2022. Check out Jamie’s wonderful presentation! |
| Feb 7, 2022 | I joined Microsoft Research Cambridge and the Microsoft Security Response Centre (conveniently, both acronymise to “MSRC”). I will work as a Senior Researcher on all things ML, privacy-preserving ML, and security. |
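For intuition on the DP-SGD news item above: once an accountant reports an (ε, δ) guarantee for a training run, the standard hypothesis-testing view of differential privacy (TPR ≤ e^ε · FPR + δ) already implies a worst-case bound on any membership inference adversary. The sketch below is a minimal, generic illustration of that standard bound, not the closed-form approximation from the USENIX ‘24 paper; the example (ε, δ) values are placeholders that would normally come from a DP-SGD accountant.

```python
import math

def mia_bounds(epsilon: float, delta: float) -> tuple[float, float]:
    """Worst-case membership inference implied by an (epsilon, delta)-DP guarantee.

    Uses the hypothesis-testing characterisation of DP (TPR <= e^eps * FPR + delta),
    assuming a balanced membership game (member/non-member equally likely).
    """
    e = math.exp(epsilon)
    # Best achievable attack accuracy under the DP constraint.
    accuracy = (e + delta) / (e + 1)
    # Best achievable membership advantage (TPR - FPR).
    advantage = (e - 1 + 2 * delta) / (e + 1)
    return accuracy, advantage

# Hypothetical values, as if reported by a DP-SGD privacy accountant.
acc, adv = mia_bounds(epsilon=2.0, delta=1e-5)
print(f"worst-case MIA accuracy <= {acc:.3f}, advantage <= {adv:.3f}")
```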
Publications
- Get My Drift? Catching LLM Task Drift with Activation Deltas. In 2025 IEEE Conference on Secure and Trustworthy Machine Learning (SaTML), 2025. [Paper]
- LLMail-Inject: A Dataset from a Realistic Adaptive Prompt Injection Challenge. arXiv preprint arXiv:2506.09956, 2025. [Paper]
- Dataset and Lessons Learned from the 2024 SaTML LLM Capture-the-Flag Competition. Advances in Neural Information Processing Systems, 2024. [Paper]
- Bayes Security: A Not So Average Metric. In 2023 IEEE 36th Computer Security Foundations Symposium (CSF), 2023. [Paper]
- Approximating Full Conformal Prediction at Scale via Influence Functions. In Proceedings of the AAAI Conference on Artificial Intelligence, 2023. [Paper]
- [Short paper] How Do the Performance of a Conformal Predictor and Its Underlying Algorithm Relate? In Conformal and Probabilistic Prediction with Applications, 2023. [Paper]
- SoK: Let the Privacy Games Begin! A Unified Treatment of Data Inference Privacy in Machine Learning. In 2023 IEEE Symposium on Security and Privacy (SP), 2023. [Paper]
- Disparate Vulnerability: On the Unfairness of Privacy Attacks Against Machine Learning. 2022.
- Reconstructing Training Data with Informed Adversaries. In 2022 IEEE Symposium on Security and Privacy (SP), 2022. [Paper]
- Black-box Security: Measuring Black-box Information Leakage via Machine Learning. PhD thesis, 2019. [PDF]
Academic Service
- PC chair of the annual conference on conformal prediction, COPA 2020 and COPA 2021.
- Guest editor of the 2022 Annals of Mathematics and Artificial Intelligence special issue on Conformal Prediction.
- Co-organiser of the PriML workshop 2021 (@NeurIPS).
- PC member: NeurIPS 2024, ICLR 2024, IEEE S&P 2022-23 and 2024-25, USENIX 2022-24, SaTML 2023-24, ACM CCS 2021, IEEE Euro S&P 2021-22, PETS 2019-21, COPA 2018.
- Reviewer for various ML & security conferences and journals (e.g., ICML 2022, Machine Learning journal, Neurocomputing, Financial Cryptography).
- Notable reviewer at SaTML 2023 and 2024.
I was a teaching assistant for the Machine Learning and Data Analysis courses at Royal Holloway University of London (2014-17), and for the C Programming and Linear Algebra and Geometry courses at University of Pavia (2011-12). In 2023 and 2024, I gave a lecture on Privacy Preserving Machine Learning at the KU Leuven Summer School on Security & Privacy in the Age of AI.
Research Visits
Research Engineer, HP Labs Security Lab, Bristol (August-November 2017)
Supervisors: Jonathan Griffin, Adrian Baldwin
Research Visitor, École Polytechnique, Paris (May; November 2017)
Supervisors: Prof. Catuscia Palamidessi, Kostas Chatzikokolakis
Research Intern, Cornell Tech (June-September 2016)
Supervisor: Prof. Thomas Ristenpart
Awards
- 2022, Internet Defense Prize (2nd place): awarded at USENIX ‘22 and sponsored by Meta: “Online Website Fingerprinting: Evaluating Website Fingerprinting Attacks on Tor in the Real World”
- 2022, Distinguished Paper Award: USENIX ‘22: “Online Website Fingerprinting: Evaluating Website Fingerprinting Attacks on Tor in the Real World”
- 2017, Best Paper: Andreas Pfitzmann Best Student Paper Award at PETS: “Bayes, not Naïve: Security Bounds on Website Fingerprinting Defenses”
- 2017, First place at Capture The Flag (CTF) security challenge organised by NCC Group at the Cambridge2Cambridge event
- 2015, Best Paper: Best student paper award sponsored by HP at SLDS conference: “Conformal Clustering and Its Application to Botnet Traffic”
- 2014, Best Finalist: Best MSc in Big Data finalist in memory of Prof. Alexey Chervonenkis (Royal Holloway University of London)
Short bio
Since 2022, I have been a Senior Researcher at Microsoft Research (Cambridge) and at the Microsoft Security Response Centre. Before that, I was a Research Fellow (Safe & Ethical AI) at the Alan Turing Institute in London (2020-21), and a postdoctoral fellow (2019-21) at EPFL (Switzerland) with an EcoCloud grant, collaborating with Carmela Troncoso at the SPRING lab and Martin Jaggi at the MLO lab. I hold a PhD in Machine Learning and Information Security from Royal Holloway University of London through the Centre for Doctoral Training (CDT), where I was supervised by Alex Gammerman and advised by Kenny Paterson. I received an MSc in Machine Learning from Royal Holloway University of London in 2014, and a BSc in Mechatronics and Computer Engineering from University of Pavia in 2013.
My research focuses on the privacy and security properties of machine learning models, as well as the theoretical and empirical study of their information leakage; I am especially interested in what metrics one should use to measure leakage (i.e., the risk with respect to attacks) in these contexts. In my role at Microsoft, I conduct research on the security of Large Language Models. Additionally, I work on distribution-free uncertainty estimation for machine learning (e.g., Conformal Prediction) and distribution-free learning, and I have a personal interest in the use of Kolmogorov complexity as a basis for machine learning (e.g., Algorithmic Learning Theory). A minimal example of conformal prediction is sketched below.
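For readers unfamiliar with Conformal Prediction, here is a minimal sketch of split (inductive) conformal prediction for classification. It is a generic illustration, not code from any of the papers above; the dataset and model are placeholders, and any classifier exposing predicted probabilities would work the same way.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Placeholder data: any i.i.d. (X, y) sample works identically.
X = rng.normal(size=(1000, 5))
y = (X[:, 0] + 0.5 * rng.normal(size=1000) > 0).astype(int)
X_train, y_train = X[:600], y[:600]
X_cal, y_cal = X[600:900], y[600:900]   # held-out calibration set
X_test = X[900:]

model = LogisticRegression().fit(X_train, y_train)

# Nonconformity score: 1 - predicted probability of the true label.
cal_probs = model.predict_proba(X_cal)
cal_scores = 1.0 - cal_probs[np.arange(len(y_cal)), y_cal]

# Conformal quantile at miscoverage level alpha = 0.1 (finite-sample corrected).
alpha = 0.1
n = len(cal_scores)
q = np.quantile(cal_scores, np.ceil((n + 1) * (1 - alpha)) / n, method="higher")

# Prediction sets: include every label whose score falls below the threshold.
test_probs = model.predict_proba(X_test)
prediction_sets = [
    [label for label in range(test_probs.shape[1]) if 1.0 - p[label] <= q]
    for p in test_probs
]
# Under exchangeability, each set contains the true label w.p. >= 1 - alpha.
print(prediction_sets[:5])
```

The appeal of this construction is that the coverage guarantee is distribution-free: it holds for any underlying model and any data distribution, requiring only exchangeability of calibration and test points.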