Speakers:
 Prof. Gitta Kutyniok, Ludwig Maximilian University of Munich
 Prof. Aaditya Ramdas, Carnegie Mellon University
 Prof. Johannes Schmidt-Hieber, University of Twente
 Prof. Bin Yu, University of California, Berkeley
Each of the four speakers will give three lectures (summaries are provided below). The day-to-day schedule follows, right before the short abstracts of the individual lectures.
Schedule
Sunday  Monday  Tuesday  Wednesday 
Breakfast 7:00 – 9:00 
Breakfast 7:00 – 9:00 
Breakfast 7:00 – 9:00 

Lecture Prof. Bin Yu 9:00 – 10:30 
Lecture Prof. Schmidt-Hieber 9:00 – 10:30 
Lecture Prof. Bin Yu 9:00 – 10:30 

Coffee Break  Coffee Break  Coffee Break  
Lecture Prof. Ramdas 11:00 – 12:30 
Lecture Prof. Schmidt-Hieber – Practical 11:00 – 12:30 
Lecture Prof. Ramdas 11:00 – 12:30 

Lunch 12:30  Lunch 12:30  Lunch 12:30  
Welcome Tea 14:00 – 15:00 
Rest Time  Rest Time  
Lecture Prof. Schmidt-Hieber 15:00 – 16:30 
Social Event 15:00 – 17:00 
Lecture Prof. Ramdas 16:00 – 17:00 

Check-in  Coffee Break  
Lecture Prof. Schmidt-Hieber 17:30 – 18:30 
Lecture Prof. Bin Yu 17:30 – 18:30 

Dinner 19:15  Dinner 19:15  Dinner 19:15  
Apero 
Abstracts
Prof. Gitta Kutyniok
 The Mystery of Generalization of Deep Learning
Deep neural networks are today the workhorse of artificial intelligence. However, despite their outstanding success in real-world applications, most of the related research is empirically driven and a comprehensive theoretical foundation is still missing. In addition, deep learning still has severe reliability problems. In this lecture, we will start with an introduction to deep neural networks and discuss, in particular, the key theoretical questions of this area. We will then focus on one of the necessary ingredients of reliability, namely generalization. For this, we will first introduce the key aspects of statistical learning theory that we will need, followed by a survey of diverse approaches to generalization. Finally, we will completely solve one part of the problem in the setting of graph neural networks.
 From Explanations to Limitations of Deep Learning
Explainability can be regarded as one contribution to reliability, or it can be viewed as a way to still obtain some understanding, since the theoretical foundation of deep learning does not yet give a full picture. Explainability considers ready-to-use neural networks and, roughly speaking, aims to identify those features of the input that are most crucial for the observed output. In this lecture, we will provide an introduction to this exciting area and discuss several types of approaches. However, as we will see, deep learning at present has reliability problems such as adversarial examples, which leads us to discuss fundamental limitations in the second part of the lecture. We will, for instance, see that the fact that deep neural networks are trained on digital hardware imposes severe restrictions on reliability in the sense of computability theory.
 Application of Deep Learning to Mathematical Problem Settings
Deep learning has turned out to be extremely effective for mathematical problem settings, in particular inverse problems and partial differential equations. While the key question for inverse problems is how to optimally combine deep learning with classical approaches, research in the numerical analysis of partial differential equations focuses on the amazing ability of deep neural networks to circumvent the curse of dimensionality. In this lecture, we will provide an introduction to those two research directions, showing both theoretical and numerical results.
Prof. Aaditya Ramdas
 Estimating means of bounded random variables by betting
We introduce the principle of “testing by betting” by discussing in detail a simple and classical problem from probability theory and statistics from the 1960s (Hoeffding): given observations from a bounded distribution, how can we estimate its mean? This is a nonparametric estimation problem, for which we present state-of-the-art confidence intervals and “confidence sequences”, which were derived from a decidedly game-theoretic perspective. We will introduce “Ville’s inequality”, the central (and only) mathematical inequality that underlies game-theoretic statistics. Time permitting, we will discuss some history, and what exactly Ville established in his seminal 1939 PhD thesis.
 Sequential experimental design: the lady tasting tea and universal inference
One of the main ideas in game-theoretic statistics is that we bet against the null, and directly use the resulting “wealth as evidence against the null”. The resulting evidence processes are called “e-processes” (or e-values) and have many advantages over traditional notions of evidence like p-values. Primarily, if the evidence does not suffice, one can extend the experiment for free and collect even more evidence while maintaining a strong notion of error control. We will elaborate on these ideas by revisiting Fisher’s classical “lady tasting tea” experiment from 100 years ago through a new, modern lens, and discuss extensions to multiple testing. If time permits, we will give a general methodology for constructing e-processes: universal inference.
 Betting for sampling without replacement: how to audit elections
We discuss a very different, but interesting, application of testing and estimation by betting: how to audit an election (after an election result has been announced). This boils down to answering questions about sampling without replacement, another very well studied and classical problem for which betting provides a powerful new answer. Time permitting, we will end with an overview of other topics in nonparametric statistics and probability theory that were not covered in these three lectures.
Prof. Johannes Schmidt-Hieber
 Survey on neural network structures and deep learning
There are various types of neural networks that differ in complexity and in the data types they can process. This lecture provides an overview and surveys the algorithms used to fit deep networks to data. We discuss different ideas that underlie the existing approaches to a mathematical theory of deep networks.
 Theory for shallow networks
We start with the universal approximation theorem and discuss several proof strategies that provide some insights into functions that can be easily approximated by shallow networks. Based on this, a survey on approximation rates for shallow networks is given. It is shown how this leads to statistical estimation rates. In the lecture, we also discuss methods that fit shallow networks to data.
 Statistical theory for deep networks
Why are deep networks better than shallow networks? We provide a survey of the existing ideas in the literature. In particular, we study localization of deep networks and specific functions that can be easily approximated by deep networks. We outline the theory underlying the recent bounds on the estimation risk of deep ReLU networks. In the lecture, we discuss specific properties of the ReLU activation function. Based on this, we show how risk bounds can be obtained for sparsely connected ReLU networks. At the end, we describe important future steps needed for the further development of the statistical theory of deep learning.
Prof. Bin Yu
Veridical data science and interpretable machine learning towards trustworthy AI
 This lecture introduces the predictability, computability, stability (PCS) framework and documentation, which unify, streamline and expand on ideas and best practices from statistics and machine learning for the entire data science life cycle.
 This lecture discusses a motivating application of PCS: the development of iterative random forests (iRF), which add appropriate stability to random forests (RF) for discovering predictable and interpretable high-order interactions. iRF is illustrated through interdisciplinary research in genomics and medicine.
 This lecture first introduces a definition of interpretable machine learning through predictive accuracy, descriptive accuracy and relevancy to a human audience and a particular domain problem. Then it discusses methods such as ACD and AWD to interpret deep neural networks towards trustworthiness, in general and in the context of scientific collaborations in cosmology and cell biology.