Publications
Journal Articles
[17] Memorization with neural nets: going beyond the worst case
S. Dirksen, P. Finke, M. Genzel • J. Mach. Learn. Res., vol. 25, no. 347, pp. 1–38, 2024
Abstract: In practice, deep neural networks are often able to easily interpolate their training data. To understand this phenomenon, many works have aimed to quantify the memorization capacity of a neural network architecture: the largest number of points such that the architecture can interpolate any placement of these points with any assignment of labels. For real-world data, however, one intuitively expects the presence of a benign structure so that interpolation already occurs at a smaller network size than suggested by memorization capacity. In this paper, we investigate interpolation by adopting an instance-specific viewpoint. We introduce a simple randomized algorithm that, given a fixed finite data set with two classes, with high probability constructs an interpolating three-layer neural network in polynomial time. The required number of parameters is linked to geometric properties of the two classes and their mutual arrangement. As a result, we obtain guarantees that are independent of the number of samples and hence move beyond worst-case memorization capacity bounds. We verify our theoretical result with numerical experiments and additionally investigate the effectiveness of the algorithm on MNIST and CIFAR-10.
[16] Let's enhance: A deep learning approach to extreme deblurring of text images
T. Trippe, M. Genzel, J. Macdonald, M. März • Inverse Probl. Imaging, vol. 17, no. 5, pp. 1041–1068, 2023
Abstract: This work presents a novel deep-learning-based pipeline for the inverse problem of image deblurring, leveraging augmentation and pre-training with synthetic data. Our results build on our winning submission to the recent Helsinki Deblur Challenge 2021, whose goal was to explore the limits of state-of-the-art deblurring algorithms in a real-world data setting. The task of the challenge was to deblur out-of-focus images of random text, thereby maximizing an optical-character-recognition-based score function in a downstream task. A key step of our solution is the data-driven estimation of the physical forward model describing the blur process. This enables a stream of synthetic data, generating pairs of ground-truth and blurry images on-the-fly, which is used for an extensive augmentation of the small amount of challenge data provided. The actual deblurring pipeline consists of an approximate inversion of the radial lens distortion (determined by the estimated forward model) and a U-Net architecture, which is trained end-to-end. Our algorithm was the only one passing the hardest challenge level, achieving over 70% character recognition accuracy. Our findings are well in line with the paradigm of data-centric machine learning, and we demonstrate its effectiveness in the context of inverse problems. Apart from a detailed presentation of our methodology, we also analyze the importance of several design choices in a series of ablation studies. The code of our challenge submission is available at https://github.com/theophil-trippe/HDC_TUBerlin_version_1.
[15] A Unified Approach to Uniform Signal Recovery From Nonlinear Observations
M. Genzel, A. Stollenwerk • Found. Comput. Math., vol. 23, no. 3, pp. 899–972, 2023
Abstract: Recent advances in quantized compressed sensing and high-dimensional estimation have shown that signal recovery is even feasible under strong non-linear distortions in the observation process. An important characteristic of associated guarantees is uniformity, i.e., recovery succeeds for an entire class of structured signals with a fixed measurement ensemble. However, despite significant results in various special cases, a general understanding of uniform recovery from non-linear observations is still missing. This paper develops a unified approach to this problem under the assumption of i.i.d. sub-Gaussian measurement vectors. Our main result shows that a simple least-squares estimator with any convex constraint can serve as a universal recovery strategy, which is outlier robust and does not require explicit knowledge of the underlying non-linearity. Based on empirical process theory, a key technical novelty is an approximative increment condition that can be implemented for all common types of non-linear models. This flexibility allows us to apply our approach to a variety of problems in non-linear compressed sensing and high-dimensional statistics, leading to several new and improved guarantees. Each of these applications is accompanied by a conceptually simple and systematic proof, which does not rely on any deeper properties of the observation model. On the other hand, known local stability properties can be incorporated into our framework in a plug-and-play manner, thereby implying near-optimal error bounds.
[14] Solving Inverse Problems With Deep Neural Networks – Robustness Included?
M. Genzel, J. Macdonald, M. März • IEEE Trans. Pattern Anal. Mach. Intell., vol. 45, no. 1, pp. 1119–1134, 2023
Abstract: In the past five years, deep learning methods have become state-of-the-art in solving various inverse problems. Before such approaches can find application in safety-critical fields, a verification of their reliability appears mandatory. Recent works have pointed out instabilities of deep neural networks for several image reconstruction tasks. In analogy to adversarial attacks in classification, it was shown that slight distortions in the input domain may cause severe artifacts. The present article sheds new light on this concern, by conducting an extensive study of the robustness of deep-learning-based algorithms for solving underdetermined inverse problems. This covers compressed sensing with Gaussian measurements as well as image recovery from Fourier and Radon measurements, including a real-world scenario for magnetic resonance imaging (using the NYU-fastMRI dataset). Our main focus is on computing adversarial perturbations of the measurements that maximize the reconstruction error. A distinctive feature of our approach is the quantitative and qualitative comparison with total-variation minimization, which serves as a provably robust reference method. In contrast to previous findings, our results reveal that standard end-to-end network architectures are not only resilient against statistical noise, but also against adversarial perturbations. All considered networks are trained by common deep learning techniques, without sophisticated defense strategies.
[13] The Separation Capacity of Random Neural Networks
S. Dirksen, M. Genzel, L. Jacques, A. Stollenwerk • J. Mach. Learn. Res., vol. 23, no. 309, pp. 1–47, 2022
Abstract: Neural networks with random weights appear in a variety of machine learning applications, most prominently as the initialization of many deep learning algorithms and as a computationally cheap alternative to fully learned neural networks. In the present article we enhance the theoretical understanding of random neural nets by addressing the following data separation problem: under what conditions can a random neural network make two classes X₋, X₊ ⊂ ℝ^d (with positive distance) linearly separable? We show that a sufficiently large two-layer ReLU-network with standard Gaussian weights and uniformly distributed biases can solve this problem with high probability. Crucially, the number of required neurons is explicitly linked to geometric properties of the underlying sets X₋, X₊ and their mutual arrangement. This instance-specific viewpoint allows us to overcome the usual curse of dimensionality (exponential width of the layers) in non-pathological situations where the data carries low-complexity structure. We quantify the relevant structure of the data in terms of a novel notion of mutual complexity (based on a localized version of Gaussian mean width), which leads to sound and informative separation guarantees. We connect our result with related lines of work on approximation, memorization, and generalization.
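The separation phenomenon studied in this paper can be illustrated with a minimal sketch (pure Python, illustrative dimensions and names; not the paper's construction or proof technique). Two interleaved classes on the real line admit no separating threshold, but after a random two-layer ReLU lift with Gaussian weights and uniform biases, the lifted feature matrix generically has full row rank, so any labeling of the points, including the interleaved one, is realized by a linear functional:

```python
import math
import random

random.seed(0)

def relu_features(x, weights, biases):
    """One random ReLU layer: phi_i(x) = max(0, <w_i, x> + b_i)."""
    return [max(0.0, sum(wj * xj for wj, xj in zip(w, x)) + b)
            for w, b in zip(weights, biases)]

def rank(matrix, tol=1e-9):
    """Matrix rank via Gaussian elimination with partial pivoting."""
    m = [row[:] for row in matrix]
    rows, cols = len(m), len(m[0])
    r = 0
    for c in range(cols):
        if r == rows:
            break
        pivot = max(range(r, rows), key=lambda i: abs(m[i][c]))
        if abs(m[pivot][c]) < tol:
            continue
        m[r], m[pivot] = m[pivot], m[r]
        for i in range(rows):
            if i != r:
                f = m[i][c] / m[r][c]
                m[i] = [a - f * b for a, b in zip(m[i], m[r])]
        r += 1
    return r

# Two classes on the real line that no threshold can separate,
# since they interleave: -2 < -0.5 < 0.5 < 2 with labels -, +, +, -.
points = [[-2.0], [-0.5], [0.5], [2.0]]

# Random two-layer ReLU lift: standard Gaussian weights, uniform biases.
n_neurons = 20
weights = [[random.gauss(0.0, 1.0)] for _ in range(n_neurons)]
biases = [random.uniform(-3.0, 3.0) for _ in range(n_neurons)]
lifted = [relu_features(x, weights, biases) + [1.0] for x in points]  # bias feature appended

# If the lifted feature matrix has full row rank, every +/-1 labeling of the
# four points is realized exactly by some linear functional, so in particular
# the interleaved labeling becomes linearly separable after the random layer.
full_rank = rank(lifted) == len(points)
print("linearly separable after random ReLU lift:", full_rank)
```

The rank certificate is stronger than fitting a single classifier: full row rank of the 4×21 feature matrix means the random lift realizes all 16 labelings of these four points at once.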
[12] Gradient-Based Learning of Discrete Structured Measurement Operators for Signal Recovery
J. Sauder, M. Genzel, P. Jung • IEEE J. Sel. Area. Inf. Theory, vol. 3, no. 3, pp. 481–492, 2022
Abstract: Countless signal processing applications include the reconstruction of signals from few indirect linear measurements. The design of effective measurement operators is typically constrained by the underlying hardware and physics, posing a challenging and often even discrete optimization task. While the potential of gradient-based learning via the unrolling of iterative recovery algorithms has been demonstrated, it has remained unclear how to leverage this technique when the set of admissible measurement operators is structured and discrete. We tackle this problem by combining unrolled optimization with Gumbel reparametrizations, which enable the computation of low-variance gradient estimates of categorical random variables. Our approach is formalized by GLODISMO (Gradient-based Learning of DIscrete Structured Measurement Operators). This novel method is easy-to-implement, computationally efficient, and extendable due to its compatibility with automatic differentiation. We empirically demonstrate the performance and flexibility of GLODISMO in several prototypical signal recovery applications, verifying that the learned measurement matrices outperform conventional designs based on randomization as well as discrete optimization baselines.
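The Gumbel reparametrization at the core of GLODISMO can be illustrated in isolation. The sketch below (pure Python with illustrative logits; not the GLODISMO pipeline itself) shows the Gumbel-max trick drawing exact categorical samples and its tempered softmax relaxation, which is what makes the sampling step differentiable:

```python
import math
import random

random.seed(0)

def gumbel():
    """Sample standard Gumbel noise via the inverse CDF: -log(-log(U))."""
    u = random.uniform(1e-12, 1.0 - 1e-12)
    return -math.log(-math.log(u))

def softmax(logits, tau=1.0):
    """Numerically stable softmax with temperature tau."""
    m = max(logits)
    exps = [math.exp((l - m) / tau) for l in logits]
    s = sum(exps)
    return [e / s for e in exps]

def gumbel_max_sample(logits):
    """Exact categorical sample: argmax of (logits + Gumbel noise)."""
    noisy = [l + gumbel() for l in logits]
    return max(range(len(noisy)), key=lambda i: noisy[i])

def gumbel_softmax_sample(logits, tau=0.5):
    """Differentiable relaxation: tempered softmax of (logits + Gumbel noise)."""
    noisy = [l + gumbel() for l in logits]
    return softmax(noisy, tau)

logits = [2.0, 1.0, 0.1]
n = 20000
counts = [0, 0, 0]
for _ in range(n):
    counts[gumbel_max_sample(logits)] += 1
freqs = [c / n for c in counts]
probs = softmax(logits)
# Empirical frequencies of the Gumbel-max trick match the softmax distribution.
assert all(abs(f - p) < 0.02 for f, p in zip(freqs, probs))

soft = gumbel_softmax_sample(logits)
assert abs(sum(soft) - 1.0) < 1e-9
print("softmax probs:", [round(p, 3) for p in probs])
print("Gumbel-max freqs:", freqs)
```

As the temperature tau decreases, the relaxed samples concentrate on one-hot vectors, which is why the relaxation can stand in for a discrete measurement-mask choice during gradient-based training.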
[11] Generic Error Bounds for the Generalized Lasso with Sub-Exponential Data
M. Genzel, Ch. Kipp • Sampl. Theory Signal Process. Data Anal., vol. 20, no. 2, Art. 15, 2022
Abstract: This work performs a non-asymptotic analysis of the generalized Lasso under the assumption of sub-exponential data. Our main results continue recent research on the benchmark case of (sub-)Gaussian sample distributions and thereby explore what conclusions are still valid when going beyond. While many statistical features remain unaffected (e.g., consistency and error decay rates), the key difference becomes manifested in how the complexity of the hypothesis set is measured. It turns out that the estimation error can be controlled by means of two complexity parameters that arise naturally from a generic-chaining-based proof strategy. The output model can be non-realizable, while the only requirement for the input vector is a generic concentration inequality of Bernstein-type, which can be implemented for a variety of sub-exponential distributions. This abstract approach allows us to reproduce, unify, and extend previously known guarantees for the generalized Lasso. In particular, we present applications to semi-parametric output models and phase retrieval via the lifted Lasso. Moreover, our findings are discussed in the context of sparse recovery and high-dimensional estimation problems.
[10] M. Genzel, M. März, R. Seidel • Inf. Inference, vol. 11, no. 1, pp. 203–250, 2022
Abstract: This paper investigates total variation minimization in one spatial dimension for the recovery of gradient-sparse signals from undersampled Gaussian measurements. Recently established bounds for the required sampling rate state that uniform recovery of all s-gradient-sparse signals in ℝ^n is only possible with m ≳ √(sn)·PolyLog(n) measurements. Such a condition is especially prohibitive for high-dimensional problems, where s is much smaller than n. However, previous empirical findings seem to indicate that the latter sampling rate does not reflect the typical behavior of total variation minimization. Indeed, this work provides a rigorous analysis that breaks the √(sn)-bottleneck for a large class of "natural" signals. The main result shows that non-uniform recovery succeeds with high probability for m ≳ s·PolyLog(n) measurements if the jump discontinuities of the signal vector are sufficiently well separated. In particular, this guarantee allows for signals arising from a discretization of piecewise constant functions defined on an interval. The key ingredient of the proof is a novel upper bound for the associated conic Gaussian mean width, which is based on a signal-dependent, non-dyadic Haar wavelet transform. Furthermore, a natural extension to stable and robust recovery is addressed.
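The change of variables behind this kind of analysis is easy to demonstrate: writing x = Lz, with L the cumulative-sum operator and z the vector of jump heights, maps gradient-sparse signals to sparse vectors, so TV minimization becomes an ℓ1 problem. A minimal sketch (pure Python, illustrative sizes, and plain ISTA on a Lasso surrogate instead of an exact TV solver):

```python
import math
import random

random.seed(1)

n, m = 40, 20   # signal length, number of Gaussian measurements
# Gradient-sparse ground truth: piecewise constant with two well-separated jumps.
x_true = [0.0] * 10 + [1.0] * 15 + [-0.5] * 15

# Substituting x = Lz (L = cumulative sum, z = jump heights) turns TV
# minimization into l1 minimization; we solve the Lasso surrogate
# min_z 0.5*||Bz - y||^2 + lam*||z||_1 with B = A L via plain ISTA.
# (For simplicity the offset z_0 is penalized too; it is zero here anyway.)
A = [[random.gauss(0, 1) / math.sqrt(m) for _ in range(n)] for _ in range(m)]
B = [[sum(row[j:]) for j in range(n)] for row in A]          # B = A L
y = [sum(a_k * x_k for a_k, x_k in zip(row, x_true)) for row in A]

def matvec(M, v):
    return [sum(mj * vj for mj, vj in zip(row, v)) for row in M]

def rmatvec(M, v):
    return [sum(M[i][j] * v[i] for i in range(len(v))) for j in range(len(M[0]))]

# Power iteration estimates the Lipschitz constant of the smooth part.
u = [random.gauss(0, 1) for _ in range(n)]
for _ in range(50):
    w = rmatvec(B, matvec(B, u))
    lip = math.sqrt(sum(c * c for c in w))
    u = [c / lip for c in w]
step, lam = 1.0 / (1.1 * lip), 0.01

def objective(z):
    r = [b_i - y_i for b_i, y_i in zip(matvec(B, z), y)]
    return 0.5 * sum(c * c for c in r) + lam * sum(abs(c) for c in z)

z = [0.0] * n
obj0 = prev = objective(z)
for _ in range(1000):
    grad = rmatvec(B, [b_i - y_i for b_i, y_i in zip(matvec(B, z), y)])
    # Gradient step followed by soft-thresholding (the prox of the l1 term).
    z = [math.copysign(max(abs(u_j) - step * lam, 0.0), u_j)
         for u_j in (z_j - step * g_j for z_j, g_j in zip(z, grad))]
    cur = objective(z)
    assert cur <= prev + 1e-9   # ISTA is monotone for step <= 1/Lipschitz
    prev = cur

# Map the recovered jump vector back to the signal domain: x_hat = L z.
x_hat, acc = [], 0.0
for z_j in z:
    acc += z_j
    x_hat.append(acc)
err = math.sqrt(sum((a - b) ** 2 for a, b in zip(x_hat, x_true)))
err /= math.sqrt(sum(b * b for b in x_true))
print("relative reconstruction error:", round(err, 3))
```

Here m = 20 measurements for s = 2 jumps in dimension n = 40 is the regime where the m ≳ s·PolyLog(n) rate is informative and the worst-case √(sn) rate would demand far more samples.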
[9] L1-Analysis Minimization and Generalized (Co-)Sparsity: When Does Recovery Succeed?
M. Genzel, G. Kutyniok, M. März • Appl. Comput. Harmon. Anal., vol. 52, pp. 82–140, 2021
Abstract: This paper investigates the problem of stable signal estimation from undersampled, noisy sub-Gaussian measurements under the assumption of a cosparse model. Based on generalized notions of sparsity, a novel recovery guarantee for the L1-analysis basis pursuit is derived, enabling accurate predictions of its sample complexity. The bounds on the number of required measurements explicitly depend on the Gram matrix of the analysis operator and therefore account for its mutual coherence structure. The presented results defy conventional wisdom which promotes the sparsity of analysis coefficients as the crucial quantity to be studied. In fact, this paradigm breaks down in many situations of interest, for instance, when applying a redundant (multilevel) frame as analysis prior. In contrast, the proposed sampling-rate bounds reliably capture the recovery capability of various practical examples. The proofs are based on establishing a sophisticated upper bound on the conic Gaussian mean width associated with the underlying L1-analysis polytope.
[8] EDDIDAT: a graphical user interface for the analysis of energy-dispersive diffraction data
D. Apel, M. Genzel, M. Meixner, M. Boin, M. Klaus, Ch. Genzel • J. Appl. Cryst., vol. 53, no. 4, pp. 1130–1137, 2020
Abstract: EDDIDAT is a MATLAB-based graphical user interface for the convenient and versatile analysis of energy-dispersive diffraction data obtained at laboratory and synchrotron sources. The main focus of EDDIDAT up to now has been on the analysis of residual stresses, but it can also be used to prepare measurement data for subsequent phase analysis or analysis of preferred orientation. The program provides access to the depth-resolved analysis of residual stresses at different levels of approximation. Furthermore, the graphic representation of the results also serves for the consideration of microstructural and texture-related properties. The included material database allows for the quick analysis of the most common materials and is easily extendable. The plots and results produced with EDDIDAT can be exported to graphics and text files. EDDIDAT is designed to analyze diffraction data from various energy-dispersive X-ray sources. Hence it is possible to add new sources and implement the device-specific properties into EDDIDAT. The program is freely available to academic users.
[7] Robust 1-Bit Compressed Sensing via Hinge Loss Minimization
M. Genzel, A. Stollenwerk • Inf. Inference, vol. 9, no. 2, pp. 361–422, 2020
Abstract: This work theoretically studies the problem of estimating a structured high-dimensional signal x₀ ∈ ℝ^n from noisy 1-bit Gaussian measurements. Our recovery approach is based on a simple convex program which uses the hinge loss function as data fidelity term. While such a risk minimization strategy is very natural to learn binary output models, such as in classification, its capacity to estimate a specific signal vector is largely unexplored. A major difficulty is that the hinge loss is just piecewise linear, so that its 'curvature energy' is concentrated in a single point. This is substantially different from other popular loss functions considered in signal estimation, e.g. the square or logistic loss, which are at least locally strongly convex. It is therefore somewhat unexpected that we can still prove very similar types of recovery guarantees for the hinge loss estimator, even in the presence of strong noise. More specifically, our non-asymptotic error bounds show that stable and robust reconstruction of x₀ can be achieved with the optimal oversampling rate O(m^{-1/2}) in terms of the number of measurements m. Moreover, we permit a wide class of structural assumptions on the ground truth signal, in the sense that x₀ can belong to an arbitrary bounded convex set K ⊂ ℝ^n. The proofs of our main results rely on some recent advances in statistical learning theory due to Mendelson. In particular, we invoke an adapted version of Mendelson's small ball method that allows us to establish a quadratic lower bound on the error of the first-order Taylor approximation of the empirical hinge loss function.
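A minimal sketch of the estimator analyzed here (pure Python, illustrative dimensions, and projected subgradient descent standing in for a generic convex solver): minimize the empirical hinge loss over the unit ball K and compare the result with the ground-truth direction, which is all that 1-bit measurements can identify:

```python
import math
import random

random.seed(2)

n, m = 10, 200
x0 = [random.gauss(0, 1) for _ in range(n)]
nrm = math.sqrt(sum(c * c for c in x0))
x0 = [c / nrm for c in x0]                        # unit-norm ground truth

A = [[random.gauss(0, 1) for _ in range(n)] for _ in range(m)]
y = [1 if sum(a_j * x_j for a_j, x_j in zip(a, x0)) >= 0 else -1
     for a in A]                                  # 1-bit Gaussian observations

def project_ball(x, radius=1.0):
    """Euclidean projection onto the ball of given radius (the set K here)."""
    nrm = math.sqrt(sum(c * c for c in x))
    return x if nrm <= radius else [c * radius / nrm for c in x]

x = [0.0] * n
for t in range(1, 501):
    # Subgradient of the empirical hinge risk (1/m) sum_i max(0, 1 - y_i <a_i, x>).
    g = [0.0] * n
    for a, y_i in zip(A, y):
        if y_i * sum(a_j * x_j for a_j, x_j in zip(a, x)) < 1.0:
            for j in range(n):
                g[j] -= y_i * a[j] / m
    step = 0.5 / math.sqrt(t)                     # diminishing step size
    x = project_ball([x_j - step * g_j for x_j, g_j in zip(x, g)])

cos = (sum(a * b for a, b in zip(x, x0))
       / max(math.sqrt(sum(c * c for c in x)), 1e-12))
print("cosine similarity with ground truth:", round(cos, 3))
assert cos > 0.5
```

Since the observations only record signs, the magnitude of x₀ is unidentifiable; cosine similarity is therefore the natural error measure in this toy run.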
[6] Recovering Structured Data From Superimposed Non-Linear Measurements
M. Genzel, P. Jung • IEEE Trans. Inf. Theory, vol. 66, no. 1, pp. 453–477, 2020
Abstract: This work deals with the problem of distributed data acquisition under non-linear communication constraints. More specifically, we consider a model setup where M distributed nodes take individual measurements of an unknown structured source vector x₀ ∈ ℝ^n, communicating their readings simultaneously to a central receiver. Since this procedure involves collisions and is usually imperfect, the receiver measures a superposition of non-linearly distorted signals. In a first step, we will show that an s-sparse vector x can be successfully recovered from O(s · log(2n/s)) of such superimposed measurements, using a traditional Lasso estimator that does not rely on any knowledge about the non-linear corruptions. This direct method however fails to work for several "uncalibrated" system configurations. These blind reconstruction tasks can be easily handled with the ℓ1,2-Group-Lasso, but coming along with an increased sampling rate of O(s · max{M, log(2n/s)}) observations - in fact, the purpose of this lifting strategy is to extend a certain class of bilinear inverse problems to non-linear acquisition. Our two algorithmic approaches are a special instance of a more abstract framework which includes sub-Gaussian measurement designs as well as general (convex) structural constraints. These results are of independent interest for various recovery and learning tasks, as they apply to arbitrary non-linear observation models. Finally, to illustrate the practical scope of our theoretical findings, an application to wireless sensor networks is discussed, which actually serves as the prototypical example of our methodology.
[5] M. Genzel • IEEE Trans. Inf. Theory, vol. 63, no. 3, pp. 1601–1619, 2017
Abstract: We study the issue of estimating a structured signal x₀ ∈ ℝ^n from non-linear and noisy Gaussian observations. Supposing that x₀ is contained in a certain convex subset K ⊂ ℝ^n, we prove that accurate recovery is already feasible if the number of observations exceeds the effective dimension of K. It will turn out that the possibly unknown non-linearity of our model affects the error rate only by a multiplicative constant. This achievement is based on recent works by Plan and Vershynin, who have suggested to treat the non-linearity rather as noise, which perturbs a linear measurement process. Using the concept of restricted strong convexity, we show that their results for the generalized Lasso can be extended to a fairly large class of convex loss functions. Moreover, we shall allow for the presence of adversarial noise so that even deterministic model inaccuracies can be coped with. These generalizations particularly give further evidence of why many standard estimators perform surprisingly well in practice, although they do not rely on any knowledge of the underlying output rule. To this end, our results provide a unified framework for signal reconstruction in high dimensions, covering various challenges from the fields of compressed sensing, signal processing, and statistical learning.
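The guiding intuition here, that an unknown non-linearity perturbs the linear model only by a multiplicative constant, can be checked numerically. The sketch below (pure Python, illustrative parameters; a single least-squares gradient step at zero rather than the constrained estimator from the paper) recovers the signal direction from y_i = f(<a_i, x₀>) + noise without any knowledge of f:

```python
import math
import random

random.seed(3)

n, m = 20, 2000
x0 = [random.gauss(0, 1) for _ in range(n)]
nrm = math.sqrt(sum(c * c for c in x0))
x0 = [c / nrm for c in x0]                        # unit-norm ground truth

f = lambda t: math.tanh(3.0 * t)                  # "unknown" non-linearity

A = [[random.gauss(0, 1) for _ in range(n)] for _ in range(m)]
y = [f(sum(a_j * x_j for a_j, x_j in zip(a, x0))) + 0.1 * random.gauss(0, 1)
     for a in A]

# First gradient step of the least-squares objective at zero ("backprojection"):
# for Gaussian a one has E[a * f(<a, x0>)] = mu * x0, so the unknown f only
# enters through the scalar mu -- the direction of x0 is still recovered.
x_hat = [sum(A[i][j] * y[i] for i in range(m)) / m for j in range(n)]

dot = sum(a * b for a, b in zip(x_hat, x0))
cos = dot / math.sqrt(sum(c * c for c in x_hat))
print("cosine similarity with ground truth direction:", round(cos, 3))
assert cos > 0.8
```

Swapping tanh for a quantizer such as sign(t) changes only the effective scalar mu, which is exactly the "non-linearity as noise" viewpoint of Plan and Vershynin that the paper generalizes.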
[4] T. Conrad, M. Genzel, N. Cvetkovic, N. Wulkow, A. Leichtle, J. Vybiral, G. Kutyniok, Ch. Schütte • BMC Bioinform., vol. 18, Art. 160, 2017
Abstract: Background: High-throughput proteomics techniques, such as mass spectrometry (MS)-based approaches, produce very high-dimensional data-sets. In a clinical setting one is often interested in how mass spectra differ between patients of different classes, for example spectra from healthy patients vs. spectra from patients having a particular disease. Machine learning algorithms are needed to (a) identify these discriminating features and (b) classify unknown spectra based on this feature set. Since the acquired data is usually noisy, the algorithms should be robust against noise and outliers, while the identified feature set should be as small as possible. Results: We present a new algorithm, Sparse Proteomics Analysis (SPA), based on the theory of compressed sensing that allows us to identify a minimal discriminating set of features from mass spectrometry data-sets. We show (1) how our method performs on artificial and real-world data-sets, (2) that its performance is competitive with standard (and widely used) algorithms for analyzing proteomics data, and (3) that it is robust against random and systematic noise. We further demonstrate the applicability of our algorithm to two previously published clinical data-sets.
[3] M. Meixner, T. Fuss, M. Klaus, M. Genzel, Ch. Genzel • J. Appl. Cryst., vol. 48, no. 5, pp. 1451–1461, 2015
Abstract: The modified stress scanning method [Meixner, Fuss, Klaus & Genzel (2015). J. Appl. Cryst. 48, 1451–1461] is experimentally implemented for the analysis of near-surface residual stress depth distributions that are strongly inhomogeneous. The suggested procedure is validated by analyzing the very steep in-plane residual stress depth profile of a shot-peened Al2O3 ceramic specimen and comparing the results with those that were obtained by well established X-ray diffraction-based gradient methods. In addition, the evaluation formalism is adapted to the depth-dependent determination of the residual stresses inside of multilayer thin-film systems. The applicability for this purpose is demonstrated by investigating the residual stress depth distribution within the individual sublayers of a multilayered coating that consists of alternating Al2O3 and TiCN thin films. In this connection, the specific diffraction geometry that was used for the implementation of the stress scanning method at the energy-dispersive materials science beamline EDDI@BESSYII is presented, and experimental issues as well as limitations of the method are discussed.
[2] Asymptotic Analysis of Inpainting via Universal Shearlet Systems
M. Genzel, G. Kutyniok • SIAM J. Imaging Sci., vol. 7, no. 4, pp. 2301–2339, 2014
Abstract: Recently introduced inpainting algorithms using a combination of applied harmonic analysis and compressed sensing have turned out to be very successful. One key ingredient is a carefully chosen representation system which provides (optimally) sparse approximations of the original image. Due to the common assumption that images are typically governed by anisotropic features, directional representation systems have often been utilized. One prominent example of this class are shearlets, which have the additional benefit of allowing faithful implementations. Numerical results show that shearlets significantly outperform wavelets in inpainting tasks. One of those software packages, ShearLab, even offers the flexibility of using a different parameter for each scale, which is not yet covered by shearlet theory. In this paper, we first introduce universal shearlet systems which are associated with an arbitrary scaling sequence, thereby modeling the previously mentioned flexibility. In addition, this novel construction allows for a smooth transition between wavelets and shearlets and therefore enables us to analyze them in a uniform fashion. For a large class of such scaling sequences, we first prove that the associated universal shearlet systems form band-limited Parseval frames for L²(ℝ²) consisting of Schwartz functions. Second, we analyze the inpainting performance of this class of universal shearlet systems within a distributional model situation using an L1-analysis minimization algorithm for reconstruction. Our main result states that, provided that the scaling sequence is comparable to the size of the (scale-dependent) gap, asymptotically perfect inpainting is achieved.
[1] D. Apel, M. Klaus, M. Genzel, Ch. Genzel • J. Appl. Cryst., vol. 47, no. 2, pp. 511–526, 2014
Abstract: A method for the evaluation of strongly inhomogeneous residual stress fields in the near-surface region of polycrystalline materials is introduced, which exploits the full information content contained in energy-dispersive (ED) diffraction patterns. The macro-stress-induced diffraction line shifts ΔE_ψ^hkl observed in ED sin²ψ measurements are described by modeling the residual stress state σ_ij(z) in real space, based on Rietveld's data analysis concept. Therefore, the proposed approach differs substantially from currently used methods for residual stress gradient analysis such as the 'universal plot' method, which enable access to the Laplace stress profiles σ_ij(z). With the example of shot-peened samples made of either 100Cr6 steel or Al2O3, it is demonstrated that the simultaneous refinement of all diffraction patterns obtained in a sin²ψ measurement with hundreds of diffraction lines provides very stable solutions for the residual stress depth profiles. Furthermore, it is shown that the proposed evaluation concept even allows for consideration of the residual stress component σ_33(z) in the thickness direction, which is difficult to detect by conventional sin²ψ analysis.
Technical Reports
[3] M. Genzel, J. Macdonald, M. März • arXiv:2106.00280, 2021
Abstract: This report is dedicated to a short motivation and description of our contribution to the AAPM DL-Sparse-View CT Challenge (team name: "robust-and-stable"). The task is to recover breast model phantom images from limited-view fanbeam measurements using data-driven reconstruction techniques. The challenge is distinctive in the sense that participants are provided with a collection of ground truth images and their noiseless, subsampled sinograms (as well as the associated limited-view filtered backprojection images), but not with the actual forward model. Therefore, our approach first estimates the fanbeam geometry in a data-driven geometric calibration step. In a subsequent two-step procedure, we design an iterative end-to-end network that enables the computation of near-exact solutions.
[2] The Mismatch Principle: The Generalized Lasso Under Large Model Uncertainties
M. Genzel, G. Kutyniok • arXiv:1808.06329, 2018
Abstract: We study the estimation capacity of the generalized Lasso, i.e., least squares minimization combined with a (convex) structural constraint. While Lasso-type estimators were originally designed for noisy linear regression problems, it has recently turned out that they are in fact robust against various types of model uncertainties and misspecifications, most notably, non-linearly distorted observation models. This work provides more theoretical evidence for this somewhat astonishing phenomenon. At the heart of our analysis stands the mismatch principle, which is a simple recipe to establish theoretical error bounds for the generalized Lasso. The associated estimation guarantees are of independent interest and are formulated in a fairly general setup, permitting arbitrary sub-Gaussian data, possibly with strongly correlated feature designs; in particular, we do not assume a specific observation model which connects the input and output variables. Although the mismatch principle is conceived based on ideas from statistical learning theory, its actual application area are (high-dimensional) estimation tasks for semi-parametric models. In this context, the benefits of the mismatch principle are demonstrated for a variety of popular problem classes, such as single-index models, generalized linear models, and variable selection. Apart from that, our findings are also relevant to recent advances in quantized and distributed compressed sensing.
[1] A Mathematical Framework for Feature Selection from Real-World Data with Non-Linear Observations
M. Genzel, G. Kutyniok • arXiv:1608.08852, 2016
Abstract: In this paper, we study the challenge of feature selection based on a relatively small collection of sample pairs {(x_i, y_i)}_{1≤i≤m}. The observations y_i ∈ ℝ are thereby supposed to follow a noisy single-index model, depending on a certain set of signal variables. A major difficulty is that these variables usually cannot be observed directly, but rather arise as hidden factors in the actual data vectors x_i ∈ ℝ^d (feature variables). We will prove that a successful variable selection is still possible in this setup, even when the applied estimator does not have any knowledge of the underlying model parameters and only takes the 'raw' samples {(x_i, y_i)}_{1≤i≤m} as input. The model assumptions of our results will be fairly general, allowing for non-linear observations, arbitrary convex signal structures as well as strictly convex loss functions. This is particularly appealing for practical purposes, since in many applications, already standard methods, e.g., the Lasso or logistic regression, yield surprisingly good outcomes. Apart from a general discussion of the practical scope of our theoretical findings, we will also derive a rigorous guarantee for a specific real-world problem, namely sparse feature extraction from (proteomics-based) mass spectrometry data.
Conference & Workshop Articles
[12] Self-Distilled Representation Learning for Time Series
F. Pieper, K. Ditschuneit, M. Genzel, A. Lindt, J. Otterbach • NeurIPS 2023 Workshop on Self-Supervised Learning, 2023
Abstract: Self-supervised learning for time-series data holds potential similar to that recently unleashed in Natural Language Processing and Computer Vision. While most existing works in this area focus on contrastive learning, we propose a conceptually simple yet powerful non-contrastive approach, based on the data2vec self-distillation framework. The core of our method is a student-teacher scheme that predicts the latent representation of an input time series from masked views of the same time series. This strategy avoids strong modality-specific assumptions and biases typically introduced by the design of contrastive sample pairs. We demonstrate the competitiveness of our approach for classification and forecasting as downstream tasks, comparing with state-of-the-art self-supervised learning methods on the UCR and UEA archives as well as the ETT and Electricity datasets.
[11] Curve Your Enthusiasm: Concurvity Regularization in Differentiable Generalized Additive Models
J. Siems, K. Ditschuneit, W. Ripken, A. Lindborg, M. Schambach, J. Otterbach, M. Genzel • NeurIPS 2023, 2023
Abstract: Generalized Additive Models (GAMs) have recently experienced a resurgence in popularity due to their interpretability, which arises from expressing the target value as a sum of non-linear transformations of the features. Despite the current enthusiasm for GAMs, their susceptibility to concurvity - i.e., (possibly non-linear) dependencies between the features - has hitherto been largely overlooked. Here, we demonstrate how concurvity can severely impair the interpretability of GAMs and propose a remedy: a conceptually simple, yet effective regularizer which penalizes pairwise correlations of the non-linearly transformed feature variables. This procedure is applicable to any differentiable additive model, such as Neural Additive Models or NeuralProphet, and enhances interpretability by eliminating ambiguities due to self-canceling feature contributions. We validate the effectiveness of our regularizer in experiments on synthetic as well as real-world datasets for time-series and tabular data. Our experiments show that concurvity in GAMs can be reduced without significantly compromising prediction quality, improving interpretability and reducing variance in the feature importances.
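The proposed penalty is simple enough to sketch directly. Below, a pure-Python toy (illustrative data and function names, not the paper's training pipeline) computes the mean absolute pairwise correlation of per-feature contributions and shows that it sharply distinguishes strongly dependent contributions, which could self-cancel, from independent ones:

```python
import math
import random

random.seed(4)

def pearson(u, v):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    su = math.sqrt(sum((a - mu) ** 2 for a in u))
    sv = math.sqrt(sum((b - mv) ** 2 for b in v))
    return cov / (su * sv)

def concurvity_penalty(contributions):
    """Mean absolute pairwise correlation of per-feature contributions f_j(x_j)."""
    k = len(contributions)
    pairs = [(i, j) for i in range(k) for j in range(i + 1, k)]
    return sum(abs(pearson(contributions[i], contributions[j]))
               for i, j in pairs) / len(pairs)

N = 500
x = [random.gauss(0, 1) for _ in range(N)]
z = [random.gauss(0, 1) for _ in range(N)]

# Concurve setting: the second contribution is almost a rescaling of the first,
# so the two shape functions could trade off against each other ambiguously.
dependent = [x, [2.0 * xi + 0.1 * random.gauss(0, 1) for xi in x]]
independent = [x, z]

p_dep, p_ind = concurvity_penalty(dependent), concurvity_penalty(independent)
print("penalty (dependent): ", round(p_dep, 3))
print("penalty (independent):", round(p_ind, 3))
assert p_dep > 0.9 > p_ind
```

In an actual differentiable GAM this quantity would be computed on minibatch outputs of the shape functions and added to the training loss with a weighting coefficient; here it is evaluated once on fixed data for illustration.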
[10] | Curve your Enthusiasm: Concurvity Regularization in Differentiable Generalized Additive Models J. Siems, K. Ditschuneit, W. Ripken, A. Lindborg, M. Schambach, J. Otterbach, M. Genzel • ICML 2023 Workshop on Interpretable Machine Learning in Healthcare, 2023 • Abstract • BibTeX • Links:@inproceedings{siems23concurvity_icml, Generalized Additive Models (GAMs) have recently experienced a resurgence in popularity, particularly in high-stakes domains such as healthcare. GAMs are favored due to their interpretability, which arises from expressing the target value as a sum of non-linear functions of the predictors. Despite the current enthusiasm for GAMs, their susceptibility to concurvity - i.e., (possibly non-linear) dependencies between the predictors - has hitherto been largely overlooked. Here, we demonstrate how concurvity can severely impair the interpretability of GAMs and propose a remedy: a conceptually simple, yet effective regularizer which penalizes pairwise correlations of the non-linearly transformed feature variables. This procedure is applicable to any gradient-based fitting of differentiable additive models, such as Neural Additive Models or NeuralProphet, and enhances interpretability by eliminating ambiguities due to self-canceling feature contributions. We validate the effectiveness of our regularizer in experiments on synthetic as well as real-world datasets for time-series and tabular data. Our experiments show that concurvity in GAMs can be reduced without significantly compromising prediction quality, improving interpretability and reducing variance in the feature importances. |
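The regularizer described in the two entries above penalizes pairwise correlations of the non-linearly transformed feature variables. A minimal numpy sketch of such a penalty term (the function name and the choice of the mean absolute Pearson correlation over all pairs are illustrative assumptions, not the papers' exact formulation):

```python
import numpy as np

def concurvity_penalty(transformed: np.ndarray) -> float:
    """Mean absolute pairwise Pearson correlation between the
    non-linearly transformed feature columns of an additive model.

    transformed: array of shape (n_samples, n_features), where column j
    holds the shape-function output f_j(x_j).
    """
    # Correlation matrix of the shape-function outputs.
    corr = np.corrcoef(transformed, rowvar=False)
    d = corr.shape[0]
    # Average |correlation| over the strict upper triangle (i < j).
    iu = np.triu_indices(d, k=1)
    return float(np.mean(np.abs(corr[iu])))
```

In training, a term like this would be added to the prediction loss with a tunable weight; a practical implementation would compute it differentiably (e.g., on framework tensors rather than numpy arrays).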
[9] | Near-Exact Recovery for Tomographic Inverse Problems via Deep Learning M. Genzel, I. Gühring, J. Macdonald, M. März • ICML 2022 (long talk: 118/5630 = 2%), 2022 • Abstract • BibTeX • Links:@inproceedings{ggmm22, This work is concerned with the following fundamental question in scientific machine learning: Can deep-learning-based methods solve noise-free inverse problems to near-perfect accuracy? Positive evidence is provided for the first time, focusing on a prototypical computed tomography (CT) setup. We demonstrate that an iterative end-to-end network scheme enables reconstructions close to numerical precision, comparable to classical compressed sensing strategies. Our results build on our winning submission to the recent AAPM DL-Sparse-View CT Challenge. Its goal was to identify the state of the art in solving the sparse-view CT inverse problem with data-driven techniques. A specific difficulty of the challenge setup was that the precise forward model remained unknown to the participants. Therefore, a key feature of our approach was to initially estimate the unknown fanbeam geometry in a data-driven calibration step. Apart from an in-depth analysis of our methodology, we also demonstrate its state-of-the-art performance on the open-access real-world dataset LoDoPaB CT. |
[8] | Near-Exact Recovery for Sparse-View CT via Data-Driven Methods M. Genzel, I. Gühring, J. Macdonald, M. März • NeurIPS 2021 Workshop on Deep Learning and Inverse Problems, 2021 • Abstract • BibTeX • Links:@inproceedings{ggmm21, This work presents an empirical study on the design and training of iterative neural networks for image reconstruction from tomographic measurements with unknown geometry. It is based on insights gained during our participation in the recent AAPM DL-Sparse-View CT challenge and a further analysis of our winning submission (team name: robust-and-stable) subsequent to the competition period. The goal of the challenge was to identify the state of the art in sparse-view CT with data-driven techniques, thereby addressing a fundamental research question: Can neural-network-based solvers produce near-perfect reconstructions for noise-free data? We answer this in the affirmative by demonstrating that an iterative end-to-end scheme enables the computation of near-perfect solutions on the test set. Remarkably, the fanbeam geometry of the used forward model is completely inferred through a data-driven geometric calibration step. |
[7] | Learning Structured Sparse Matrices for Signal Recovery via Unrolled Optimization J. Sauder, M. Genzel, P. Jung • NeurIPS 2021 Workshop on Deep Learning and Inverse Problems, 2021 • Abstract • BibTeX • Links:@inproceedings{sgj21, Countless signal processing applications involve the reconstruction of an unknown signal from very few indirect linear measurements. Because the measurement operator is commonly constrained by the hardware or the physics of the observation process, finding measurement matrices that enable accurate signal recovery poses a challenging discrete optimization task. Meanwhile, recent advances in the field of machine learning have highlighted the effectiveness of gradient-based optimization methods applied to large computational graphs such as those arising naturally when unrolling iterative algorithms for signal recovery. However, it has remained unclear how to leverage this technique when the set of admissible measurement matrices is both discrete and sparse. In this paper, we tackle this problem and propose an efficient and flexible method for learning structured sparse measurement matrices. Our approach uses unrolled optimization in conjunction with Gumbel reparametrizations. We empirically demonstrate the effectiveness of our method in two prototypical compressed sensing situations. |
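The Gumbel reparametrization mentioned in the abstract above is, in its standard form, the Gumbel-softmax trick for relaxing discrete sampling into a differentiable operation. A minimal numpy sketch of that generic building block, not of the paper's specific construction of structured sparse measurement matrices:

```python
import numpy as np

def gumbel_softmax(logits: np.ndarray, tau: float = 1.0, rng=None) -> np.ndarray:
    """Differentiable relaxation of drawing a one-hot categorical sample.

    Adds i.i.d. Gumbel(0, 1) noise to the logits and applies a
    temperature-scaled softmax; as tau -> 0 the output approaches one-hot.
    """
    rng = np.random.default_rng() if rng is None else rng
    # Gumbel(0, 1) noise via inverse transform sampling.
    gumbel = -np.log(-np.log(rng.uniform(size=logits.shape)))
    y = (logits + gumbel) / tau
    y = y - y.max()  # subtract the max for numerical stability
    expy = np.exp(y)
    return expy / expy.sum()
```

In an unrolled-optimization setting, relaxed samples like these can stand in for discrete choices (e.g., which entries of a matrix are nonzero) so that gradients flow through the selection; the temperature tau is typically annealed during training.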
[6] | M. Genzel, M. März, R. Seidel • Proceedings of the International Traveling Workshop on Interactions Between Low-Complexity Data Models and Sensing Techniques (iTWIST), arXiv.org, 2020 • Abstract • BibTeX • Links:@inproceedings{gms20b, This paper investigates total variation minimization in one spatial dimension for the recovery of gradient-sparse signals from undersampled Gaussian measurements. Recently established bounds for the required sampling rate state that uniform recovery of all s-gradient-sparse signals in ℝn is only possible with m≳sqrt(sn)⋅PolyLog(n) measurements. Such a condition is especially prohibitive for high-dimensional problems, where s is much smaller than n. However, previous empirical findings seem to indicate that the latter sampling rate does not reflect the typical behavior of total variation minimization. Indeed, this work provides a rigorous analysis that breaks the sqrt(sn)-bottleneck for a large class of natural signals. The main result shows that non-uniform recovery succeeds with high probability for m≳s⋅PolyLog(n) measurements if the jump discontinuities of the signal vector are sufficiently well separated. In particular, this guarantee allows for signals arising from a discretization of piecewise constant functions defined on an interval. The present paper serves as a short summary of the main results in our recent work [arXiv:2001.09952]. |
[5] | Robust 1-Bit Compressed Sensing via Hinge Loss Minimization M. Genzel, A. Stollenwerk • Proceedings of the 13th International Conference on Sampling Theory and Applications (SampTA), IEEE, 2019 • Abstract • BibTeX • Links:@inproceedings{gs19, We study the problem of estimating a structured high-dimensional signal x0 ∈ ℝn from noisy 1-bit Gaussian measurements. Our recovery approach is based on a simple convex program which uses the hinge loss function as data fidelity term. While such a risk minimization strategy is typically applied in classification tasks, its capacity to estimate a specific signal vector is largely unexplored. In contrast to other popular loss functions considered in signal estimation, which are at least locally strongly convex, the hinge loss is just piecewise linear, so that its "curvature energy" is concentrated in a single point. It is therefore somewhat unexpected that we can still prove very similar types of recovery guarantees for the hinge loss estimator, even in the presence of strong noise. More specifically, our error bounds show that stable and robust reconstruction of x0 can be achieved with the optimal approximation rate O(m^(-1/2)) in terms of the number of measurements m. Moreover, we permit a wide class of structural assumptions on the ground truth signal, in the sense that x0 can belong to an arbitrary bounded convex set K ⊂ ℝn. For the proofs of our main results we invoke an adapted version of Mendelson's small ball method that allows us to establish a quadratic lower bound on the error of the first order Taylor approximation of the empirical hinge loss function. |
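The convex program in the entry above minimizes the empirical hinge loss over a bounded convex set K. A toy sketch, with K taken to be the Euclidean unit ball and a plain projected subgradient loop standing in for a generic convex solver (both are simplifying assumptions, not the paper's exact setup):

```python
import numpy as np

def hinge_loss_recovery(A: np.ndarray, y: np.ndarray,
                        steps: int = 300, lr: float = 0.05) -> np.ndarray:
    """Projected subgradient descent for
        min_x (1/m) * sum_i max(0, 1 - y_i * <a_i, x>)   s.t. ||x||_2 <= 1,
    where y_i in {-1, +1} are 1-bit measurements and a_i are the rows of A.
    """
    m, n = A.shape
    x = np.zeros(n)
    for _ in range(steps):
        margins = y * (A @ x)
        active = margins < 1.0  # samples currently violating the margin
        # Subgradient of the empirical hinge loss at x.
        grad = -(A[active] * y[active, None]).sum(axis=0) / m
        x = x - lr * grad
        norm = np.linalg.norm(x)
        if norm > 1.0:          # project back onto the unit ball
            x = x / norm
    return x
```

With noiseless Gaussian measurements y = sign(A x0), the returned estimate typically aligns well with the direction of x0; the paper's guarantees cover far more general convex sets K and noisy observations.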
[4] | A New Perspective on the Sample Complexity of the Analysis Basis Pursuit M. Genzel, G. Kutyniok, M. März • 5th International Workshop on Compressed Sensing applied to Radar, Multimodal Sensing, and Imaging (CoSeRa), EURASIP, 2018 • Abstract • BibTeX • Links:@inproceedings{gkm18, In this article, we summarize our findings concerning signal estimation from undersampled Gaussian measurements under the assumption of a cosparse model. Based on generalized notions of sparsity, we obtain a novel recovery guarantee for the L1-analysis basis pursuit, enabling accurate predictions of its sample complexity. The corresponding bounds on the number of required measurements explicitly depend on the Gram matrix of the analysis operator and therefore particularly account for its mutual coherence structure. Our findings defy conventional wisdom which promotes the sparsity of analysis coefficients as the crucial quantity to study. In fact, this common paradigm breaks down completely in many situations of practical interest, for instance, when applying a redundant (multilevel) frame as analysis prior. A detailed exposition of the results can be found in our original work, which contains proofs for all statements and extensive numerical experiments. |
[3] | Blind Sparse Recovery Using Imperfect Sensor Networks P. Jung, M. Genzel • Proceedings of the 2018 IEEE Statistical Signal Processing Workshop (SSP), pp. 598–602, IEEE, 2018 • Abstract • BibTeX • Links:@inproceedings{gj18, This work investigates blind aggregation of structured high-dimensional data, using a network of imperfect wireless sensor nodes which noncoherently communicate to a central fusion center or mobile data collector. In our setup, there is an unknown subset (of size k) of all M registered autonomous transceiver nodes that sporadically wake up and simultaneously transmit their sensor readings through a shared channel. In particular, this procedure does not involve a training phase that would allow for a priori channel predictions. In order to improve the resolvability in this noncoherent random access channel, the nodes perform an additional randomization of their signals. Since the transmission is usually imperfect, e.g., due to low-quality hardware and unknown channel fading coefficients, the receiver measures a superposition of non-linearly distorted signals with unknown weights. Such a recovery task can be translated into a bilinear compressed sensing problem with rank-one measurements. We present a theoretical result for the Gaussian case which shows that m = O(sk log(2nM/sk)) measurements are sufficient to guarantee recovery of an s-sparse vector in ℝn. Moreover, our error bounds explicitly reflect the impact of the underlying non-linearities. The performance of our approach is also evaluated numerically for a random network generated by a compressible fading and node activity model. |
[2] | Blind sparse recovery from superimposed non-linear sensor measurements M. Genzel, P. Jung • Proceedings of the 12th International Conference on Sampling Theory and Applications (SampTA), pp. 106–110, IEEE, 2017 • Abstract • BibTeX • Links:@inproceedings{gj17, In this work, we study the problem of sparse recovery from superimposed, non-linearly distorted measurements. This challenge is particularly relevant to wireless sensor networks that consist of autonomous and spatially distributed sensor units. Here, each of the M wireless sensors acquires m individual measurements of an s-sparse source vector x0 ∈ ℝn. All devices transmit simultaneously to a central receiver, causing collisions. Since this process is imperfect, e.g., caused by low-quality sensors and the wireless channel, the receiver measures a superposition of corrupted signals. First, we will show that the source vector can be successfully recovered from m = O(s log(2n/s)) coherently communicated measurements via the vanilla Lasso. The more general situation of non-coherent communication can be approximated by a bilinear compressed sensing problem. Even in the non-linear setting, it will turn out that m = O(s · max{M, log(2n/s)}) measurements are already sufficient for reconstruction using the (group) ℓ1,2-Lasso. In particular, as long as M = O(log(2n/s)) sensors are used, there is no substantial increase in performance when building a coherently communicating network. Finally, we shall discuss several practical implications and extensions of our approach. |
[1] | Keynote Lecture: Residual Stress Gradient Analysis by Multiple Diffraction Line Methods Ch. Genzel, D. Apel, M. Klaus, M. Genzel, D. Balzar • Kurz, S. J. B., Mittemeijer, E. J., Scholtes, B. (Ed.): International Conference on Residual Stresses 9 (ICRS 9), pp. 3–18, Trans Tech Publications Ltd., 2013 • Abstract • BibTeX • Links:@inproceedings{gen+13, The paper deals with methods for X-ray stress analysis (XSA), which allow for the evaluation of near-surface in-plane residual stress gradients σ||(τ) and σ||(z) in Laplace space and in real space, respectively. Since the ‘robustness’ of residual stress gradient analysis strongly depends on both the quality of the measured strain data and the number of experimental data points, the discussion focuses on those approaches which are based on processing various diffraction lines or even complete diffraction patterns. It is shown that these techniques, which were originally developed for angle-dispersive (AD) diffraction, can be adapted and enhanced for energy-dispersive (ED) diffraction employing high-energy synchrotron radiation. Using the example of a shot-peened ferritic steel, it is demonstrated that sin²ψ data measured in the Ψ-mode of XSA employing the ED diffraction technique can be analyzed at different levels of approximation. |
Other Publications
[3] | Solving Inverse Problems With Deep Neural Networks - Robustness Included? (joint with J. Macdonald and M. März) M. Genzel • Oberwolfach Rep., vol. 18, no. 4, pp. 3043–3046, 2021 • BibTeX • Links:@article{gmm21owr, |
[2] | Artificial Neural Networks M. Genzel, G. Kutyniok • GAMM Rundbrief, vol. 2019, no. 2, pp. 12–18, 2019 • BibTeX • Links:@article{gk19, |
[1] | The Mismatch Principle: An Ignorant Approach to Non-Linear Compressed Sensing? (joint with G. Kutyniok and P. Jung) M. Genzel • Oberwolfach Rep., vol. 15, no. 1, pp. 781–782, 2018 • BibTeX • Links:@article{gkj18, |
Monographs
[3] | The Mismatch Principle and L1-Analysis Compressed Sensing: A Unified Approach to Estimation Under Large Model Uncertainties and Structural Constraints M. Genzel • Ph.D. Thesis, Technische Universität Berlin, 2019 • BibTeX • Links:@phdthesis{gen19, |
[2] | Sparse Proteomics Analysis: Toward a Mathematical Foundation of Feature Selection and Disease Classification M. Genzel • Master's Thesis, Technische Universität Berlin, 2015 • BibTeX@mastersthesis{gen15, |
[1] | Analysis von Inpainting mittels Hybrid-Shearlets und Clustered Sparsity (Analysis of Inpainting via Hybrid Shearlets and Clustered Sparsity) M. Genzel • Bachelor's Thesis, Technische Universität Berlin, 2013 (in German) • BibTeX@mastersthesis{gen13, |