Prof. Dr. Gerhard Wellein

Professorship for High Performance Computing, Head of NHR@FAU

Department of Computer Science
Professorship for High Performance Computing

Room: 01.130-113.02
Martensstr. 3
91058 Erlangen

Gerhard Wellein is Professor for High Performance Computing at the Department of Computer Science of Friedrich-Alexander-Universität Erlangen-Nürnberg (FAU) and holds a PhD in theoretical physics from the University of Bayreuth. From 2015 to 2017 he was also a guest lecturer at the Faculty of Informatics of Università della Svizzera italiana (USI) Lugano. Since 2021 he has been the director of the Erlangen National High Performance Computing Center (NHR@FAU). He is a member of the board of directors of the German NHR Alliance, which coordinates the national Tier-2 HPC infrastructure at German universities, and has served for many years as deputy speaker of the Bavarian HPC competence network KONWIHR. As a member of the scientific steering committees of the Leibniz Supercomputing Centre (LRZ) and the Gauss Centre for Supercomputing (GCS), he organizes and oversees the application process for compute time on national HPC resources.

Gerhard Wellein has more than twenty years of experience in teaching HPC techniques to students and scientists. He has contributed to numerous tutorials on node-level performance engineering over the past decade and, together with Jan Treibig and Georg Hager, received the 2011 Informatics Europe Curriculum Best Practices Award for outstanding teaching contributions. His research interests focus on performance modelling and performance engineering, architecture-specific code optimization, novel parallelization approaches, and hardware-efficient building blocks for sparse linear algebra and stencil solvers. He has led and contributed to numerous national and international HPC research projects and has authored or co-authored more than 100 peer-reviewed publications.

List of current projects

extracted from CRIS

  • Quelloffene Lösungsansätze für Monitoring und Systemeinstellungen für energieoptimierte Rechenzentren

    (Third Party Funds Single)

    Term: 2022-09-01 - 2025-08-31
    Funding source: Bundesministerium für Bildung und Forschung (BMBF)
    URL: https://eehpc.clustercockpit.org/

    The aim of this project is to reduce power consumption while maximizing throughput in the operation of HPC systems. This is achieved by optimally adjusting system parameters that influence energy consumption to the respective running jobs. To quantify the throughput of useful work, the Energy Productivity of the IT Equipment metric specified by KPI4DCE is used. The savings potential is demonstrated at all participating data centers for two selected applications each.

    The project combines a comprehensive job-specific measurement and control infrastructure with machine learning (ML) techniques and software-hardware co-design, with the ability to control energy parameters via runtime environments. Policies specify the framework conditions, and the actual optimization of system parameters is automatic and adaptive. To achieve these goals, the open-source GEOPM framework will be extended with a machine learning component. To exploit the full potential for energy savings, automatic phase detection will be developed, as well as extensions to the MPI and OpenMP runtime environments that allow information about application state to be communicated to the GEOPM framework. To capture the required time-resolved metrics on energy consumption as well as the performance behavior of the application, interfaces and extensions in LIKWID will be developed. For visualization and control of the GEOPM functionality, the job-specific performance monitoring framework ClusterCockpit will be extended and coupled with GEOPM. The novelty of the approach lies in the development and provision of a production-ready software environment for fully user-transparent energy optimization of HPC applications. The project builds on existing open-source software components and integrates, extends, and adapts them to the new requirements.
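
    The job-specific, region-level measurement this project builds on can be illustrated with LIKWID's existing marker API; the following is a minimal sketch with a placeholder region name and workload (the LIKWID extensions planned in the project go beyond this):

        // Minimal sketch: instrumenting a code region with the LIKWID marker API.
        // Build (assuming LIKWID is installed):  gcc -DLIKWID_PERFMON sketch.c -llikwid
        // Run with the RAPL-based energy group:  likwid-perfctr -C 0 -g ENERGY -m ./a.out
        #include <likwid-marker.h>

        int main(void) {
            double s = 0.0;
            LIKWID_MARKER_INIT;
            LIKWID_MARKER_START("compute");       // region appears by name in the report
            for (long i = 0; i < 200000000L; ++i)
                s += 1.0 / (double)(i + 1);       // placeholder workload
            LIKWID_MARKER_STOP("compute");
            LIKWID_MARKER_CLOSE;
            return s > 0.0 ? 0 : 1;               // keep the loop from being optimized away
        }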

  • Der skalierbare Strömungsraum

    (Third Party Funds Group – Sub project)

    Overall project: Der skalierbare Strömungsraum
    Term: 2022-09-01 - 2025-08-31
    Funding source: BMBF / Verbundprojekt

    Upcoming exascale computer architectures will be characterized by a very large number of heterogeneous hardware components, including special-purpose processors and accelerators. Implementing CFD application software, the central core component of today's industrial flow simulations, on such machines requires highly scalable methods, above all for solving the high-dimensional, transient (non)linear systems of equations, and these methods must also be able to exploit the high peak performance of accelerator hardware algorithmically. Moreover, they must be realized in the application software in such a way that non-HPC experts can use them for real applications, in particular for the simulation, control, and optimization of industrially relevant processes, while exploiting the high performance of future exascale computers in a resource-efficient manner.

    The open-source software FEATFLOW, developed mainly at TU Dortmund, is a powerful CFD tool and a central part of the StrömungsRaum platform, which IANUS Simulation has been using successfully in industrial settings for years. Within the overall project, FEATFLOW will be extended both methodologically and through hardware-aware parallel implementations, so that highly scalable CFD simulations with FEATFLOW become possible on future exascale architectures.

    In the FAU subproject, performance engineering methods and processes are applied and refined to systematically improve the hardware efficiency and scalability of FEATFLOW for the coming classes of HPC systems and foreseeable exascale architectures, and thus to strongly reduce simulation time. In particular, the methodological extensions planned within the project will be supported in the implementation of efficient libraries. In addition, performance models will be developed for selected kernel routines, these routines will be optimized, and their efficient implementations will be published in the form of proxy applications.
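
    For orientation, performance models of the kind mentioned here typically start from the roofline bound, which limits the performance P of a loop with computational intensity I (flops per byte of memory traffic) by the peak performance P_peak and the memory bandwidth b_S. A worked example for a simple vector update (the bandwidth figure is illustrative, not measured FEATFLOW data):

        P \le \min\bigl(P_\mathrm{peak},\, I \cdot b_S\bigr), \qquad
        y_i \leftarrow y_i + \alpha x_i:\;
        I = \frac{2\ \text{flops}}{24\ \text{bytes}} \approx 0.083\ \text{flops/byte}
        \;\Rightarrow\; P \le 0.083 \cdot 100\ \mathrm{GB/s} \approx 8.3\ \mathrm{GFlop/s}.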

List of recent publications

extracted from CRIS / see also on Google Scholar

Previous projects (2009—2019)

2019

  • Energy Oriented Center of Excellence: toward exascale for energy (Performance evaluation, modelling and optimization)

    (Third Party Funds Group – Sub project)

    Overall project: Energy Oriented Center of Excellence: toward exascale for energy
    Term: 2019-01-01 - 2021-12-31
    Funding source: Europäische Union (EU)

2017

  • Metaprogrammierung für Beschleunigerarchitekturen

    (Third Party Funds Single)

    Term: 2017-01-01 - 2019-12-31
    Funding source: Bundesministerium für Bildung und Forschung (BMBF)

    In Metacca, the AnyDSL framework is extended into a homogeneous programming environment for heterogeneous single-node and multi-node systems. Saarland University (UdS) will extend the AnyDSL compiler and type system to enable programmers to program accelerators productively. Building on this, LSS will develop suitable abstractions for distribution and synchronization on single-node and multi-node machines in the form of a DSL within AnyDSL. All components are supported by performance models (RRZE). A runtime environment with built-in performance profiling takes care of resource management and system configuration. The resulting framework will be evaluated using two applications, ray tracing (DFKI) and bioinformatics (JGU). Single nodes and clusters with multiple accelerators (CPUs, GPUs, Xeon Phi) serve as target platforms.

    The University of Erlangen-Nürnberg is mainly responsible for the support of distributed programming (LSS) as well as for the development and deployment of supporting performance models and an integrated profiling component (RRZE). Both subareas begin with a requirements analysis to plan further steps and coordinate them with the partners. In the first year, LSS will implement the distribution of data structures; subsequent work will concentrate on synchronization mechanisms. In the final year, code transformations will be designed to adapt the distribution and synchronization concepts in AnyDSL to the chosen applications. RRZE will first integrate the kerncraft framework into the partial evaluation, extending kerncraft to support current accelerator architectures as well as models for distributed-memory parallelization. Two further work packages cover resource management and a LIKWID-based profiling component.

  • Process-Oriented Performance Engineering Service Infrastructure for Scientific Software at German HPC Centers

    (Third Party Funds Single)

    Term: 2017-01-01 - 2019-12-31
    Funding source: DFG-Einzelförderung / Sachbeihilfe (EIN-SBH)
    URL: https://blogs.fau.de/prope/

    The ProPE project will deploy a prototype HPC user support infrastructure as a distributed, cross-site collaborative effort of several Tier-2/3 centers with complementing HPC expertise. Within ProPE, code optimization and parallelization of scientific software is seen as a structured, well-defined process with a sustainable outcome. The central component of ProPE is the improvement, process-based implementation, and dissemination of a structured performance engineering (PE) process. This PE process defines and drives code optimization and parallelization as a target-oriented, structured activity: application hot spots are identified first and then optimized/parallelized in an iterative cycle. Starting from an analysis of the algorithm, the code, and the target hardware, a hypothesis about the performance-limiting factors is proposed based on performance patterns and models. Performance measurements validate the hypothesis or guide its iterative adaption. After validation of the hardware bottleneck, appropriate code changes are deployed and the PE cycle restarts. The level of detail of the PE process can be adapted to the complexity of the underlying problem and the experience of the HPC analyst. Currently this process is applied by experts and at the prototype level. ProPE will formalize and document the PE process and apply it to various scenarios (single core/node optimization, distributed parallelization, I/O-intensive problems). Different abstraction levels of the PE process will be implemented and disseminated to HPC analysts and application developers via user support projects, teaching activities, and web documentation.

    The integration of the PE process into modern IT infrastructure across several centers with different HPC support expertise will be the second project focus. All components of the PE process will be coordinated and standardized across the partnering sites. This way the complete HPC expertise within ProPE can be offered as a coherent service on a nationwide scale, and ongoing support projects can be transferred easily between participating centers. In order to identify low-performing applications, characterize application loads, and quantify the benefits of the PE activities at a system level, ProPE will employ a system monitoring infrastructure for HPC clusters. This tool will be tailored to the requirements of the PE process and designed for easy deployment and usage at Tier-2/3 centers. The associated ProPE partners will ensure the embedding into the German HPC infrastructure and provide basic PE expertise in terms of algorithmic choices, perfectly complementing the code optimization and parallelization efforts of ProPE.
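
    As a concrete instance of the measurement step of this cycle, the following sketch (illustrative array size; compile with OpenMP support, e.g. gcc -fopenmp) times a vector triad and reports its effective memory bandwidth, which can then be compared with a roofline or ECM prediction to validate a bandwidth-bottleneck hypothesis:

        // PE measurement sketch: effective memory bandwidth of a(i) = b(i) + c(i)*d(i).
        #include <stdio.h>
        #include <stdlib.h>
        #include <omp.h>

        #define N 40000000L   // large enough to exceed all cache levels

        int main(void) {
            double *a = malloc(N * sizeof *a), *b = malloc(N * sizeof *b);
            double *c = malloc(N * sizeof *c), *d = malloc(N * sizeof *d);
            #pragma omp parallel for              // first-touch init for NUMA placement
            for (long i = 0; i < N; ++i) { a[i] = 0.0; b[i] = c[i] = d[i] = 1.5; }

            double t0 = omp_get_wtime();
            #pragma omp parallel for
            for (long i = 0; i < N; ++i)
                a[i] = b[i] + c[i] * d[i];
            double t = omp_get_wtime() - t0;

            // 5 streams of 8 bytes each: loads of b, c, d; write-allocate plus store of a.
            printf("triad: %.3f s, %.1f GB/s effective\n", t, 5.0 * 8.0 * N / t / 1e9);
            free(a); free(b); free(c); free(d);
            return 0;
        }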

  • Selbstadaption für zeitschrittbasierte Simulationstechniken auf heterogenen HPC-Systemen

    (Third Party Funds Single)

    Term: 2017-03-01 - 2020-02-29
    Funding source: Bundesministerium für Bildung und Forschung (BMBF)

    The SeASiTe research project undertakes a systematic investigation of self-adaptation for time-step-based simulation techniques on heterogeneous HPC systems. The goal is to design and provide a prototype toolbox with which programmers can equip their applications with efficient self-adaptation techniques. The approach covers self-adaptation with respect to relevant system and program parameters as well as possible program transformations. The optimization of program execution for several non-functional objectives (e.g., runtime or energy consumption) is to build on performance modelling to narrow down the search space of efficient program variants. Application-independent methods and strategies for self-adaptation will be encapsulated in an autotuning navigator.

    The Erlangen subproject first addresses the model-based understanding of autotuning methods for regular simulation algorithms, using several common stencil classes as examples. With the help of extended performance models, structured guidelines and recommendations for the autotuning process regarding relevant code transformations and the restriction of the search space of optimization parameters will be derived and prepared in exemplary form for the autotuning navigator. The second focus of the work is the extension of existing analytic performance models and software tools to new computer architectures and their integration into the autotuning navigator. In addition, the Erlangen group maintains the demonstrator for stencil codes and contributes to the design of the autotuning navigator and the definition of interfaces.
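
    To make the notion of a tunable code transformation concrete, the following generic sketch (not SeASiTe code) shows a 2D five-point Jacobi sweep whose block size bx is exposed as a tuning parameter of the kind the autotuning navigator would search over, with performance models restricting the candidate values:

        // Generic illustration: five-point Jacobi sweep, block size bx as tuning parameter.
        #define NX 4096
        #define NY 4096

        void jacobi_sweep(double dst[NY][NX], const double src[NY][NX], int bx) {
            for (int jb = 1; jb < NX - 1; jb += bx) {          // blocked inner dimension
                int jend = jb + bx < NX - 1 ? jb + bx : NX - 1;
                for (int i = 1; i < NY - 1; ++i)               // row traversal reuses src rows
                    for (int j = jb; j < jend; ++j)
                        dst[i][j] = 0.25 * (src[i-1][j] + src[i+1][j]
                                          + src[i][j-1] + src[i][j+1]);
            }
        }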

2016

  • SPP EXA 1648

    (Third Party Funds Group – Sub project)

    Overall project: SPP EXA 1648
    Term: 2016-01-01 - 2019-12-31
    Funding source: DFG-Einzelförderung / Sachbeihilfe (EIN-SBH)
  • EXASTEEL II - Bridging Scales for Multiphase Steels

    (Third Party Funds Group – Sub project)

    Overall project: SPP 1648: Software for Exascale Computing
    Term: 2016-01-01 - 2018-12-31
    Funding source: DFG / Schwerpunktprogramm (SPP)
    URL: http://www.numerik.uni-koeln.de/14079.html

    In the EXASTEEL-2 project, experts on scalable iterative solvers, computational modeling in materials science, performance engineering, and parallel direct solvers are joining forces to develop new computational algorithms and implement software for a grand challenge problem from computational materials science.

    There is an increasing need for predictive simulations of the macroscopic behavior of complex new materials. In the EXASTEEL-2 project, this problem is considered for modern micro-heterogeneous (dual-phase) steels, attempting to predict the macroscopic properties of new materials from those on the microscopic level. The goal is to develop algorithms and software towards a virtual laboratory for predictive material testing in silico. A bottleneck is the computational complexity of the multiscale models needed to describe the new materials, involving sufficiently accurate, micromechanically motivated models on the crystalline scale. Therefore, new ultra-scalable nonlinear implicit solvers will be developed and combined with a highly parallel computational scale-bridging approach (FE^2), intertwined with consistent, continuous performance engineering, to bring the challenging engineering application of a virtual laboratory for material testing and design to extreme-scale computing. We envisage a continuously increasing transition from descriptive to predictive macroscopic simulations and take into account, to the best of our knowledge for the first time within a computational scale-bridging approach, the polycrystalline nature of dual-phase steels including grain boundary effects at the microscale.

    Our goals could not be reached without building on the algorithm and software infrastructure from EXASTEEL-1. We will complete the paradigm shift, begun in the EXASTEEL-1 project, from Newton-Krylov solvers to nonlinear methods (and their composition) with improved concurrency and reduced communication. By combining nonlinear domain decomposition with multigrid methods we plan to leverage the scalability of both implicit solver approaches for nonlinear methods.

    Although our application is specific, the algorithms and optimized software will have an impact well beyond the particular application. Nonlinear implicit solvers are at the heart of many simulation codes, and our software building blocks PETSc, BoomerAMG, PARDISO, and FEAP are all software packages with a large user base. The advancement of these software packages is explicitly planned for in the work packages of this project.

    The project thus addresses computational algorithms (nonlinear implicit solvers and scale bridging), application software, and programming (PE, hybrid programming, accelerators).
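
    For orientation, the scale bridging at the core of the FE^2 method can be stated in one relation: at every macroscopic quadrature point a microscopic boundary value problem on a representative volume element V is solved, and the macroscopic deformation gradient and first Piola-Kirchhoff stress are the volume averages of their microscopic counterparts:

        \bar{\mathbf F} = \frac{1}{|V|} \int_V \mathbf F \,\mathrm{d}V, \qquad
        \bar{\mathbf P} = \frac{1}{|V|} \int_V \mathbf P \,\mathrm{d}V.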

  • Equipping Sparse Solvers for Exascale II (ESSEX-II)

    (Third Party Funds Group – Sub project)

    Overall project: SPP 1648: Software for Exascale Computing
    Term: 2016-01-01 - 2018-12-31
    Funding source: DFG / Schwerpunktprogramm (SPP)
    URL: https://blogs.fau.de/essex/activities

    The ESSEX-II project will use the successful concepts and software blueprints developed in ESSEX-I for sparse eigenvalue solvers to produce widely usable and scalable software solutions with high hardware efficiency for the computer architectures of the upcoming decade. All activities are organized along the traditional software layers of low-level parallel building blocks (kernels), algorithm implementations, and applications. However, the classic abstraction boundaries separating these layers are broken in ESSEX-II by strongly integrating objectives: scalability, numerical reliability, fault tolerance, and holistic performance and power engineering.

    Driven by Moore's Law and power dissipation constraints, computer systems will become more parallel and heterogeneous even on the node level in upcoming years, further increasing overall system parallelism. MPI+X programming models can be adapted in flexible ways to the underlying hardware structure and are widely expected to be able to address the challenges of the massively multi-level parallel heterogeneous architectures of the next decade. Consequently, the parallel building blocks layer supports MPI+X, with X being a combination of node-level programming models able to fully exploit hardware heterogeneity, functional parallelism, and data parallelism. In addition, facilities for fully asynchronous checkpointing, silent data corruption detection and correction, performance assessment, performance model validation, and energy measurements will be provided.

    The algorithms layer will leverage the components in the building blocks layer to deliver fully heterogeneous, automatically fault-tolerant, state-of-the-art implementations of Jacobi-Davidson eigensolvers, the Kernel Polynomial Method (KPM), and Chebyshev Time Propagation (ChebTP) that are ready for production use on modern heterogeneous compute nodes with best performance and numerical accuracy. Chebyshev filter diagonalization (ChebFD) and a Krylov eigensolver complement these implementations, and the recent FEAST method will be investigated and further developed for improved scalability.

    The applications layer will deliver scalable solutions for conservative (Hermitian) and dissipative (non-Hermitian) quantum systems with strong links to optics and biology and to novel materials such as graphene and topological insulators. Extending its predecessor project, ESSEX-II adopts an additional focus on production-grade software. Although the selection of algorithms is strictly motivated by quantum physics application scenarios, the underlying research directions of algorithmic and hardware efficiency, accuracy, and resilience will radiate into many fields of computational science. Most importantly, all developments will be accompanied by an uncompromising performance engineering process that will rigorously expose any discrepancy between expected and observed resource efficiency.
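
    All of the solvers named above (Jacobi-Davidson, KPM, ChebTP, ChebFD, FEAST) are dominated by sparse matrix-vector multiplication; a plain CRS/OpenMP baseline of this building block is sketched below for orientation. The project's actual kernels (in the GHOST and PHIST libraries) are considerably more elaborate, using, e.g., SELL-C-sigma storage and fused operations:

        // Baseline building block: CRS sparse matrix-vector multiply y = A*x.
        typedef struct {
            int nrows;
            const int *rowptr;    // row start offsets, length nrows+1
            const int *col;       // column indices of nonzeros
            const double *val;    // nonzero values
        } crs_matrix;

        void spmv(const crs_matrix *A, const double *x, double *y) {
            #pragma omp parallel for schedule(static)
            for (int r = 0; r < A->nrows; ++r) {
                double sum = 0.0;
                for (int k = A->rowptr[r]; k < A->rowptr[r + 1]; ++k)
                    sum += A->val[k] * x[A->col[k]];
                y[r] = sum;
            }
        }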

  • Bridging scales - from Quantum Mechanics to Continuum Mechanics. A Finite Element approach.

    (Third Party Funds Single)

    Term: 2016-01-01 - 2018-09-30
    Funding source: DFG-Einzelförderung / Sachbeihilfe (EIN-SBH)

    The concurrently coupled Quantum Mechanics (QM) - Continuum Mechanics (CM) approach for electro-elastic problems is considered in this proposal. Despite the efforts that have been made to bridge the different descriptions of matter, many questions are yet to be answered. First, an efficient Finite Element (FE)-based solution approach to the Kohn-Sham (KS) equations of Density Functional Theory (DFT) will be further developed. The main topics to be studied are h-adaptivity in the FE-based solution with non-local pseudo-potentials, mesh transformation during structural optimization, and the formulation of the deformation map. It should be noted that until now no open-source implementation of the DFT approach exists that uses an FE basis and provides hp-refinement capabilities. An FE basis is very attractive in the context of DFT because of its completeness, refinement possibilities, and good polarization properties based on domain decomposition. Second, QM quantities will be related to their CM counterparts (e.g. displacements, deformation gradient, the Piola stress, polarization, etc.). This will be achieved using averaging in the Lagrangian configuration; to that end, full control over an FE-based solution of the KS equations is required. The procedure will then be tested on a representative numerical example: bending of a single-wall carbon nanotube. On the CM side, the surface-enhanced continuum theory will be utilized to properly capture surface effects; although several theoretical works exist on this matter, no numerical attempts have been made to check their validity on test examples. Lastly, based on the correspondence between the different formulations, a concurrently coupled QM-CM method will be proposed. Coupling will be achieved in a staggered way, i.e. the QM and CM problems will be solved iteratively with a proper exchange of information between them. A test problem of crack propagation in a graphene sheet will be considered. As a long-term goal of the project, coupling strategies for electro-elastic problems will be developed; to the best of my knowledge, none of the existing QM-CM coupling methods can handle electro-elastic problems.
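
    For reference, the Kohn-Sham equations whose FE-based solution the project develops read (in atomic units, with the effective potential collecting the Hartree, exchange-correlation, and external/pseudo-potential contributions):

        \Bigl( -\tfrac{1}{2}\nabla^2 + v_\mathrm{eff}[\rho](\mathbf r) \Bigr)\, \psi_i(\mathbf r)
        = \varepsilon_i\, \psi_i(\mathbf r), \qquad
        \rho(\mathbf r) = \sum_i f_i \,\lvert \psi_i(\mathbf r) \rvert^2 .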

  • Ultra-Skalierbare Multiphysiksimulationen für Erstarrungsprozesse in Metallen

    (Third Party Funds Group – Overall project)

    Term: 2016-02-01 - 2019-01-31
    Funding source: BMBF / Verbundprojekt

    Thanks to rapidly increasing compute power, complex phenomena in science and engineering are increasingly investigated with realistic simulation techniques. The resulting field of Computational Science and Engineering (CSE) is therefore regarded as a new, third pillar of science that complements and reinforces the two classical pillars, theory and experiment. At its core, CSE is about designing and analyzing powerful simulation methods for current and future high-end supercomputers and implementing them robustly, reliably, and in a user-friendly way for practical use.

    Modern, highly efficient simulation techniques are indispensable today for developing new materials with better properties and for optimizing manufacturing and production processes. They largely replace the traditional time- and cost-intensive experiments otherwise required for material development and for improving the quality of material components. At the same time, materials simulations pose a major challenge for fundamental research and for high-performance computing.

    The mechanical properties of a material are determined to a large extent by the microstructure that forms during the manufacturing process, i.e., during solidification from the melt. Simulating the solidification process can provide important new insights into microstructure formation processes that cannot be observed experimentally, making it possible to systematically analyze their influence on the resulting structure. In the future this will allow new materials with specific properties to be designed virtually on the computer.

    Simulation-based research and development for this problem requires a very fine spatial and temporal resolution to capture all relevant physical effects, and therefore extremely high compute power. To solve such problems on future large-scale systems with many thousands of compute nodes, the simulation software must not only be able to use all of these nodes concurrently, it must also deliver maximum performance at the lowest possible resource consumption; besides pure compute time, the energy consumption of the supercomputers becomes a significant factor here. The waLBerla framework serves as the software basis of SKAMPY. In this project, waLBerla is extended to solve new application-oriented problems in the materials sciences, using specially developed programming methods that allow particularly good exploitation of the supercomputers. A promising joint feasibility study on the simulation of solidification processes in metal alloys has already demonstrated the performance of the approach and its portability to the architectures of all three German national supercomputers, so the project consortium is well positioned to make supercomputer simulations sustainably usable for future, considerably more complex research tasks.

2012

  • ESSEX - Equipping Sparse Solvers for Exascale

    (Third Party Funds Group – Sub project)

    Overall project: SPP 1648: Software for Exascale Computing
    Term: 2012-11-01 - 2019-06-30
    Funding source: DFG / Schwerpunktprogramm (SPP)

    The ESSEX project investigates the computational issues arising for large-scale sparse eigenvalue problems and develops programming concepts and numerical methods for their solution. The exascale challenges of extreme parallelism, energy efficiency, and resilience will be addressed by coherent software design between the three project layers, which comprise building blocks, algorithms, and applications. The MPI+X programming model, a holistic performance engineering strategy, and advanced fault tolerance mechanisms are the driving forces behind all developments. Classic Krylov, Jacobi-Davidson, and recent FEAST methods will be enabled for exascale computing and equipped with advanced, scalable preconditioners. New implementations of domain-specific iterative schemes in physics and chemistry, namely the established Chebyshev expansion techniques for the computation of spectral properties and their novel extension to the time evolution of driven quantum systems, complement these algorithms.

    The software solutions of the ESSEX project will be combined into an Exascale Sparse Solver Repository (“ESSR”), where the specific demands of the quantum physics users are recognized by integration of quantum state encoding techniques at the fundamental level. The relevance of this project can then be demonstrated through application of the ESSR algorithms to graphene-based structures, topological insulators, and quantum Hall effect devices. Such studies require exascale resources together with modern numerical methods to determine many eigenstates at a given point of the spectrum of extremely large matrices or to compute an approximation to their full spectrum. The concepts, methods, and software building blocks developed in the ESSEX project serve as general blueprints for other scientific application areas that depend on sparse iterative algorithms. The strong vertical interaction between all three project layers ensures that users can quickly utilize any progress on the lower layers and immediately use the power of exascale machines once they become available.
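
    The Chebyshev expansion techniques mentioned above rest on the three-term recurrence for Chebyshev polynomials of the spectrally rescaled matrix (spectrum mapped into [-1,1]), which reduces both the computation of spectral properties and time propagation to repeated sparse matrix-vector products:

        \lvert v_{k+1} \rangle = 2\tilde H \lvert v_k \rangle - \lvert v_{k-1} \rangle,
        \qquad \lvert v_1 \rangle = \tilde H \lvert v_0 \rangle .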

  • EXASTEEL - Bridging Scales for Multiphase Steels

    (Third Party Funds Group – Sub project)

    Overall project: SPP 1648: Software for Exascale Computing
    Term: 2012-11-01 - 2015-12-31
    Funding source: DFG / Schwerpunktprogramm (SPP)

    This project addresses algorithms and software for the simulation of three-dimensional multiscale materials science problems on the future supercomputers developed for exascale computing. The performance of modern high-strength steels is governed by the complex interaction of the individual constituents on the microscale. Direct computational homogenization schemes such as the FE^2 method allow for high-fidelity material design and analysis of modern steels. Using this approach, fluctuations of the local field equations (balance laws) can be resolved to a high accuracy, which is needed to predict failure of such micro-heterogeneous materials. Performing the scale bridging within the FE^2 method for realistic problems in 3D still requires new ultra-scalable, robust algorithms and solvers, which have to be developed and incorporated into new application software. Such algorithms must be specifically designed to allow the efficient use of the future hardware. Here, the direct multiscale approach (FE^2) will be combined with new, highly efficient, parallel solver algorithms. For the latter, a hybrid algorithmic approach will be taken, combining nonoverlapping parallel domain decomposition (FETI) methods with efficient parallel multigrid preconditioners. A comprehensive performance engineering approach, guided by the PI Wellein, will be implemented to ensure a systematic optimization and parallelization process across all software layers.

    This project builds on parallel simulation software developed for the solution of complex nonlinear structural mechanics problems by the PIs Schröder, Balzani and Klawonn, Rheinbach. It is based on the application software package FEAP (Finite Element Analysis Program, R. Taylor, UC Berkeley). Within a new software environment, FEAP has been combined with a FETI-DP domain decomposition solver based on PETSc (Argonne National Laboratory) and hypre (Lawrence Livermore National Laboratory), e.g., to perform parallel simulations in nonlinear biomechanics. The optimization, performance modelling, and performance engineering will be guided by the PI Wellein. The PIs Schröder and Balzani have performed FE^2 simulations in the past using an extended version of FEAP. The envisioned scale bridging for realistic, advanced engineering problems in three dimensions will require a computational power that will only be obtainable when exascale computing becomes available.

  • TERRA-NEO - Integrated Co-Design of an Exascale Earth Mantle Modeling Framework

    (Third Party Funds Group – Sub project)

    Overall project: SPP 1648: Software for Exascale Computing
    Term: 2012-11-01 - 2015-12-31
    Funding source: DFG / Schwerpunktprogramm (SPP)

    Much of what one refers to as geological activity of the Earth is due to the fact that heat is transported from the interior of our planet to the surface in a planet-wide solid-state convection in the Earth's mantle. For this reason, the study of the dynamics of the mantle is critical to our understanding of how the entire planet works. Processes from earthquakes, plate tectonics, and crustal evolution to the geodynamo are governed by convection in the mantle. Without a detailed knowledge of Earth's internal dynamic processes, we cannot hope to deduce the many interactions between shallow and deep Earth processes that dominate the Earth system. The vast forces associated with mantle convection cells drive horizontal movement of Earth's surface in the form of plate tectonics, which is well known albeit poorly understood. They also induce substantial vertical motion in the form of dynamically maintained topography that manifests itself prominently in the geologic record through sea level variations and their profound impact on the ocean and climate system. Linking mantle processes to their surface manifestations is seen widely today as one of the most fundamental problems in the Earth sciences, while being at the same time a matter of direct practical relevance through the evolution of sedimentary basins and their paramount economic importance.

    Simulating Earth mantle dynamics requires a resolution in space and time that makes it one of the grand challenge applications in the computational sciences. With the exascale systems of the future it will be possible to advance beyond the deterministic forward problem to a stochastic uncertainty analysis for the inverse problem. In fact, fluid dynamic inverse theory is now at hand that will allow us to track mantle motion back into the past, exploiting the rich constraints available from the geologic record, subject to the availability of powerful geodynamical simulation software that can take advantage of these future supercomputers.

    The new community code TERRA-NEO will be based on a carefully designed multi-scale space-time discretization using hybridized Discontinuous Galerkin elements on an icosahedral mesh with block-wise refinement. This advanced finite element technique promises better stability and higher accuracy for the nonlinear transport processes in the Earth mantle while requiring less communication in a massively parallel setting. The resulting algebraic systems with more than 10^12 unknowns per time step will be solved by a new class of communication-avoiding, asynchronous multigrid preconditioners that will achieve maximal scalability and resource-optimized computational performance. A non-deterministic control flow and a lazy evaluation strategy will alleviate the traditional over-synchronization of hierarchical iterative methods and will support advanced resiliency techniques on the algorithmic level. The software framework of TERRA-NEO will be developed specifically for the upcoming heterogeneous exascale computers by using an advanced architecture-aware design process. Special white-box performance models will guide the software development, leading to a holistic co-design of the data structures and the algorithms on all levels. With this systematic performance engineering methodology we will also optimize a balanced compromise between minimal energy consumption and shortest run time.

    This consortium is fully committed to the interdisciplinary collaboration that is necessary for creating TERRA-NEO as a new exascale simulation framework. To this end, TERRA-NEO brings together top experts that cover all aspects of CS&E, from modeling via discretization to solvers and software engineering for exascale architectures.
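
    The computational core behind these numbers is, in its simplest (Boussinesq-type) form, the Stokes system for creeping mantle flow with strongly temperature-dependent viscosity, stated here for orientation:

        -\nabla \cdot \bigl( 2\eta\, \dot{\boldsymbol\varepsilon}(\mathbf u) \bigr) + \nabla p
        = \rho(T)\, \mathbf g, \qquad \nabla \cdot \mathbf u = 0, \qquad
        \dot{\boldsymbol\varepsilon}(\mathbf u) = \tfrac{1}{2}\bigl( \nabla \mathbf u + \nabla \mathbf u^{\top} \bigr).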

2011

  • Eine fehlertolerante Umgebung für peta-scale MPI-Löser

    (Third Party Funds Group – Sub project)

    Overall project: FEToL
    Term: 2011-06-01 - 2014-05-31
    Funding source: Bundesministerium für Bildung und Forschung (BMBF)

2009

  • SKALB: Lattice-Boltzmann-Methoden für skalierbare Multi-Physik-Anwendungen

    (Third Party Funds Group – Overall project)

    Term: 2009-01-01 - 2011-12-31
    Funding source: BMBF / Verbundprojekt, Bundesministerium für Bildung und Forschung (BMBF)

    The goal of the BMBF-funded project SKALB (lattice Boltzmann methods for scalable multi-physics applications) is the efficient implementation and further development of lattice Boltzmann based flow solvers for simulating complex multi-physics applications on petascale-class computers. The lattice Boltzmann method is an established solution technique in computational fluid dynamics. A central advantage of the method is the basic simplicity of the numerical scheme, which allows efficient computation of complex flow geometries such as porous media or metal foams as well as direct numerical simulations (DNS) for studying turbulent flows. In SKALB, lattice Boltzmann applications are to be advanced methodologically and technically for the new classes of massively parallel heterogeneous and homogeneous supercomputers. RRZE contributes its long experience in performance modelling and efficient implementation of lattice Boltzmann methods on a broad spectrum of modern computers and additionally works on new programming approaches for multi-/manycore processors. The application code further developed at RRZE will be used together with the group of Prof. Schwieger for massively parallel simulation of flows in porous media.
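
    The simplicity referred to above is visible in the standard BGK update rule of the method: in every time step, each discrete-velocity distribution function f_i is relaxed toward a local equilibrium and streamed to the neighboring cell:

        f_i(\mathbf x + \mathbf c_i \Delta t,\ t + \Delta t)
        = f_i(\mathbf x, t) - \frac{\Delta t}{\tau} \bigl[ f_i(\mathbf x, t) - f_i^\mathrm{eq}(\mathbf x, t) \bigr].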
