Prof. Dr. Gerhard Wellein

Professorship for High Performance Computing, Head of NHR@FAU

Department of Computer Science
Professorship for High Performance Computing

Room: 01.130-113.02
Martensstr. 3
91058 Erlangen

Gerhard Wellein is Professor for High Performance Computing at the Department of Computer Science of Friedrich-Alexander-Universität Erlangen-Nürnberg (FAU) and holds a PhD in theoretical physics from the University of Bayreuth. From 2015 to 2017 he was also a guest lecturer at the Faculty of Informatics of Università della Svizzera italiana (USI) Lugano. Since 2021 he has been the director of the Erlangen National High Performance Computing Center (NHR@FAU). He is a member of the board of directors of the German NHR Alliance, which coordinates the national Tier-2 HPC infrastructure at German universities, and has served for many years as deputy speaker of the Bavarian HPC competence network KONWIHR. As a member of the scientific steering committees of the Leibniz Supercomputing Centre (LRZ) and the Gauss Centre for Supercomputing (GCS), he organizes and oversees the application process for compute time on national HPC resources.

Gerhard Wellein has more than twenty years of experience in teaching HPC techniques to students and scientists. He has contributed to numerous tutorials on node-level performance engineering over the past decade and, together with Jan Treibig and Georg Hager, received the 2011 Informatics Europe Curriculum Best Practices Award for outstanding teaching contributions. His research interests focus on performance modelling and performance engineering, architecture-specific code optimization, novel parallelization approaches, and hardware-efficient building blocks for sparse linear algebra and stencil solvers. He has led and contributed to numerous national and international HPC research projects and has authored or co-authored more than 100 peer-reviewed publications.

List of current projects

extracted from CRIS

  • Quelloffene Lösungsansätze für Monitoring und Systemeinstellungen für energieoptimierte Rechenzentren

    (Third Party Funds Single)

    Term: 2022-09-01 - 2025-08-31
    Funding source: Bundesministerium für Bildung und Forschung (BMBF)
    URL: https://eehpc.clustercockpit.org/

    The aim of this project is to reduce power consumption while maximizing throughput in the operation of HPC systems. This is achieved by optimally adjusting system parameters that influence energy consumption to the respective running jobs. To quantify the throughput of useful work, the Energy Productivity of the IT Equipment metric specified by KPI4DCE is used. The savings potential is demonstrated at all participating data centers for two selected applications each.

    The project combines a comprehensive job-specific measurement and control infrastructure with machine learning (ML) techniques and software-hardware co-design, with the ability to control energy parameters via runtime environments. Policies specify the framework conditions, and the actual optimization of system parameters is automatic and adaptive. To achieve these goals, the open-source GEOPM framework will be extended with a machine learning component. To exploit the full potential for energy savings, automatic phase detection will be developed, as well as extensions to the MPI and OpenMP runtime environments that allow information about application state to be communicated to the GEOPM framework. To capture the required time-resolved metrics on energy consumption as well as the performance behavior of the application, interfaces and extensions in LIKWID will be developed. For visualization and control of the GEOPM functionality, the job-specific performance monitoring framework ClusterCockpit will be extended and coupled with GEOPM. The novelty of the approach lies in the development and provision of a production-ready software environment for fully user-transparent energy optimization of HPC applications. The project builds on existing open-source software components and integrates, extends, and adapts them to the new requirements.
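
    The job-specific, region-level measurement this project builds on can be illustrated with LIKWID's existing marker API; the following is a minimal sketch with a placeholder region name and workload (the LIKWID extensions planned in the project go beyond this):

        // Minimal sketch: instrumenting a code region with the LIKWID marker API.
        // Build (assuming LIKWID is installed):  gcc -DLIKWID_PERFMON sketch.c -llikwid
        // Run with the RAPL-based energy group:  likwid-perfctr -C 0 -g ENERGY -m ./a.out
        #include <likwid-marker.h>

        int main(void) {
            double s = 0.0;
            LIKWID_MARKER_INIT;
            LIKWID_MARKER_START("compute");       // region appears by name in the report
            for (long i = 0; i < 200000000L; ++i)
                s += 1.0 / (double)(i + 1);       // placeholder workload
            LIKWID_MARKER_STOP("compute");
            LIKWID_MARKER_CLOSE;
            return s > 0.0 ? 0 : 1;               // keep the loop from being optimized away
        }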

  • Der skalierbare Strömungsraum

    (Third Party Funds Group – Sub project)

    Overall project: Der skalierbare Strömungsraum
    Term: 2022-09-01 - 2025-08-31
    Funding source: BMBF / Verbundprojekt

    Upcoming exascale computer architectures will be characterized by a very large number of heterogeneous hardware components, including special-purpose processors and accelerators. Implementing CFD application software, the central core component of today's industrial flow simulations, on such machines requires highly scalable methods, above all for solving the high-dimensional, transient (non)linear systems of equations, and these methods must also be able to exploit the high peak performance of accelerator hardware algorithmically. Moreover, they must be realized in the application software in such a way that non-HPC experts can use them for real applications, in particular for the simulation, control, and optimization of industrially relevant processes, while exploiting the high performance of future exascale computers in a resource-efficient manner.

    The open-source software FEATFLOW, developed mainly at TU Dortmund, is a powerful CFD tool and a central part of the StrömungsRaum platform, which IANUS Simulation has been using successfully in industrial settings for years. Within the overall project, FEATFLOW will be extended both methodologically and through hardware-aware parallel implementations, so that highly scalable CFD simulations with FEATFLOW become possible on future exascale architectures.

    In the FAU subproject, performance engineering methods and processes are applied and refined to systematically improve the hardware efficiency and scalability of FEATFLOW for the coming classes of HPC systems and foreseeable exascale architectures, and thus to strongly reduce simulation time. In particular, the methodological extensions planned within the project will be supported in the implementation of efficient libraries. In addition, performance models will be developed for selected kernel routines, these routines will be optimized, and their efficient implementations will be published in the form of proxy applications.
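
    For orientation, performance models of the kind mentioned here typically start from the roofline bound, which limits the performance P of a loop with computational intensity I (flops per byte of memory traffic) by the peak performance P_peak and the memory bandwidth b_S. A worked example for a simple vector update (the bandwidth figure is illustrative, not measured FEATFLOW data):

        P \le \min\bigl(P_\mathrm{peak},\, I \cdot b_S\bigr), \qquad
        y_i \leftarrow y_i + \alpha x_i:\;
        I = \frac{2\ \text{flops}}{24\ \text{bytes}} \approx 0.083\ \text{flops/byte}
        \;\Rightarrow\; P \le 0.083 \cdot 100\ \mathrm{GB/s} \approx 8.3\ \mathrm{GFlop/s}.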

List of recent publications

extracted from CRIS / see also on Google Scholar

Previous projects (2009—2019)

2019

  • Energy Oriented Center of Excellence: toward exascale for energy (Performance evaluation, modelling and optimization)

    (Third Party Funds Group – Sub project)

    Overall project: Energy Oriented Center of Excellence: toward exascale for energy
    Term: 2019-01-01 - 2021-12-31
    Funding source: Europäische Union (EU)

2017

  • Metaprogrammierung für Beschleunigerarchitekturen

    (Third Party Funds Single)

    Term: 2017-01-01 - 2019-12-31
    Funding source: Bundesministerium für Bildung und Forschung (BMBF)

    In Metacca, the AnyDSL framework is extended into a homogeneous programming environment for heterogeneous single-node and multi-node systems. Saarland University (UdS) will extend the AnyDSL compiler and type system to enable programmers to program accelerators productively. Building on this, LSS will develop suitable abstractions for distribution and synchronization on single-node and multi-node machines in the form of a DSL within AnyDSL. All components are supported by performance models (RRZE). A runtime environment with built-in performance profiling takes care of resource management and system configuration. The resulting framework will be evaluated using two applications, ray tracing (DFKI) and bioinformatics (JGU). Single nodes and clusters with multiple accelerators (CPUs, GPUs, Xeon Phi) serve as target platforms.

    The University of Erlangen-Nürnberg is mainly responsible for the support of distributed programming (LSS) as well as for the development and deployment of supporting performance models and an integrated profiling component (RRZE). Both subareas begin with a requirements analysis to plan further steps and coordinate them with the partners. In the first year, LSS will implement the distribution of data structures; subsequent work will concentrate on synchronization mechanisms. In the final year, code transformations will be designed to adapt the distribution and synchronization concepts in AnyDSL to the chosen applications. RRZE will first integrate the kerncraft framework into the partial evaluation, extending kerncraft to support current accelerator architectures as well as models for distributed-memory parallelization. Two further work packages cover resource management and a LIKWID-based profiling component.

  • Process-Oriented Performance Engineering Service Infrastructure for Scientific Software at German HPC Centers

    (Third Party Funds Single)

    Term: 2017-01-01 - 2019-12-31
    Funding source: DFG-Einzelförderung / Sachbeihilfe (EIN-SBH)
    URL: https://blogs.fau.de/prope/

    The ProPE project will deploy a prototype HPC user support infrastructure as a distributed, cross-site collaborative effort of several Tier-2/3 centers with complementing HPC expertise. Within ProPE, code optimization and parallelization of scientific software is seen as a structured, well-defined process with a sustainable outcome. The central component of ProPE is the improvement, process-based implementation, and dissemination of a structured performance engineering (PE) process. This PE process defines and drives code optimization and parallelization as a target-oriented, structured activity: application hot spots are identified first and then optimized/parallelized in an iterative cycle. Starting from an analysis of the algorithm, the code, and the target hardware, a hypothesis about the performance-limiting factors is proposed based on performance patterns and models. Performance measurements validate the hypothesis or guide its iterative adaption. After validation of the hardware bottleneck, appropriate code changes are deployed and the PE cycle restarts. The level of detail of the PE process can be adapted to the complexity of the underlying problem and the experience of the HPC analyst. Currently this process is applied by experts and at the prototype level. ProPE will formalize and document the PE process and apply it to various scenarios (single core/node optimization, distributed parallelization, I/O-intensive problems). Different abstraction levels of the PE process will be implemented and disseminated to HPC analysts and application developers via user support projects, teaching activities, and web documentation.

    The integration of the PE process into modern IT infrastructure across several centers with different HPC support expertise will be the second project focus. All components of the PE process will be coordinated and standardized across the partnering sites. This way the complete HPC expertise within ProPE can be offered as a coherent service on a nationwide scale, and ongoing support projects can be transferred easily between participating centers. In order to identify low-performing applications, characterize application loads, and quantify the benefits of the PE activities at a system level, ProPE will employ a system monitoring infrastructure for HPC clusters. This tool will be tailored to the requirements of the PE process and designed for easy deployment and usage at Tier-2/3 centers. The associated ProPE partners will ensure the embedding into the German HPC infrastructure and provide basic PE expertise in terms of algorithmic choices, perfectly complementing the code optimization and parallelization efforts of ProPE.
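
    As a concrete instance of the measurement step of this cycle, the following sketch (illustrative array size; compile with OpenMP support, e.g. gcc -fopenmp) times a vector triad and reports its effective memory bandwidth, which can then be compared with a roofline or ECM prediction to validate a bandwidth-bottleneck hypothesis:

        // PE measurement sketch: effective memory bandwidth of a(i) = b(i) + c(i)*d(i).
        #include <stdio.h>
        #include <stdlib.h>
        #include <omp.h>

        #define N 40000000L   // large enough to exceed all cache levels

        int main(void) {
            double *a = malloc(N * sizeof *a), *b = malloc(N * sizeof *b);
            double *c = malloc(N * sizeof *c), *d = malloc(N * sizeof *d);
            #pragma omp parallel for              // first-touch init for NUMA placement
            for (long i = 0; i < N; ++i) { a[i] = 0.0; b[i] = c[i] = d[i] = 1.5; }

            double t0 = omp_get_wtime();
            #pragma omp parallel for
            for (long i = 0; i < N; ++i)
                a[i] = b[i] + c[i] * d[i];
            double t = omp_get_wtime() - t0;

            // 5 streams of 8 bytes each: loads of b, c, d; write-allocate plus store of a.
            printf("triad: %.3f s, %.1f GB/s effective\n", t, 5.0 * 8.0 * N / t / 1e9);
            free(a); free(b); free(c); free(d);
            return 0;
        }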

  • Selbstadaption für zeitschrittbasierte Simulationstechniken auf heterogenen HPC-Systemen

    (Third Party Funds Single)

    Term: 2017-03-01 - 2020-02-29
    Funding source: Bundesministerium für Bildung und Forschung (BMBF)

    The SeASiTe research project undertakes a systematic investigation of self-adaptation for time-step-based simulation techniques on heterogeneous HPC systems. The goal is to design and provide a prototype toolbox with which programmers can equip their applications with efficient self-adaptation techniques. The approach covers self-adaptation with respect to relevant system and program parameters as well as possible program transformations. The optimization of program execution for several non-functional objectives (e.g., runtime or energy consumption) is to build on performance modelling to narrow down the search space of efficient program variants. Application-independent methods and strategies for self-adaptation will be encapsulated in an autotuning navigator.

    The Erlangen subproject first addresses the model-based understanding of autotuning methods for regular simulation algorithms, using several common stencil classes as examples. With the help of extended performance models, structured guidelines and recommendations for the autotuning process regarding relevant code transformations and the restriction of the search space of optimization parameters will be derived and prepared in exemplary form for the autotuning navigator. The second focus of the work is the extension of existing analytic performance models and software tools to new computer architectures and their integration into the autotuning navigator. In addition, the Erlangen group maintains the demonstrator for stencil codes and contributes to the design of the autotuning navigator and the definition of interfaces.
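
    To make the notion of a tunable code transformation concrete, the following generic sketch (not SeASiTe code) shows a 2D five-point Jacobi sweep whose block size bx is exposed as a tuning parameter of the kind the autotuning navigator would search over, with performance models restricting the candidate values:

        // Generic illustration: five-point Jacobi sweep, block size bx as tuning parameter.
        #define NX 4096
        #define NY 4096

        void jacobi_sweep(double dst[NY][NX], const double src[NY][NX], int bx) {
            for (int jb = 1; jb < NX - 1; jb += bx) {          // blocked inner dimension
                int jend = jb + bx < NX - 1 ? jb + bx : NX - 1;
                for (int i = 1; i < NY - 1; ++i)               // row traversal reuses src rows
                    for (int j = jb; j < jend; ++j)
                        dst[i][j] = 0.25 * (src[i-1][j] + src[i+1][j]
                                          + src[i][j-1] + src[i][j+1]);
            }
        }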

2016

  • SPP EXA 1648

    (Third Party Funds Group – Sub project)

    Overall project: SPP EXA 1648
    Term: 2016-01-01 - 2019-12-31
    Funding source: DFG-Einzelförderung / Sachbeihilfe (EIN-SBH)
  • EXASTEEL II - Bridging Scales for Multiphase Steels

    (Third Party Funds Group – Sub project)

    Overall project: SPP 1648: Software for Exascale Computing
    Term: 2016-01-01 - 2018-12-31
    Funding source: DFG / Schwerpunktprogramm (SPP)
    URL: http://www.numerik.uni-koeln.de/14079.html

    In the EXASTEEL-2 project, experts on scalable iterative solvers, computational modeling in materials science, performance engineering, and parallel direct solvers are joining forces to develop new computational algorithms and implement software for a grand challenge problem from computational materials science.

    There is an increasing need for predictive simulations of the macroscopic behavior of complex new materials. In the EXASTEEL-2 project, this problem is considered for modern micro-heterogeneous (dual-phase) steels, attempting to predict the macroscopic properties of new materials from those on the microscopic level. The goal is to develop algorithms and software towards a virtual laboratory for predictive material testing in silico. A bottleneck is the computational complexity of the multiscale models needed to describe the new materials, involving sufficiently accurate, micromechanically motivated models on the crystalline scale. Therefore, new ultra-scalable nonlinear implicit solvers will be developed and combined with a highly parallel computational scale-bridging approach (FE^2), intertwined with consistent, continuous performance engineering, to bring the challenging engineering application of a virtual laboratory for material testing and design to extreme-scale computing. We envisage a continuously increasing transition from descriptive to predictive macroscopic simulations and take into account, to the best of our knowledge for the first time within a computational scale-bridging approach, the polycrystalline nature of dual-phase steels including grain boundary effects at the microscale.

    Our goals could not be reached without building on the algorithm and software infrastructure from EXASTEEL-1. We will complete the paradigm shift, begun in the EXASTEEL-1 project, from Newton-Krylov solvers to nonlinear methods (and their composition) with improved concurrency and reduced communication. By combining nonlinear domain decomposition with multigrid methods we plan to leverage the scalability of both implicit solver approaches for nonlinear methods.

    Although our application is specific, the algorithms and optimized software will have an impact well beyond the particular application. Nonlinear implicit solvers are at the heart of many simulation codes, and our software building blocks PETSc, BoomerAMG, PARDISO, and FEAP are all software packages with a large user base. The advancement of these software packages is explicitly planned for in the work packages of this project.

    The project thus addresses computational algorithms (nonlinear implicit solvers and scale bridging), application software, and programming (PE, hybrid programming, accelerators).
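
    For orientation, the scale bridging at the core of the FE^2 method can be stated in one relation: at every macroscopic quadrature point a microscopic boundary value problem on a representative volume element V is solved, and the macroscopic deformation gradient and first Piola-Kirchhoff stress are the volume averages of their microscopic counterparts:

        \bar{\mathbf F} = \frac{1}{|V|} \int_V \mathbf F \,\mathrm{d}V, \qquad
        \bar{\mathbf P} = \frac{1}{|V|} \int_V \mathbf P \,\mathrm{d}V.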

  • Equipping Sparse Solvers for Exascale II (ESSEX-II)

    (Third Party Funds Group – Sub project)

    Overall project: SPP 1648: Software for Exascale Computing
    Term: 2016-01-01 - 2018-12-31
    Funding source: DFG / Schwerpunktprogramm (SPP)
    URL: https://blogs.fau.de/essex/activities

    The ESSEX-II project will use the successful concepts and software blueprints developed in ESSEX-I for sparse eigenvalue solvers to produce widely usable and scalable software solutions with high hardware efficiency for the computer architectures of the upcoming decade. All activities are organized along the traditional software layers of low-level parallel building blocks (kernels), algorithm implementations, and applications. However, the classic abstraction boundaries separating these layers are broken in ESSEX-II by strongly integrating objectives: scalability, numerical reliability, fault tolerance, and holistic performance and power engineering.

    Driven by Moore's Law and power dissipation constraints, computer systems will become more parallel and heterogeneous even on the node level in upcoming years, further increasing overall system parallelism. MPI+X programming models can be adapted in flexible ways to the underlying hardware structure and are widely expected to be able to address the challenges of the massively multi-level parallel heterogeneous architectures of the next decade. Consequently, the parallel building blocks layer supports MPI+X, with X being a combination of node-level programming models able to fully exploit hardware heterogeneity, functional parallelism, and data parallelism. In addition, facilities for fully asynchronous checkpointing, silent data corruption detection and correction, performance assessment, performance model validation, and energy measurements will be provided.

    The algorithms layer will leverage the components in the building blocks layer to deliver fully heterogeneous, automatically fault-tolerant, state-of-the-art implementations of Jacobi-Davidson eigensolvers, the Kernel Polynomial Method (KPM), and Chebyshev Time Propagation (ChebTP) that are ready for production use on modern heterogeneous compute nodes with best performance and numerical accuracy. Chebyshev filter diagonalization (ChebFD) and a Krylov eigensolver complement these implementations, and the recent FEAST method will be investigated and further developed for improved scalability.

    The applications layer will deliver scalable solutions for conservative (Hermitian) and dissipative (non-Hermitian) quantum systems with strong links to optics and biology and to novel materials such as graphene and topological insulators. Extending its predecessor project, ESSEX-II adopts an additional focus on production-grade software. Although the selection of algorithms is strictly motivated by quantum physics application scenarios, the underlying research directions of algorithmic and hardware efficiency, accuracy, and resilience will radiate into many fields of computational science. Most importantly, all developments will be accompanied by an uncompromising performance engineering process that will rigorously expose any discrepancy between expected and observed resource efficiency.
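
    All of the solvers named above (Jacobi-Davidson, KPM, ChebTP, ChebFD, FEAST) are dominated by sparse matrix-vector multiplication; a plain CRS/OpenMP baseline of this building block is sketched below for orientation. The project's actual kernels (in the GHOST and PHIST libraries) are considerably more elaborate, using, e.g., SELL-C-sigma storage and fused operations:

        // Baseline building block: CRS sparse matrix-vector multiply y = A*x.
        typedef struct {
            int nrows;
            const int *rowptr;    // row start offsets, length nrows+1
            const int *col;       // column indices of nonzeros
            const double *val;    // nonzero values
        } crs_matrix;

        void spmv(const crs_matrix *A, const double *x, double *y) {
            #pragma omp parallel for schedule(static)
            for (int r = 0; r < A->nrows; ++r) {
                double sum = 0.0;
                for (int k = A->rowptr[r]; k < A->rowptr[r + 1]; ++k)
                    sum += A->val[k] * x[A->col[k]];
                y[r] = sum;
            }
        }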

  • Bridging scales - from Quantum Mechanics to Continuum Mechanics. A Finite Element approach.

    (Third Party Funds Single)

    Term: 2016-01-01 - 2018-09-30
    Funding source: DFG-Einzelförderung / Sachbeihilfe (EIN-SBH)

    The concurrently coupled Quantum Mechanics (QM) - Continuum Mechanics (CM) approach for electro-elastic problems is considered in this proposal. Despite the efforts that have been made to bridge the different descriptions of matter, many questions are yet to be answered. First, an efficient Finite Element (FE)-based solution approach to the Kohn-Sham (KS) equations of Density Functional Theory (DFT) will be further developed. The main topics to be studied are h-adaptivity in the FE-based solution with non-local pseudo-potentials, mesh transformation during structural optimization, and the formulation of the deformation map. It should be noted that until now no open-source implementation of the DFT approach exists that uses an FE basis and provides hp-refinement capabilities. An FE basis is very attractive in the context of DFT because of its completeness, refinement possibilities, and good polarization properties based on domain decomposition. Second, QM quantities will be related to their CM counterparts (e.g. displacements, deformation gradient, the Piola stress, polarization, etc.). This will be achieved using averaging in the Lagrangian configuration; to that end, full control over an FE-based solution of the KS equations is required. The procedure will then be tested on a representative numerical example: bending of a single-wall carbon nanotube. On the CM side, the surface-enhanced continuum theory will be utilized to properly capture surface effects; although several theoretical works exist on this matter, no numerical attempts have been made to check their validity on test examples. Lastly, based on the correspondence between the different formulations, a concurrently coupled QM-CM method will be proposed. Coupling will be achieved in a staggered way, i.e. the QM and CM problems will be solved iteratively with a proper exchange of information between them. A test problem of crack propagation in a graphene sheet will be considered. As a long-term goal of the project, coupling strategies for electro-elastic problems will be developed; to the best of my knowledge, none of the existing QM-CM coupling methods can handle electro-elastic problems.
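
    For reference, the Kohn-Sham equations whose FE-based solution the project develops read (in atomic units, with the effective potential collecting the Hartree, exchange-correlation, and external/pseudo-potential contributions):

        \Bigl( -\tfrac{1}{2}\nabla^2 + v_\mathrm{eff}[\rho](\mathbf r) \Bigr)\, \psi_i(\mathbf r)
        = \varepsilon_i\, \psi_i(\mathbf r), \qquad
        \rho(\mathbf r) = \sum_i f_i \,\lvert \psi_i(\mathbf r) \rvert^2 .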

  • Ultra-Skalierbare Multiphysiksimulationen für Erstarrungsprozesse in Metallen

    (Third Party Funds Group – Overall project)

    Term: 2016-02-01 - 2019-01-31
    Funding source: BMBF / Verbundprojekt

    Thanks to rapidly increasing compute power, complex phenomena in science and engineering are increasingly investigated with realistic simulation techniques. The resulting field of Computational Science and Engineering (CSE) is therefore regarded as a new, third pillar of science that complements and reinforces the two classical pillars, theory and experiment. At its core, CSE is about designing and analyzing powerful simulation methods for current and future high-end supercomputers and implementing them robustly, reliably, and in a user-friendly way for practical use.

    Modern, highly efficient simulation techniques are indispensable today for developing new materials with better properties and for optimizing manufacturing and production processes. They largely replace the traditional time- and cost-intensive experiments otherwise required for material development and for improving the quality of material components. At the same time, materials simulations pose a major challenge for fundamental research and for high-performance computing.

    The mechanical properties of a material are determined to a large extent by the microstructure that forms during the manufacturing process, i.e., during solidification from the melt. Simulating the solidification process can provide important new insights into microstructure formation processes that cannot be observed experimentally, making it possible to systematically analyze their influence on the resulting structure. In the future this will allow new materials with specific properties to be designed virtually on the computer.

    Simulation-based research and development for this problem requires a very fine spatial and temporal resolution to capture all relevant physical effects, and therefore extremely high compute power. To solve such problems on future large-scale systems with many thousands of compute nodes, the simulation software must not only be able to use all of these nodes concurrently, it must also deliver maximum performance at the lowest possible resource consumption; besides pure compute time, the energy consumption of the supercomputers becomes a significant factor here. The waLBerla framework serves as the software basis of SKAMPY. In this project, waLBerla is extended to solve new application-oriented problems in the materials sciences, using specially developed programming methods that allow particularly good exploitation of the supercomputers. A promising joint feasibility study on the simulation of solidification processes in metal alloys has already demonstrated the performance of the approach and its portability to the architectures of all three German national supercomputers, so the project consortium is well positioned to make supercomputer simulations sustainably usable for future, considerably more complex research tasks.

2012

  • ESSEX - Equipping Sparse Solvers for Exascale

    (Third Party Funds Group – Sub project)

    Overall project: SPP 1648: Software for Exascale Computing
    Term: 2012-11-01 - 2019-06-30
    Funding source: DFG / Schwerpunktprogramm (SPP)

    The ESSEX project investigates the computational issues arising for large-scale sparse eigenvalue problems and develops programming concepts and numerical methods for their solution. The exascale challenges of extreme parallelism, energy efficiency, and resilience will be addressed by coherent software design between the three project layers, which comprise building blocks, algorithms, and applications. The MPI+X programming model, a holistic performance engineering strategy, and advanced fault tolerance mechanisms are the driving forces behind all developments. Classic Krylov, Jacobi-Davidson, and recent FEAST methods will be enabled for exascale computing and equipped with advanced, scalable preconditioners. New implementations of domain-specific iterative schemes in physics and chemistry, namely the established Chebyshev expansion techniques for the computation of spectral properties and their novel extension to the time evolution of driven quantum systems, complement these algorithms.

    The software solutions of the ESSEX project will be combined into an Exascale Sparse Solver Repository (“ESSR”), where the specific demands of the quantum physics users are recognized by integration of quantum state encoding techniques at the fundamental level. The relevance of this project can then be demonstrated through application of the ESSR algorithms to graphene-based structures, topological insulators, and quantum Hall effect devices. Such studies require exascale resources together with modern numerical methods to determine many eigenstates at a given point of the spectrum of extremely large matrices or to compute an approximation to their full spectrum. The concepts, methods, and software building blocks developed in the ESSEX project serve as general blueprints for other scientific application areas that depend on sparse iterative algorithms. The strong vertical interaction between all three project layers ensures that users can quickly utilize any progress on the lower layers and immediately use the power of exascale machines once they become available.
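
    The Chebyshev expansion techniques mentioned above rest on the three-term recurrence for Chebyshev polynomials of the spectrally rescaled matrix (spectrum mapped into [-1,1]), which reduces both the computation of spectral properties and time propagation to repeated sparse matrix-vector products:

        \lvert v_{k+1} \rangle = 2\tilde H \lvert v_k \rangle - \lvert v_{k-1} \rangle,
        \qquad \lvert v_1 \rangle = \tilde H \lvert v_0 \rangle .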

  • EXASTEEL - Bridging Scales for Multiphase Steels

    (Third Party Funds Group – Sub project)

    Overall project: SPP 1648: Software for Exascale Computing
    Term: 2012-11-01 - 2015-12-31
    Funding source: DFG / Schwerpunktprogramm (SPP)

    This project addresses algorithms and software for the simulation of three-dimensional multiscale materials science problems on the future supercomputers developed for exascale computing. The performance of modern high-strength steels is governed by the complex interaction of the individual constituents on the microscale. Direct computational homogenization schemes such as the FE^2 method allow for high-fidelity material design and analysis of modern steels. Using this approach, fluctuations of the local field equations (balance laws) can be resolved to a high accuracy, which is needed to predict failure of such micro-heterogeneous materials. Performing the scale bridging within the FE^2 method for realistic problems in 3D still requires new ultra-scalable, robust algorithms and solvers, which have to be developed and incorporated into new application software. Such algorithms must be specifically designed to allow the efficient use of the future hardware. Here, the direct multiscale approach (FE^2) will be combined with new, highly efficient, parallel solver algorithms. For the latter, a hybrid algorithmic approach will be taken, combining nonoverlapping parallel domain decomposition (FETI) methods with efficient parallel multigrid preconditioners. A comprehensive performance engineering approach, guided by the PI Wellein, will be implemented to ensure a systematic optimization and parallelization process across all software layers.

    This project builds on parallel simulation software developed for the solution of complex nonlinear structural mechanics problems by the PIs Schröder, Balzani and Klawonn, Rheinbach. It is based on the application software package FEAP (Finite Element Analysis Program, R. Taylor, UC Berkeley). Within a new software environment, FEAP has been combined with a FETI-DP domain decomposition solver based on PETSc (Argonne National Laboratory) and hypre (Lawrence Livermore National Laboratory), e.g., to perform parallel simulations in nonlinear biomechanics. The optimization, performance modelling, and performance engineering will be guided by the PI Wellein. The PIs Schröder and Balzani have performed FE^2 simulations in the past using an extended version of FEAP. The envisioned scale bridging for realistic, advanced engineering problems in three dimensions will require a computational power that will only be obtainable when exascale computing becomes available.

  • TERRA-NEO - Integrated Co-Design of an Exascale Earth Mantle Modeling Framework

    (Third Party Funds Group – Sub project)

    Overall project: SPP 1648: Software for Exascale Computing
    Term: 2012-11-01 - 2015-12-31
    Funding source: DFG / Schwerpunktprogramm (SPP)

    Much of what one refers to as geological activity of the Earth is due to the fact that heat is transported from the interior of our planet to the surface in a planet-wide solid-state convection in the Earth's mantle. For this reason, the study of the dynamics of the mantle is critical to our understanding of how the entire planet works. Processes from earthquakes, plate tectonics, and crustal evolution to the geodynamo are governed by convection in the mantle. Without a detailed knowledge of Earth's internal dynamic processes, we cannot hope to deduce the many interactions between shallow and deep Earth processes that dominate the Earth system. The vast forces associated with mantle convection cells drive horizontal movement of Earth's surface in the form of plate tectonics, which is well known albeit poorly understood. They also induce substantial vertical motion in the form of dynamically maintained topography that manifests itself prominently in the geologic record through sea level variations and their profound impact on the ocean and climate system. Linking mantle processes to their surface manifestations is seen widely today as one of the most fundamental problems in the Earth sciences, while being at the same time a matter of direct practical relevance through the evolution of sedimentary basins and their paramount economic importance.

    Simulating Earth mantle dynamics requires a resolution in space and time that makes it one of the grand challenge applications in the computational sciences. With the exascale systems of the future it will be possible to advance beyond the deterministic forward problem to a stochastic uncertainty analysis for the inverse problem. In fact, fluid dynamic inverse theory is now at hand that will allow us to track mantle motion back into the past, exploiting the rich constraints available from the geologic record, subject to the availability of powerful geodynamical simulation software that can take advantage of these future supercomputers.

    The new community code TERRA-NEO will be based on a carefully designed multi-scale space-time discretization using hybridized Discontinuous Galerkin elements on an icosahedral mesh with block-wise refinement. This advanced finite element technique promises better stability and higher accuracy for the nonlinear transport processes in the Earth mantle while requiring less communication in a massively parallel setting. The resulting algebraic systems with more than 10^12 unknowns per time step will be solved by a new class of communication-avoiding, asynchronous multigrid preconditioners that will achieve maximal scalability and resource-optimized computational performance. A non-deterministic control flow and a lazy evaluation strategy will alleviate the traditional over-synchronization of hierarchical iterative methods and will support advanced resiliency techniques on the algorithmic level. The software framework of TERRA-NEO will be developed specifically for the upcoming heterogeneous exascale computers by using an advanced architecture-aware design process. Special white-box performance models will guide the software development, leading to a holistic co-design of the data structures and the algorithms on all levels. With this systematic performance engineering methodology we will also optimize a balanced compromise between minimal energy consumption and shortest run time.

    This consortium is fully committed to the interdisciplinary collaboration that is necessary for creating TERRA-NEO as a new exascale simulation framework. To this end, TERRA-NEO brings together top experts that cover all aspects of CS&E, from modeling via discretization to solvers and software engineering for exascale architectures.
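
    The computational core behind these numbers is, in its simplest (Boussinesq-type) form, the Stokes system for creeping mantle flow with strongly temperature-dependent viscosity, stated here for orientation:

        -\nabla \cdot \bigl( 2\eta\, \dot{\boldsymbol\varepsilon}(\mathbf u) \bigr) + \nabla p
        = \rho(T)\, \mathbf g, \qquad \nabla \cdot \mathbf u = 0, \qquad
        \dot{\boldsymbol\varepsilon}(\mathbf u) = \tfrac{1}{2}\bigl( \nabla \mathbf u + \nabla \mathbf u^{\top} \bigr).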

2011

  • Eine fehlertolerante Umgebung für peta-scale MPI-Löser

    (Third Party Funds Group – Sub project)

    Overall project: FEToL
    Term: 2011-06-01 - 2014-05-31
    Funding source: Bundesministerium für Bildung und Forschung (BMBF)

2009

  • SKALB: Lattice-Boltzmann-Methoden für skalierbare Multi-Physik-Anwendungen

    (Third Party Funds Group – Overall project)

    Term: 2009-01-01 - 2011-12-31
    Funding source: BMBF / Verbundprojekt, Bundesministerium für Bildung und Forschung (BMBF)

    The goal of the BMBF-funded project SKALB (lattice Boltzmann methods for scalable multi-physics applications) is the efficient implementation and further development of lattice Boltzmann based flow solvers for simulating complex multi-physics applications on petascale-class computers. The lattice Boltzmann method is an established solution technique in computational fluid dynamics. A central advantage of the method is the basic simplicity of the numerical scheme, which allows efficient computation of complex flow geometries such as porous media or metal foams as well as direct numerical simulations (DNS) for studying turbulent flows. In SKALB, lattice Boltzmann applications are to be advanced methodologically and technically for the new classes of massively parallel heterogeneous and homogeneous supercomputers. RRZE contributes its long experience in performance modelling and efficient implementation of lattice Boltzmann methods on a broad spectrum of modern computers and additionally works on new programming approaches for multi-/manycore processors. The application code further developed at RRZE will be used together with the group of Prof. Schwieger for massively parallel simulation of flows in porous media.
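
    The simplicity referred to above is visible in the standard BGK update rule of the method: in every time step, each discrete-velocity distribution function f_i is relaxed toward a local equilibrium and streamed to the neighboring cell:

        f_i(\mathbf x + \mathbf c_i \Delta t,\ t + \Delta t)
        = f_i(\mathbf x, t) - \frac{\Delta t}{\tau} \bigl[ f_i(\mathbf x, t) - f_i^\mathrm{eq}(\mathbf x, t) \bigr].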
