BEGIN:VCALENDAR
VERSION:2.0
PRODID:researchseminars.org
CALSCALE:GREGORIAN
X-WR-CALNAME:researchseminars.org
BEGIN:VEVENT
SUMMARY:Albert Cohen (Google)
DTSTART:20211015T150000Z
DTEND:20211015T160000Z
DTSTAMP:20260422T212751Z
UID:OxfordTensorComputation/1
DESCRIPTION:Title: <a href="https://researchseminars.org/talk/OxfordTensor
 Computation/1/">Herding Tensor Compilers</a>\nby Albert Cohen (Google) as 
 part of Oxford Seminars on Tensor Computation\n\n\nAbstract\nThe orchestra
 tion of high-performance numerical computations on distributed and heterog
 eneous systems is not getting any simpler. In the last 5 years\, driven by
  the needs of machine learning\, systems and compilers have made tremendo
 us progress towards hiding this complexity while delivering excellent per
 formanc
 e. These undeniable successes of computing systems and programming languag
 e research also came with undesirable and somewhat paradoxical side effect
 s: abstractions and engineering frameworks diversifying out of control whi
 le machine learning models got stuck in the rut defined by a small set of 
 highly optimized operators. We will recall algebraic principles supporting
  the compilation of tensor algebra\, and illustrate these principles on th
 ree optimization strategies with different degrees of human/expert interve
 ntion. While the presentation focuses on optimization and algorithms\, we 
 will also discuss MLIR\, a large-scale compiler construction effort to rat
 ionalize the landscape of machine learning systems.\n\nBio: Albert is a re
 search scientist at Google. He was a research scientist at Inria from 2000
  to 2018\, and is an alumnus of École Normale Supérieure de Lyon who gradu
 ated from the University of Versailles in 1999. He has been a visiting
  scholar at the Universi
 ty of Illinois\, an invited professor at Philips Research\, and a visiting
  scientist at Facebook Artificial Intelligence Research. Albert Cohen work
 s on parallelizing and optimizing compilers\, parallel programming languag
 es and systems\, machine learning compilers\, synchronous programming\, wi
 th applications to high-performance computing\, artificial intelligence an
 d reactive control. He served as the general or program chair of major con
 ferences\, including PLDI\, PPoPP\, PACT\, HiPEAC\, CC\, the embedded soft
 ware track of DAC\, and as a member of the editorial board of ACM TACO\, T
 ECS and IJPP. Several research projects initiated by Albert Cohen resulted
  in effective transfer to production compilers and programming environment
 s.\n
LOCATION:https://researchseminars.org/talk/OxfordTensorComputation/1/
END:VEVENT
BEGIN:VEVENT
SUMMARY:Jonathan Ragan-Kelley (MIT)
DTSTART:20211022T150000Z
DTEND:20211022T160000Z
DTSTAMP:20260422T212751Z
UID:OxfordTensorComputation/2
DESCRIPTION:Title: <a href="https://researchseminars.org/talk/OxfordTensor
 Computation/2/">Organizing Computation for High-Performance Visual Computi
 ng</a>\nby Jonathan Ragan-Kelley (MIT) as part of Oxford Seminars on Tenso
 r Computation\n\n\nAbstract\nIn the face of declining returns to Moore’s
  law\, future visual computing applications—from photorealistic real-tim
 e rendering\, to 4D light field cameras\, to pervasive sensing with deep l
 earning—still demand orders of magnitude more computation than we curren
 tly have. From data centers to mobile devices\, performance and energy sca
 ling is limited by locality (the distance over which data has to move\, e.
 g.\, from nearby caches\, far away main memory\, or across networks) and p
 arallelism. Because of this\, I argue that we should think of the performa
 nce and efficiency of an application as determined not just by the algorit
 hm and the hardware on which it runs\, but critically also by the organiza
 tion of its computations and data. For algorithms with the same complexity
 —even the exact same set of arithmetic operations—the order and granul
 arity of execution and placement of data can easily change performance by 
 an order of magnitude because of locality and parallelism. To extract the 
 full potential of our machines\, we must treat the organization of computa
 tion as a first-class concern\, while working across all levels\, from alg
 orithms and data structures\, to programming languages\, to hardware.\n\nT
 his talk will present facets of this philosophy in systems for image proce
 ssing\, 3D graphics\, and machine learning. I will show that\, for the dat
 a-parallel pipelines common in these data-intensive applications\, the org
 anization of computations and data for a given algorithm is constrained by
  a fundamental tension between parallelism\, locality\, and redundant comp
 utation of shared values. I will focus particularly on the Halide language
  and compiler\, which explicitly separates what computations define an alg
 orithm from the choices of organization which determine parallelism\, loca
 lity\, and synchronization. I will show how this approach can enable much 
 simpler programs to deliver performance often many times faster than the b
 est prior implementations\, while scaling across radically different archi
 tectures\, from ARM phones to massively parallel GPUs\, FPGAs\, and custom
  ASICs.\n\nBio: Jonathan Ragan-Kelley is the Esther and Harold E. Edgerton
  Assistant Professor of Electrical Engineering & Computer Science at MIT\,
  and was previously an assistant professor of EECS at UC Berkeley. He work
 s on high-efficienc
 y visual computing\, including systems\, compilers\, and architectures for
  image processing\, vision\, 3D rendering\, simulation\, and machine learn
 ing. He is a recipient of the ACM SIGGRAPH Significant New Researcher awar
 d\, NSF CAREER award\,  Intel Outstanding Researcher award\, and two CACM 
 Research Highlights. He was previously a visiting researcher at Google\, a
  postdoc in Computer Science at Stanford\, and earned his PhD in Computer 
 Science from MIT in 2014. He co-created the Halide language and has built 
 more than a half-dozen other DSL and compiler systems\, the first of which
  was a finalist for an Academy technical achievement award.\n
LOCATION:https://researchseminars.org/talk/OxfordTensorComputation/2/
END:VEVENT
BEGIN:VEVENT
SUMMARY:Conal Elliott
DTSTART:20211029T150000Z
DTEND:20211029T160000Z
DTSTAMP:20260422T212751Z
UID:OxfordTensorComputation/3
DESCRIPTION:Title: <a href="https://researchseminars.org/talk/OxfordTensor
 Computation/3/">Can Tensor Programming Be Liberated from the Fortran Data 
 Paradigm?</a>\nby Conal Elliott as part of Oxford Seminars on Tensor Compu
 tation\n\nAbstract: TBA\n
LOCATION:https://researchseminars.org/talk/OxfordTensorComputation/3/
END:VEVENT
BEGIN:VEVENT
SUMMARY:Markus Püschel (ETH Zürich)
DTSTART:20211105T160000Z
DTEND:20211105T170000Z
DTSTAMP:20260422T212751Z
UID:OxfordTensorComputation/9
DESCRIPTION:Title: <a href="https://researchseminars.org/talk/OxfordTensor
 Computation/9/">Program Generation for Small-Scale Linear Algebra</a>\nby 
 Markus Püschel (ETH Zürich) as part of Oxford Seminars on Tensor Computa
 tion\n\n\nAbstract\nMany performance-critical computations in communicatio
 n\, control\, multimedia processing\, machine learning\, or graphics fall 
 into the domain of linear algebra. Existing linear algebra libraries are u
 sually optimized for large-scale computation and for use in scientific com
 puting\; for small-scale computations in other domains they are often subo
 ptimal. In this talk I present our work on generating optimi
 zed linear algebra code directly from a mathematical description using tec
 hniques developed in Spiral (www.spiral.net): layers of domain-specific la
 nguages to express the mathematics and the use of rewriting systems to res
 hape the computation at a high level of abstraction to overcome known comp
 iler limitations. (This is the thesis work of Daniele Spampinato\; project
  website: https://acl.inf.ethz.ch/research/LGen/.)\n\nBio: Markus Püschel
  is a Professor and former Department Head of Computer Science at ETH Zür
 ich\, Switzerland. Before that\, he was a Professor of Electrical and Comp
 uter Engineering at Carnegie Mellon University\, where he still has an ad
 junct status. He is an IEEE Fellow. One of his longstanding interests
  is automat
 ing the production of high performance software and hardware designs for m
 athematical functionality as exemplified by the Spiral project. Besides th
 is\, his current interests include program generation\, novel forms of Fou
 rier analysis and its applications\, machine learning\, and program analys
 is. For more information\, please visit https://acl.inf.ethz.ch/.\n
LOCATION:https://researchseminars.org/talk/OxfordTensorComputation/9/
END:VEVENT
BEGIN:VEVENT
SUMMARY:Rohan Yadav (Stanford)
DTSTART:20211119T160000Z
DTEND:20211119T170000Z
DTSTAMP:20260422T212751Z
UID:OxfordTensorComputation/11
DESCRIPTION:Title: <a href="https://researchseminars.org/talk/OxfordTensor
 Computation/11/">DISTAL: The Distributed Tensor Algebra Compiler</a>\nby R
 ohan Yadav (Stanford) as part of Oxford Seminars on Tensor Computation\n\n
 \nAbstract\nWe introduce DISTAL\, a compiler for dense tensor algebra that
  targets modern distributed and heterogeneous systems. DISTAL allows users
  to independently describe how tensors and computation map onto the target
  machine through the tensors’ formats and a scheduling language. The comb
 ination of choices for data and computation distribution creates a design 
 space that includes algorithms from the past (Cannon’s algorithm) and pr
 esent (COSMA). DISTAL compiles a tensor algebra domain specific language t
 o a distributed task-based runtime system and supports nodes with multi-co
 re CPUs and multiple GPUs. Code generated by DISTAL is competitive with op
 timized codes for matrix multiply on 256 nodes of the Lassen supercomputer
  and outperforms existing systems by between 1.8$\\times$ to 3.7$\\times$ 
 (with a 45.7$\\times$ outlier) on higher order tensor operations.\n\nBio: 
 Rohan Yadav is a second year computer science PhD student at Stanford Univ
 ersity\, advised by Alex Aiken and Fredrik Kjolstad. He is generally inter
 ested in programming languages and computer systems\, with a particular fo
 cus in systems for parallel and distributed computing.\n
LOCATION:https://researchseminars.org/talk/OxfordTensorComputation/11/
END:VEVENT
BEGIN:VEVENT
SUMMARY:Martin Elsman (Copenhagen)
DTSTART:20211126T160000Z
DTEND:20211126T170000Z
DTSTAMP:20260422T212751Z
UID:OxfordTensorComputation/12
DESCRIPTION:Title: <a href="https://researchseminars.org/talk/OxfordTensor
 Computation/12/">Size-Dependent Types for Practical Data-Parallel Programm
 ing</a>\nby Martin Elsman (Copenhagen) as part of Oxford Seminars on Tenso
 r Computation\n\n\nAbstract\nWe present a type system for expressing size 
 constraints on array\ntypes in an ML-style type system.  The goal is to de
 tect shape\nmismatches at compile-time without having to deal with all the
 \nconsequences of a full dependent type system.  The main restrictions\nar
 e that the only terms that can occur in types are array sizes\, which\nare
  constrained syntactically to be either variables or constants.\nFor expre
 ssions that result in arrays of sizes that are not\nexpressible using thes
 e restrictions\, the system supports a form of\nexistential types\, with t
 he type system automatically managing the\nrequisite book-keeping\, while 
 guaranteeing that\, at runtime\, all\narrays are regular.\n\nThe type syst
 em forms the basis of the type system for Futhark\, a\ndata-parallel funct
 ional language (and compiler)\, which is aimed at\ngenerating efficient pa
 rallel code for GPUs and multi-threaded CPUs.\nFuthark is equipped with a 
 number of first- and second-order array\ncombinators\, which have data-par
 allel semantics.  Futhark performs a\nnumber of fusion\, tiling and flatte
 ning transformations\, and may even\ngenerate multiple code versions that 
 are dispatched dynamically based\non auto-tuned size-aspects of input data
 .  The size-dependent type\nsystem works well with Futhark's support for h
 igher-order modules and its\nlimited form of higher-order functions\, whic
 h are all eliminated\nentirely at compile time.  We give examples of libra
 ry functions and\ndata structures that utilise the features of size-depend
 ent types to\nexpress the intentions of how functions are used\, for insta
 nce\, to\navoid out-of-bounds array-index errors and to guard against the\
 ncomposition of incompatible neural network layers.\n\nFuthark is joint wo
 rk with a number of researchers at DIKU\, including\nTroels Henriksen (DIK
 U)\, Cosmin E. Oancea\, Ken Friis Larsen\, Fritz\nHenglein\, and Philip Mu
 nksgaard.\n\nBio:\nMartin Elsman conducts research in the design and im
 plementation of\nprogramming languages.  Areas of research include compila
 tion\ntechniques for functional languages\, in particular with focus on\np
 arallel languages\, module systems\, Web technology\, program analyses\nfo
 r memory management\, program optimisation\, and domain-specific\nlanguage
 s for financial contracts.  Martin is Professor in the\nProgramming Langua
 ges and Theory of Computation section at Department\nof Computer Science\,
  University of Copenhagen (DIKU)\, where he serves\nas head of studies for
  the BSc programme in Computer Science and\nEconomics. Martin is also an ac
 tive maintainer of several software\ntools\, including the MLKit\, a full-
 blown Standard ML compiler\, which\ntargets both JavaScript and x86-64 mac
 hine code.\n
LOCATION:https://researchseminars.org/talk/OxfordTensorComputation/12/
END:VEVENT
BEGIN:VEVENT
SUMMARY:Dimitrios Vytiniotis (DeepMind)
DTSTART:20211203T160000Z
DTEND:20211203T170000Z
DTSTAMP:20260422T212751Z
UID:OxfordTensorComputation/13
DESCRIPTION:Title: <a href="https://researchseminars.org/talk/OxfordTensor
 Computation/13/">Automating Tensor Program Partitioning on Accelerator Sys
 tems with PartIR</a>\nby Dimitrios Vytiniotis (DeepMind) as part of Oxford
  Seminars on Tensor Computation\n\n\nAbstract\nThe rapid rise in demand fo
 r training large neural networks has brought into focus the need for parti
 tioning across systems of accelerator devices. Implementing various forms 
 of partitioning is increasingly supported through program primitives\, but
  identifying efficient partitioning strategies requires expensive experime
 ntation and expertise. We present the prototype of an automated partitioni
 ng system that integrates into existing compilers and existing user workfl
 ows. Our system relies on layering functional loop abstractions – that r
 eturn or reduce over chunks of arrays – on top of an arbitrary array “
 dialect” (following the MLIR terminology) such as XLA. We use rewrite ru
 les reminiscent of fusion rules from stream fusion to express various form
 s of propagation of partitioning information across a program. Our system 
 compiles functional loops to SPMD abstractions in a lower-level dialect wh
 ose types capture distributed arrays and which includes explicit array red
 istribution commands. This dialect can then be lowered\, compiled\, and ex
 ecuted using the “native” backend compiler and runtime (e.g. XLA) in a
  device-agnostic manner. We will present the design of a search environmen
 t controlling the actions of our rewrite engine that is specifically aimin
 g to tame the size of the search space by (a) mimicking the way expert pr
 ogrammers would attempt to partition their programs and (b) exploiting hi
 gh-lev
 el model structure already available in popular libraries for neural netwo
 rks. We show promising initial results\, such as the ability to automatica
 lly recover good partitioning for important neural network architectures\;
  and we outline remaining challenges.\n\nBio: Dimitrios Vytiniotis is a re
 search scientist leading research in programming languages and machine
  learning systems at DeepMind. He holds a PhD from the University of Penns
 ylvania (2008) and was a researcher with Microsoft Research Cambridge unti
 l 2018. His interests span functional programming and type systems\, and m
 ore broadly language design and implementation\, with applications in area
 s like systems and machine learning.\n
LOCATION:https://researchseminars.org/talk/OxfordTensorComputation/13/
END:VEVENT
BEGIN:VEVENT
SUMMARY:Alex Aiken (Stanford University)
DTSTART:20220121T160000Z
DTEND:20220121T170000Z
DTSTAMP:20260422T212751Z
UID:OxfordTensorComputation/14
DESCRIPTION:Title: <a href="https://researchseminars.org/talk/OxfordTensor
 Computation/14/">Legion: Programming Distributed Heterogeneous Architectur
 es</a>\nby Alex Aiken (Stanford University) as part of Oxford Seminars on 
 Tensor Computation\n\n\nAbstract\nProgrammers tend to think of parallel pr
 ogramming as a problem of\ndividing up computation\, but increasingly the 
 most important decisions\ninvolve the partitioning\, placement and movemen
 t of data.  Legion is a\ndata-centric task-based programming model for the
  development of\ncomposable and portable software on distributed\, heterog
 eneous\narchitectures.  The Legion model is built around two core features
 : a\ndata model that allows users to dynamically describe the structure of
 \nprogram data and a suite of partitioning operators for describing the\ns
 ubsets of data used by tasks.  Leveraging its detailed knowledge of\nprogr
 am data\, the Legion runtime uses dynamic dependence analysis to\nautomati
 cally infer implicit parallelism\, data movement\, and\nsynchronization. A
  separate mapping interface decouples Legion\nprograms from how they are s
 cheduled onto individual machines\, making\nLegion programs easy to port. 
  We will give several examples of how\nLegion is used for accelerating bot
 h HPC and machine learning\nworkloads at scale.\n\nBio: Alex Aiken is the 
 Alcatel-Lucent Professor of Computer Science at Stanford. Alex received hi
 s Bachelor's degree in Computer Science and Music from Bowling Green State
  University in 1983 and his Ph.D. from Cornell University in 1988. Alex was
  a Research Staff Member at the IBM Almaden Research Center (1988-1993) an
 d a Professor in the EECS department at UC Berkeley (1993-2003) before joi
 ning the Stanford faculty in 2003. His research interest is in areas relat
 ed to programming languages. He is an ACM Fellow\, a recipient of ACM SIGP
 LAN's Programming Languages Achievement Award and Phi Beta Kappa's Teachin
 g Award\, and a former chair of the Stanford Computer Science Department.\
 n
LOCATION:https://researchseminars.org/talk/OxfordTensorComputation/14/
END:VEVENT
BEGIN:VEVENT
SUMMARY:David Ham (Imperial College)
DTSTART:20220128T160000Z
DTEND:20220128T170000Z
DTSTAMP:20260422T212751Z
UID:OxfordTensorComputation/15
DESCRIPTION:Title: <a href="https://researchseminars.org/talk/OxfordTensor
 Computation/15/">Automating Finite Element Simulation by Generating Tensor
  Computations from Vector Calculus</a>\nby David Ham (Imperial College) as
  part of Oxford Seminars on Tensor Computation\n\n\nAbstract\nThe simulati
 on of continuous physical systems described by Partial Differential Equati
 ons (PDEs) has been and continues to be one of the great challenges of sci
 entific computing. From nanomaterials to the weather forecast\, the abilit
 y to simulate and optimise continuous systems underpins much of science an
 d engineering. From a software perspective\, the creation of simulation to
 ols requires the complex manipulation of the PDEs involved\, then their di
 scretisation\, and finally the optimal scheduling of the resulting calcula
 tion. In this talk I will show how the various stages of this tool creatio
 n process can be modelled as tensor computations\, and that each stage can
  be automatically generated from the previous one using specialised compil
 er technology. The result is that scientists and engineers can formulate a
 dvanced numerical methods for ever-changing PDEs\, and have high performan
 ce computational tools generated automatically. This brings both productiv
 ity and performance to the simulation problem\, enabling scientists to und
 ertake work that would previously have exceeded their human and computatio
 nal resources.\n\nBio: Dr David Ham is a reader in Computational Mathemati
 cs at Imperial College London. He has degrees in mathematics and law from 
 the Australian National University\, and a doctorate in numerical methods 
 for PDEs from TU Delft. His research focuses on automating the finite ele
 ment method\, centred on the Firedrake automated simulation system. He rec
 eived the 2015 Wilkinson Prize for Numerical Software for his automati
 on of inverse finite element simulation. Dr Ham co-leads the joint mathema
 tics and computer science degree programme at Imperial College London\, an
 d founded and leads the Mary Lister McCammon Summer Research Fellowship fo
 r Women in Mathematics and Statistics. He is the chief executive editor of
  the European Geosciences Union journal Geoscientific Model Development.\n
LOCATION:https://researchseminars.org/talk/OxfordTensorComputation/15/
END:VEVENT
BEGIN:VEVENT
SUMMARY:Sven-Bodo Scholz (Radboud University Nijmegen)
DTSTART:20220204T160000Z
DTEND:20220204T170000Z
DTSTAMP:20260422T212751Z
UID:OxfordTensorComputation/16
DESCRIPTION:Title: <a href="https://researchseminars.org/talk/OxfordTensor
 Computation/16/">Tensor Comprehensions in SaC: A Minimalistic Notation for
  High-Performance Computing</a>\nby Sven-Bodo Scholz (Radboud University N
 ijmegen) as part of Oxford Seminars on Tensor Computation\n\n\nAbstract\nT
 his talk focusses on programmer productivity when it comes to defining\,\n
 understanding\, and maintaining computations on multi-dimensional arrays.\
 nShape-invariant programming\, i.e.\, the ability to define APL-like opera
 tors that\ncan be applied to arrays of arbitrary dimensionality\, surely c
 onstitutes a key\nelement here. This raises the question of what the mini
 mal building blocks for\nsuch operators should be. Should they be a set of
  fix
 ed primitives a la APL?\nShould they be a small set of higher-order operat
 ors? Should they be inherently\nn-dimensional or should they be one-dimens
 ional and then be applied recursively?\nShould they be loops?  Or do we ne
 ed all of these to conveniently express our\nalgorithms?\n\nIn the context
  of SaC\, we propose a new form of array comprehensions named\n"Tensor Com
 prehensions".  This notation strives to be flexible enough to allow\nfor a
 ll of the above-mentioned flavours. Despite this flexibility\, Tensor\nCom
 prehensions aim to be minimalistic in the syntactical requirements\, build
 ing\non sophisticated inference technology to enable programmers to leave 
 out many\n"obvious" parts.  The resulting notation comes rather close to t
 he so-called\nTensor Notations used in Physics and Mathematics.  As a resu
 lt\, complex\noperators with rich semantics can be defined more concisely 
 than before.\n\nBio: Sven-Bodo Scholz is Professor of Computer Science at 
 Radboud University\, Nijmegen\, Netherlands.\nHe also holds a professorshi
 p at Heriot-Watt University\, Edinburgh\, Scotland.\nHis research is drive
 n by the desire to bridge the gap between high-productivity programming to
 ols\nand high-performance heterogeneous many-core systems by means of comp
 ilation technology. Typical\napplication areas range from multi-sensor rob
 otics systems through big-data analytics to vision and\ncomputational scie
 nce. Target systems range from embedded circuits through large clusters\no
 f GPU-accelerated systems to cloud infrastructures.\n\nMost of his work on
  par
 allelising compiler technology is driven by the needs of industrial projec
 t\npartners such as Intel\, AMD\, Thales\, SAP\, Philips and others. Besid
 es regular international\ndissemination in both academia and industry\, hi
 s work has led to several systems in the public\ndomain\, most notably the
  SaC compiler tool-chain (www.sac-home.org).\n
LOCATION:https://researchseminars.org/talk/OxfordTensorComputation/16/
END:VEVENT
BEGIN:VEVENT
SUMMARY:Adam Paszke (Google)
DTSTART:20220211T160000Z
DTEND:20220211T170000Z
DTSTAMP:20260422T212751Z
UID:OxfordTensorComputation/17
DESCRIPTION:Title: <a href="https://researchseminars.org/talk/OxfordTensor
 Computation/17/">Getting to the Point with Dex: Safe Parallel Programming 
 for Scientific Applications</a>\nby Adam Paszke (Google) as part of Oxford
  Seminars on Tensor Computation\n\n\nAbstract\nThe talk will be focused on
  the design of Dex\, both in terms of the surface syntax and its typing di
 scipline. Dex is a new domain specific programming language aiming to make
  it easier to implement parallel scientific computing workloads in a clear
  and safe way\, while being able to achieve the efficiency of low-level nu
 merical languages. The core idea underlying its design is the treatment of
  arrays as memoized representations of functions with finite domains\, all
 owing abstract function manipulations\, such as currying or abstraction\, 
 to work on arrays. Additionally\, instead of following the well-trodden pa
 th of bulk-array combinators\, we argue for a programming style heavy on e
 xplicit array indexing that closely mirrors function applications. We ass
 ociate classical bulk-array programming with the “pointfree” style of
  functional programming and try to rebuild the array paradigm in an (argua
 bly more popular) “pointful” style instead. For increased expressivene
 ss and efficiency (especially under automatic differentiation)\, we additi
 onally extend the language with a fine-grained effect system that allows u
 s to reason about performance in a type-directed way.\n\nBio: Adam Paszke 
 is a Senior Research Scientist at Google\, based in Warsaw\, Poland. His w
 ork focuses on automatic differentiation\, parallelism-friendly programmin
 g languages for scientific computing\, and partitioning of those for purpo
 ses of distributed execution. Before Google\, he worked with Facebook and 
 authored PyTorch. He graduated in Computer Science and Mathematics from th
 e University of Warsaw.\n
LOCATION:https://researchseminars.org/talk/OxfordTensorComputation/17/
END:VEVENT
BEGIN:VEVENT
SUMMARY:Oleg Kiselyov (Tohoku University)
DTSTART:20220218T160000Z
DTEND:20220218T170000Z
DTSTAMP:20260422T212751Z
UID:OxfordTensorComputation/18
DESCRIPTION:Title: <a href="https://researchseminars.org/talk/OxfordTensor
 Computation/18/">Even Better Stream Fusion</a>\nby Oleg Kiselyov (Tohoku U
 niversity) as part of Oxford Seminars on Tensor Computation\n\n\nAbstract\
 nStream processing is one of the key data processing modes\, related to da
 taflow programming. It was dominant in the punch-card era\, and is becomin
 g prevalent again\, in the era of huge data\, ubiquitous sensors and distr
 ibuted computing. Its characteristic is incremental\, sequential processin
 g with bounded buffering\, which lets one handle possibly unbounded amoun
 ts of data in limited space. Another characteristic is the ease of specif
 yin
 g it as a Xmas-lights diagram: if some further processing is needed\, just
  plug in another segment.\n\nAlthough the diagrams are easy to draw\, they
  are difficult to implement with low latency and in low memory. This talk 
 is about the key optimization: stream fusion\, which is combining several 
 simple processing steps into one complex step\, reducing the amount of int
 ermediary data and communication overhead. Specifically\, we will talk abo
 ut complete fusion: not just reduction but complete elimination. This is h
 ard\, especially for diagrams with "fat pipes" (flatmap) and "joins" (zip)
 .\n\nThis talk introduces the ongoing work on strymonas\, which is a high-
 performance code generation library (DSL) that converts a diagram-like spe
 cification into hand-written-like code -- with assured complete fusion. We
  describe the main ideas behind the complete fusion of diagrams with joins
 \, and illustrate on the example of the software FM radio.\n\nBio: Oleg Ki
 selyov is an Assistant Professor at Tohoku University in Japan. He got int
 erested in stream processing when automating scientific instruments (calor
 imeters and neuron activity recording) 35 years ago. In the 1990s he wrote
  and maintained a C++ linear algebra library based on streams rather than
  arrays. Later on he wrote a streaming XML parser\, still used in the Sche
 me commun
 ity\, and designed Iteratees (see Wikipedia). His latest interest is gener
 ating fast stream processing code.\n
LOCATION:https://researchseminars.org/talk/OxfordTensorComputation/18/
END:VEVENT
BEGIN:VEVENT
SUMMARY:James Demmel (UC Berkeley)
DTSTART:20220225T160000Z
DTEND:20220225T170000Z
DTSTAMP:20260422T212751Z
UID:OxfordTensorComputation/19
DESCRIPTION:Title: <a href="https://researchseminars.org/talk/OxfordTensor
 Computation/19/">Communication-Avoiding Algorithms for Linear Algebra\, Ma
 chine Learning and Beyond</a>\nby James Demmel (UC Berkeley) as
  part of Oxford Seminars on Tensor Computation\n\n\nAbstract\nAlgorithms h
 ave two costs: arithmetic and communication\, i.e. moving data between lev
 els of a memory hierarchy or processors over a network. Communication cost
 s (measured in time or energy per operation) greatly exceed arithmetic cos
 ts\, so our goal is to design algorithms that minimize communication. We s
 urvey some known algorithms that communicate asymptotically less than thei
 r classical counterparts\, for a variety of linear algebra and machine lea
 rning problems\, often attaining lower bounds. We also discuss recent work
  on automating the design and implementation of these algorithms\, startin
 g from a simple specification as nested loops.\n\nBio: James Demmel is the
  Dr. Richard Carl Dehmel Distinguished Professor of Computer Science and M
 athematics at the University of California at Berkeley\, and former Chair 
 of the EECS Dept.  His research is in numerical linear algebra\, high perf
 ormance computing\, and communication avoiding algorithms. He is known for
  his work on the widely used LAPACK and ScaLAPACK linear algebra libraries
 .  He is a member of the National Academy of Sciences\, National Academy o
 f Engineering\, and American Academy of Arts and Sciences\; a Fellow of th
 e AAAS\, ACM\, AMS\, IEEE and SIAM\; and winner of the IPDPS Charles Babba
 ge Award\, IEEE Computer Society Sidney Fernbach Award\, the ACM Paris Kan
 ellakis Award\, and numerous best paper prizes.\n
LOCATION:https://researchseminars.org/talk/OxfordTensorComputation/19/
END:VEVENT
BEGIN:VEVENT
SUMMARY:Mike Giles (University of Oxford)
DTSTART:20220304T160000Z
DTEND:20220304T170000Z
DTSTAMP:20260422T212751Z
UID:OxfordTensorComputation/20
DESCRIPTION:Title: <a href="https://researchseminars.org/talk/OxfordTensor
 Computation/20/">Some Reflections on Automated Code Generation</a>\nby Mik
 e Giles (University of Oxford) as part of Oxford Seminars on Tensor Comput
 ation\n\n\nAbstract\nIn this talk from a workshop 8 years ago\, I reflect 
 on a number of projects I was involved in\, or aware of\, at that time.  T
 he common feature was the desire to simplify high performance computing th
 rough abstraction\, separating the specification of what was to be compute
 d from the details of how it was computed.  In practice\, this involved au
 tomated code generation\, either through the processing of embedded DSLs\,
  or through the creation of application-specific DSLs with custom code gen
 eration backends.\n\nBio: Mike Giles is a Professor of Scientific Computin
 g and currently head of the Mathematical Institute\; from 1992 to 2008 he 
 was in Oxford's Computer Science department\, which was then called the C
 omputing Laboratory. His primary research interests are in the developmen
 t a
 nd analysis of a wide variety of numerical algorithms\, but a secondary in
 terest is in various aspects of high performance computing.  This included
  being one of the UK's early pioneers in GPU computing\, and led to him es
 tablishing the Emerald and JADE GPU supercomputers.\n
LOCATION:https://researchseminars.org/talk/OxfordTensorComputation/20/
END:VEVENT
BEGIN:VEVENT
SUMMARY:Gabriele Keller (Utrecht University)
DTSTART:20220311T160000Z
DTEND:20220311T170000Z
DTSTAMP:20260422T212751Z
UID:OxfordTensorComputation/21
DESCRIPTION:Title: <a href="https://researchseminars.org/talk/OxfordTensor
 Computation/21/">Accelerate: High-Performance Computing in Haskell</a>\nby
  Gabriele Keller (Utrecht University) as part of Oxford Seminars on Tensor
  Computation\n\n\nAbstract\nThis talk presents Accelerate\, a data-paralle
 l programming language embedded in Haskell\, with multi-core CPU and GPU b
 ackends. In Accelerate\, data parallelism is expressed through a set of fi
 rst- and second-order functions operating on (possibly multi-dimensional) a
 rrays\, where parallel and sequential operations are distinguished through
  types. This statically excludes irregular nested data parallel operations
 \, which the compiler currently cannot efficiently map to the target archi
 tecture. We will discuss how Accelerate is positioned in the space of comp
 arable languages and present some of the core ideas underlying the impleme
 ntation of Accelerate and its embedding in the host language\, including t
 he type system of the language. Furthermore\, we provide a summary of curr
 ent projects and where we are planning to take the language in the near fu
 ture.\n\nBio: Gabriele Keller is the chair of the Software Technology Grou
 p at Utrecht University in the Netherlands. Before moving to Utrecht\, she
  was an Associate Professor at the University of New South Wales in Sydne
 y\, Australia\, where she co-founded the Programming Language and Systems
  Group
 . Her research interests are in programming languages\, in particular func
 tional languages and languages for high-performance computing\, as well as
  verified compilation of programming languages.\n
LOCATION:https://researchseminars.org/talk/OxfordTensorComputation/21/
END:VEVENT
END:VCALENDAR
