Wednesday 14:20
Material
https://github.com/ikrommyd/2026-04-15-pyconde-and-pydata-2026-tutorial-array-oriented-programming
What you need: a laptop with the repository cloned and the environment set up as explained in the README. Alternatively, you need an internet connection during the tutorial to set up the environment live or to follow along on MyBinder.
This setup is required for the problems and puzzles that are part of the tutorial.
Overview
Python's dominance in scientific computing and data science stems from its powerful array libraries that enable high-performance numerical computation. This 90-minute tutorial introduces array-oriented programming as a paradigm and surveys the modern Python array ecosystem, helping you understand which tools to use and when.
What is Array-Oriented Programming?
Array-oriented programming is a paradigm that separates problems into lightweight Python bookkeeping and heavy numerical computation handled by vectorized operations in fast, precompiled libraries. We'll demonstrate how this approach combines Python's ease of use with near-compiled-language performance.
Through live examples, you'll see how array operations can be orders of magnitude faster than explicit Python loops. This mindset shift, from thinking about individual elements to thinking about operations on entire arrays, is fundamental to effective scientific Python programming.
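As a minimal sketch of the contrast, the path length of a sampled curve can be written both ways. The function names and the sample data below are illustrative assumptions, not code taken from the tutorial repository.

```python
import numpy as np

def path_length_loop(x, y):
    """Imperative style: visit every consecutive pair of points in Python."""
    total = 0.0
    for i in range(len(x) - 1):
        dx = x[i + 1] - x[i]
        dy = y[i + 1] - y[i]
        total += (dx * dx + dy * dy) ** 0.5
    return total

def path_length_array(x, y):
    """Array-oriented style: one expression over whole arrays."""
    return np.sqrt(np.diff(x) ** 2 + np.diff(y) ** 2).sum()

# Hypothetical sample data: points along the straight line from (0, 0) to (3, 4).
x = np.linspace(0.0, 3.0, 1001)
y = np.linspace(0.0, 4.0, 1001)

loop_result = path_length_loop(x, y)
array_result = path_length_array(x, y)
```

Both functions compute the same quantity. In the array version the per-element work happens inside NumPy's compiled routines, and Python only handles the bookkeeping.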
The Array Library Landscape
We'll survey the modern Python array ecosystem and when to use each tool:
- NumPy: The foundation for general-purpose array operations
- Numba & JAX: JIT compilation approaches, and when and why to use each
- Awkward Array: Handling nested and ragged data structures
- Large dataset tools: Brief overview of Dask, Xarray, Zarr, and Blosc2 for distributed computing, labeled arrays, and compression
We'll demonstrate the strengths and limitations of each through live coding examples, showing trade-offs between different approaches.
Understanding Limitations and Trade-offs
A critical part of choosing the right tool is understanding when array-oriented programming has limitations. We'll discuss challenges like intermediate array overhead and algorithms that don't naturally vectorize, and show how different libraries address these problems.
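Both challenges can be sketched in plain NumPy. The first function spells out the quadratic formula one operation at a time, to make the full-size intermediate arrays visible. The second runs Newton's method on an array of starting points that converge after different numbers of steps. All names, the test function cosh(x - 1), and the tolerance are assumptions chosen for illustration, not the tutorial's own code.

```python
import numpy as np

def quadratic_root(a, b, c):
    """One root of a*x**2 + b*x + c = 0, one operation per line.

    Conceptually, each line produces a new full-size array, just as the
    one-line form (-b + np.sqrt(b**2 - 4*a*c)) / (2*a) does step by step.
    """
    tmp1 = b ** 2
    tmp2 = 4 * a
    tmp3 = tmp2 * c
    tmp4 = tmp1 - tmp3
    tmp5 = np.sqrt(tmp4)
    tmp6 = -b
    tmp7 = tmp6 + tmp5
    tmp8 = 2 * a
    return tmp7 / tmp8

def newton_minimize(df, d2f, x0, tol=1e-10, max_iter=100):
    """Newton's method for a 1-D minimum, run on an array of starting points.

    Elements need different numbers of steps, so a boolean mask tracks
    which ones are still being updated.
    """
    x = np.array(x0, dtype=float)
    active = np.ones(x.shape, dtype=bool)
    for _ in range(max_iter):
        if not active.any():
            break
        step = df(x[active]) / d2f(x[active])
        x[active] -= step
        # Elements whose last step was tiny are treated as converged.
        active[active] = np.abs(step) > tol
    return x

# Hypothetical inputs for the quadratic formula.
a = np.array([1.0, 1.0, 2.0])
b = np.array([-3.0, 0.0, -4.0])
c = np.array([2.0, -4.0, 2.0])
roots = quadratic_root(a, b, c)

# Minimize f(x) = cosh(x - 1), whose minimum is at x = 1.
starts = np.linspace(-3.0, 5.0, 9)
minima = newton_minimize(lambda x: np.sinh(x - 1.0),
                         lambda x: np.cosh(x - 1.0),
                         starts)
```

The mask keeps the loop array-oriented, but the number of passes is set by the slowest element. Every pass also allocates new temporary arrays.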
What You'll Learn
By the end of this tutorial, you will:
- Understand array-oriented programming as a paradigm and how it differs from imperative programming
- Know which library to choose for different problems: NumPy vs. Numba vs. JAX vs. specialized tools
- Recognize when array-oriented approaches have limitations and how to address them with JIT compilation
- Handle non-rectilinear data using libraries like Awkward Array
- Work with large datasets using chunking, compression, and labeled arrays
- Write more performant Python code by applying array-oriented thinking to your own problems
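To make "non-rectilinear data" concrete, here is a plain-NumPy sketch of one common way to store a ragged list of lists: a flat content buffer plus an offsets array. Awkward Array builds on the same idea, but this sketch does not use its API, and the variable and function names are hypothetical.

```python
import numpy as np

# The ragged data [[1.1, 2.2, 3.3], [], [4.4, 5.5]] stored as one flat
# buffer plus offsets marking where each sublist starts and stops.
content = np.array([1.1, 2.2, 3.3, 4.4, 5.5])
offsets = np.array([0, 3, 3, 5])

# Length of each sublist.
counts = np.diff(offsets)

# Index of the sublist that each flat element belongs to.
parents = np.repeat(np.arange(len(counts)), counts)

# Per-sublist sums without a Python loop; empty sublists sum to 0.
sums = np.bincount(parents, weights=content, minlength=len(counts))

def get_sublist(i):
    """Return sublist i as a view into the flat buffer."""
    return content[offsets[i]:offsets[i + 1]]
```

Because the data lives in flat arrays, reductions such as the per-sublist sum stay vectorized even though the sublists have different lengths.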
Prerequisites
Familiarity with Python (loops, functions, if statements) and basic NumPy exposure (what an array is and how to use it). No deep expertise required.
Target Audience
Data scientists, researchers, and engineers who want to write more efficient Python code, understand the modern array ecosystem, or choose the right tools for their problems.
Outline
- 0:00‒0:10 (10 min) Lecture 1: Array-oriented programming and its benefits. Simple and complex (three-body problem) examples of imperative, functional, and array-oriented styles. Speed and memory advantages in Python. What the array-oriented paradigm emphasizes and is good for: interactive analyses of distributions. Path length as a worked example.
- 0:10‒0:25 (15 min) NumPy puzzles and solutions. Alternating between hands-on puzzles and walkthrough of solutions: array slicing, consecutive differences, curve length, and image downscaling with reshape.
- 0:25‒0:35 (10 min) Lecture 2: Disadvantages of array-oriented programming. (1) The problem of intermediate arrays, shown using the quadratic formula, with timing, compared to pre-compiled C code. (2) The “iterate until converged” problem, shown using a one-dimensional minimizer (Newton’s method) for an array of initial states; talk about epochs in ML.
- 0:35‒0:45 (10 min) Lecture 3: JIT-compilation with Numba and JAX. Describe JIT-compilation as the solution to the intermediate array problem (1). First Numba then JAX on the quadratic formula. Show that Numba only accelerates if you write imperative code, unlike JAX, and show that JAX can’t follow if-branches or loops of unknown length.
- 0:45‒0:55 (10 min) Project 3: JIT-compilation of the Mandelbrot set. Walk through imperative Python, array-oriented NumPy, Numba, and JAX implementations with timings. Note that array-oriented programming is advantageous for GPU programming, even beyond Python.
- 0:55‒1:05 (10 min) Lecture 4: Ragged and deeply nested arrays. Show examples of ragged, nested, missing, and heterogeneous data, and how it can still make sense to treat them as arrays. Conversion to and from “tidy” data (tabular with references) to compare and contrast.
- 1:05‒1:20 (15 min) Lecture 5: Working with large datasets. Overview of tools for chunking, compression, and labeled arrays: Dask, Zarr, Blosc2, and Xarray.
- 1:20‒1:30 (10 min) Wrap-up and Q&A.
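Two of the puzzle topics in the outline, consecutive differences and image downscaling with reshape, can be sketched as follows. This only illustrates the techniques under assumed inputs. The actual puzzles and their solutions are in the tutorial repository and may differ.

```python
import numpy as np

# Consecutive differences via slicing: each element minus its predecessor.
values = np.array([1, 4, 9, 16, 25])
diffs = values[1:] - values[:-1]

# Image downscaling with reshape: average each 2x2 block of a 4x6 "image".
# The reshape splits rows into (block, row-in-block) and columns into
# (block, column-in-block); the mean then collapses the within-block axes.
image = np.arange(24, dtype=float).reshape(4, 6)
small = image.reshape(2, 2, 3, 2).mean(axis=(1, 3))
```

Here `diffs` matches `np.diff(values)`, and `small` has shape (2, 3), with each entry the mean of one 2x2 block.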
Iason Krommydas
I'm a PhD student in the Department of Physics and Astronomy at Rice University, conducting research in high-energy physics as a member of the CMS experiment at the Large Hadron Collider at CERN. My work focuses on studying Higgs boson decays into two photons, analyzing data collected by the CMS detector, and contributing to software development for large-scale scientific analyses. I'm passionate about scientific computing and open-source tools that enable reproducible and efficient research. I'm a maintainer of Awkward Array, an array library for nested, variable-sized data that uses NumPy-like idioms, and an author and maintainer of Coffea, a toolkit designed to simplify data analysis in particle physics. With experience in the scientific Python ecosystem, I enjoy building tools that drive insight and accelerate scientific discovery.