Flattening transformation

The flattening transformation is an algorithm that transforms nested data parallelism into flat data parallelism. It was pioneered by Guy Blelloch as part of the NESL programming language.[1] The flattening transformation is also sometimes called vectorization, but is completely unrelated to automatic vectorization. The original flattening algorithm was concerned solely with first-order multidimensional arrays containing primitive types, but was extended to handle higher-order and recursive data types in the work on Data Parallel Haskell.[2]

Overview

Flattening works by lifting functions to operate on arrays instead of on single values. For example, a function [math]\displaystyle{ f : A \rightarrow B }[/math] is lifted to a function [math]\displaystyle{ f' : [A] \rightarrow [B] }[/math]. This means an expression [math]\displaystyle{ map~f~x }[/math] can be replaced with an application of the lifted function: [math]\displaystyle{ f'~x }[/math]. Intuitively, flattening thus works by replacing all function applications with applications of the corresponding lifted function.
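
As a minimal illustration of lifting, the sketch below writes a scalar function and its lifted counterpart by hand in plain Python. The helper names (vadd, vmul, replicate, f_lifted) are hypothetical and exist only for this example; a real flattening compiler would derive the lifted function automatically and run the vector primitives in parallel.

```python
from typing import List

# Hypothetical vector primitives: each consumes whole lists at once.
# In a data-parallel runtime these would be executed in parallel.
def vadd(xs: List[float], ys: List[float]) -> List[float]:
    return [x + y for x, y in zip(xs, ys)]

def vmul(xs: List[float], ys: List[float]) -> List[float]:
    return [x * y for x, y in zip(xs, ys)]

def replicate(n: int, c: float) -> List[float]:
    # Lifting a constant turns it into a vector of n copies.
    return [c] * n

def f(x: float) -> float:
    """Scalar function f : A -> B."""
    return x * x + 1.0

def f_lifted(xs: List[float]) -> List[float]:
    """Lifted f' : [A] -> [B]; each scalar operation in f
    is replaced by the corresponding vector primitive."""
    return vadd(vmul(xs, xs), replicate(len(xs), 1.0))

xs = [1.0, 2.0, 3.0]
mapped = list(map(f, xs))   # map f x
lifted = f_lifted(xs)       # f' x
```

Under these assumptions, mapped and lifted hold the same values, which is the equivalence that lets flattening replace map f x with f' x.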

After flattening, each array is represented as a one-dimensional value vector V containing scalar elements, alongside auxiliary information recording the nested structure, typically in the form of a boolean flag vector F. The flag vector indicates, for each element of the value vector, whether that element begins a new segment. For example, the two-dimensional irregular array [math]\displaystyle{ A = [[1,2,3], [4,5], [6], [7]] }[/math] can be represented as the value vector [math]\displaystyle{ V = [1,2,3,4,5,6,7] }[/math] alongside the flag vector [math]\displaystyle{ F = [1, 0, 0, 1, 0, 1, 1] }[/math].
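
The representation can be sketched in a few lines of Python. The functions flatten and unflatten are hypothetical names chosen for this example, and the input is a nested array whose flat form matches the V and F given above. Note that this sketch rejects empty inner arrays, since a flag vector with one flag per element has no position at which to mark them.

```python
from typing import List, Tuple

def flatten(nested: List[List[int]]) -> Tuple[List[int], List[int]]:
    """Convert a nested (irregular) array into a value vector and a flag vector."""
    values: List[int] = []
    flags: List[int] = []
    for segment in nested:
        if not segment:
            # Assumption of this sketch: one flag per element, so an
            # empty segment has no element to carry its flag.
            raise ValueError("empty segments cannot be encoded with flags alone")
        for i, x in enumerate(segment):
            values.append(x)
            flags.append(1 if i == 0 else 0)  # 1 marks the start of a segment
    return values, flags

def unflatten(values: List[int], flags: List[int]) -> List[List[int]]:
    """Rebuild the nested array from the value and flag vectors."""
    nested: List[List[int]] = []
    for x, flag in zip(values, flags):
        if flag or not nested:
            nested.append([x])    # a set flag opens a new segment
        else:
            nested[-1].append(x)  # otherwise extend the current segment
    return nested

A = [[1, 2, 3], [4, 5], [6], [7]]
V, F = flatten(A)
```

Calling unflatten(V, F) recovers A, so no structural information is lost for arrays without empty segments.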

The flag vector is needed to flatten nested parallelism correctly. For example, a prefix sum applied to every inner array is flattened into a single segmented scan over the value vector, with the flag vector marking where the running total restarts.
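
A segmented prefix sum can be sketched as an ordinary scan over (flag, value) pairs with a modified combining operator. The names seg_op and segmented_scan are hypothetical, and the code below runs sequentially through itertools.accumulate; it is meant only to show the semantics, on the assumption that a parallel implementation would apply the same associative operator with a parallel scan.

```python
from itertools import accumulate
from typing import List, Tuple

Pair = Tuple[int, int]  # (flag, value)

def seg_op(a: Pair, b: Pair) -> Pair:
    """Combine two (flag, value) pairs.
    If the right operand starts a new segment, the running sum restarts;
    otherwise the values are added. The operator is associative."""
    (fa, va), (fb, vb) = a, b
    return (fa or fb, vb if fb else va + vb)

def segmented_scan(values: List[int], flags: List[int]) -> List[int]:
    """Inclusive prefix sum that restarts wherever the flag is set."""
    pairs = accumulate(zip(flags, values), seg_op)
    return [v for _, v in pairs]

V = [1, 2, 3, 4, 5, 6, 7]
F = [1, 0, 0, 1, 0, 1, 1]
result = segmented_scan(V, F)

# Reference: prefix sums computed independently on each inner array.
nested = [[1, 2, 3], [4, 5], [6], [7]]
per_row = [list(accumulate(row)) for row in nested]
```

Here result equals the concatenation of the rows of per_row, so one flat scan does the work of a nested map over prefix sums.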

Flattening can increase the asymptotic work and space complexity of the original program, leading to a much less efficient result.[3]

Usage

Flattening was originally developed for vector machines such as the Connection Machine, and often produces code that is not a good fit for modern multicore CPUs.[4] However, the principles underlying its simpler cases can be found in constructs such as vmap in Google's JAX.

References

  1. Blelloch, Guy (1995). NESL: A Nested Data-Parallel Language.
  2. Data parallel Haskell: a status report.
  3. Spoonhower, Daniel; Harper; Blelloch; Gibbons (2008). Space profiling for parallel functional programs.
  4. Bergstrom, Lars; Fluet, Matthew; Rainey, Mike; Reppy, John. "Data-Only Flattening for Nested Data Parallelism". PPoPP. doi:10.1145/2442516.2442525.