Create a Fast Orthogonal Random Transform

fort() creates an object (that inherits from class FastTransform) which represents a fast random \(\mathbb{R}^{dim\_in} \to \mathbb{R}^{dim\_out}\) linear transform. This transform will be orthonormal when \(dim\_in = dim\_out\) and they are a power of 2, and approximately orthogonal or semi-orthogonal (in the sense that either \(W^T W \approx I_{dim\_in}\) or \(W W^T \approx I_{dim\_out}\), if \(W\) represents the transform and \(I_n\) represents an N-dimensional identity matrix) otherwise.

Usage

fort(
  dim_in,
  dim_out = NULL,
  type = "default",
  cache_matrix = TRUE,
  seed = NULL
)

Arguments

dim_in: Either a scalar indicating the input dimensionality, or a vector of length 2 indicating the input and output dimensionality of the transform (if dim_out is not specified).
dim_out: A scalar indicating the output dimensionality of the transform (not required if the first parameter is a vector of length 2).
type: A string indicating the type of transform to use (optional); current valid options are: fft2 (i.e. default).
cache_matrix: Logical that controls whether matrices are cached when as.matrix() is called; should be set to FALSE if saving memory is important (optional, default = TRUE).
seed: If set, defines the seed used to generate the random transform (optional, default = NULL).

Value

An object of a class that inherits from class FastTransform and which represents a fast linear transform.

Details

The goal of fort() is to provide an easy and efficient way of calculating fast orthogonal random transforms (when dim_in is the same as dim_out) or semi-orthogonal transforms (when dim_in is different from dim_out) within R, by using fast structured transforms (like the Fast Fourier Transform or the Fast Walsh-Hadamard Transform) to avoid matrix multiplications, in the same spirit as the Fastfood (Rahimi et al. (2007)), ACDC (Moczulski et al. (2015)), HD (Yu et al. (2016)) and SD (Choromanski et al. (2017)) families of random structured transforms.

Internally, all fort transforms assume a blocksize which must be a power of 2 and no smaller than max(dim_in, dim_out). The resulting transform will be practically orthonormal when \(dim\_in = dim\_out\) and they match the blocksize of the transform, and practically semi-orthogonal when \(dim\_in \neq dim\_out\) and max(dim_in, dim_out) matches the blocksize. Otherwise, these properties will only approximately hold, since the output will result from a decimated transform (i.e., the rows and columns of the transform should be decorrelated, but not necessarily orthogonal).

`fort` transform types

The specific type of transform returned depends on the value passed in the type parameter, but all methods rely on alternating between applying permutations (complexity \(O(N)\)), diagonal scaling matrices (complexity \(O(N)\)) and structured fast linear transforms (such as the Fast Fourier Transform or the Fast Walsh-Hadamard Transform, which can be implemented with complexity \(O(N \mathrm{log} N)\)). Thus, it becomes possible to reduce the complexity of transforming an \(\mathbb{R}^N\) vector from \(O(N^{2})\) (using matrix multiplication) to \(O(N \mathrm{log} N)\).

Currently, the available options for the type parameter are:

default: this is the default option, if no type is specified; currently, it assumes the fft2 type, but this is subject to change (so avoid this option in non-interactive usage);
fft1: this type of fort transform uses the Fast Fourier Transform as base transform (which is used once); for more technical details, see FastTransformFFT1.
fft2: this type of fort transform uses the Fast Fourier Transform as base transform (which is used twice); for more technical details, see FastTransformFFT2.

Using `fort` transforms

In practice, to apply the fast transform to the columns of a matrix, you should use the %*% operator as if the output of fort() was a matrix (e.g., fort(4,6) %*% matrix(1:12,4,3) will output a 6 by 3 matrix that results from applying the transform on the left to the matrix on the right of the %*% operator).

Objects generated by fort() are also compatible with other methods applicable to matrix objects, such as dim(), ncol(), nrow(), solve(), t() and det(). Furthermore, these object can also be easily converted to matrices (using as.matrix()), if required.

References

Krzysztof M. Choromanski, Mark Rowland, and Adrian Weller. (2017). The unreasonable effectiveness of structured random orthogonal embeddings. Conference and Workshop on Neural Information Processing Systems. http://papers.neurips.cc/paper/6626-the-unreasonable-effectiveness-of-structured-random-orthogonal-embeddings

Felix Xinnan X. Yu, Ananda Theertha Suresh, Krzysztof M. Choromanski, Daniel N. Holtmann-Rice, and Sanjiv Kumar. (2016). Orthogonal random features. Conference and Workshop on Neural Information Processing Systems. http://papers.neurips.cc/paper/6246-orthogonal-random-features

Marcin Moczulski, Misha Denil, Jeremy Appleyard, and Nando de Freitas. (2015). ACDC: A structured efficient linear layer. https://arxiv.org/abs/1511.05946

Quoc Le, Tamás Sarlós and Alex Smola. (2013). Fastfood - approximating kernel expansions in loglinear time. International Conference on Machine Learning. https://proceedings.mlr.press/v28/le13-supp.pdf

Examples

fort(16) # a random orthogonal transform from R^16 to R^16
#> fort linear operation: R^16 -> [fft2] -> R^16
fort(5, 33) # a random transform from R^5 to R^33
#> fort linear operation: R^5 -> [R^64 -> fft2 -> R^64] -> R^33
fort(c(5, 33)) # same as previous line
#> fort linear operation: R^5 -> [R^64 -> fft2 -> R^64] -> R^33
# apply a random orthogonal transformation to the canonical R^4 basis
fort(4) %*% diag(4)
#>            [,1]        [,2]        [,3]        [,4]
#> [1,] 0.10670396  0.97471439  0.04746131  0.19050865
#> [2,] 0.91454452 -0.09463491 -0.38725963  0.06842912
#> [3,] 0.04746131  0.19050865 -0.10670396 -0.97471439
#> [4,] 0.38725963 -0.06842912  0.91454452 -0.09463491