Abstract: When a mathematical function (e.g. exponential, logarithm) is called in a programming language, a function from the corresponding mathematical library (libm) is actually executed. The main disadvantage of the existing libms is their manual implementation: they offer only a limited set of codes, so there is no choice between different result accuracies. As accuracy and speed are linked, users would prefer to choose between fast and accurate implementations. For some argument values the code needs to raise an overflow/underflow exception or return NaN. Handling these special cases takes time, but if it is known beforehand that the argument range is small, this handling can be avoided. Such an approach gives significant performance gains when processing large amounts of data. The existing mathematical libraries do not give the needed flexibility and thus have to be rewritten. As it takes quite long to rewrite the code for an elementary function, it was proposed to write a code generator that produces implementations in C from a particular parametrization file. Metalibm is an academic prototype of such a code generator. It is written in the Sollya language and, although it is still under construction, it already produces implementations for most of the functions in the traditional glibc libm. The produced implementations are generated specifically for the specified parameters. During generation, Metalibm decomposes the functions to be implemented, performs argument reduction for a basic set of functions (periodic functions, exponentials, logarithms), detects symmetry, splits the implementation domain to reduce the argument of an arbitrary function, and builds a Remez approximation on each of the subdomains.
Abstract: Using mathematics within computer software almost always includes the necessity to compute with real (or complex) numbers. However, implementations often just use the 64-bit double precision data type. This may lead to serious stability problems even for mathematically correct algorithms. There are many ways to reduce these software-induced stability problems, for example quadruple or multiple-precision data types, interval arithmetic, or even symbolic computation. Our approach (called Exact Real Arithmetic, ERA) is based on Recursive Analysis and its generalization to arbitrary continuous objects called Type-2 Theory of Effectivity (TTE). It considers both inputs and outputs by approximation up to a prescribable absolute error, thus guaranteeing closure under composition and polynomial-time computability of addition. More precisely, ERA extends imperative programming languages with real numbers as entities in infinite precision, similarly to the Blum-Shub-Smale machine or real RAM, but with a modified and computable semantics for tests. Additionally, higher data types like real sequences and functions allow, for example, to handle power series or to solve ODEs. In the talk we demonstrate aspects of practical programming in ERA as well as actual implementations using the iRRAM C++ library, where real numbers are realized internally using interval arithmetic with multiple-precision numbers, enhanced with automated conversion of a-posteriori to a-priori errors through iterated computations. (a) Examples of very simple ERA algorithms are range reductions, well known for speeding up computations of real transcendental functions like exp, log or sine. Also, iterations of the logistic map are intractable in fixed-precision arithmetic but easily handled in ERA.
(b) Gaussian elimination in ERA raises a small challenge, as provably no function on regular 2x2 matrices can compute a pivot position; however, a non-extensional (aka multivalued) "function" can, and yields an implementation inverting the highly unstable Hilbert matrix with ease, restricted only by main memory. (c) As comparison is uncomputable in TTE, bisection, as used for example in basic root finding, has to be replaced by trisection. This yields for instance the roots of Wilkinson's polynomial. (d) Transcendental calculations like (fast converging) limits are an important feature of ERA. Applied to power series, we obtain efficient implementations of the trigonometric functions and ODE solvers based on matrix exponentials. (e) In spite of discontinuity, uniform computability can often be asserted by enriching the input information with discrete data: for power series, for instance, parameters describing a Cauchy-Hadamard bound, or in linear algebra the rank of the given matrix. (f) Questions of computational complexity arise as soon as the precision is taken into account: Romberg integration turns out to have exponential running time, while already the simple Newton-Cotes formulae can establish polynomial time complexity on some function classes. (g) An interesting trade-off arises in iteratively adapting the a-priori precision to a-posteriori error bounds: for methods with superlinear convergence, the internal precision has to be increased during each round, while computing the coefficients of continued fractions immediately profits from reduced precision.
Abstract: To compute rigorously with reals and functions on reals, some kind of finite approximation of these objects is necessary. One of the most natural finite approximations of reals are rational intervals. Interval computation traces its roots back to Archimedes' calculation of pi using pairs of bounds of narrowing difference, but has truly become a field of study only in the past sixty years. The main driving force behind the development of the field has been a number of notable applications of interval techniques to rigorous differential equation solving, global optimization, and theorem proving. In the past couple of decades these application areas have also been combined in rigorous algorithms for reachability analysis of hybrid systems. A famous example of such advances was Tucker's proof that the Lorenz attractor is (indeed) a strange attractor. Interval computation uses and produces approximations of numerical quantities and functions. In their simplest form such approximations are just intervals, usually used to bound single real values. Cartesian products of such approximations are interval vectors, or boxes. A box can be used as an approximation of a single real vector, a geometric entity, or a part of a function graph. Our focus is the use of interval techniques to approximate functions on reals. However, box approximations are often too coarse to effectively approximate functions. Finer approximations have been developed, such as affine or polynomial enclosures. For example, Berz and Makino's Taylor Models (TMs) form an arithmetic of polynomial enclosures of constant width. TMs and affine forms have been used with considerable success, for example, to bound rounding errors in floating point computations and to approximate the solutions of differential equations in beam physics simulations. We propose an algebra of function approximations as a proper generalisation of TMs.
We then present several implementations of this algebra, each providing a general arithmetic of function approximations. We illustrate the flexibility of our approach in two areas of application and associated software. First, the arithmetics can be used to compute enclosures for solutions of ODE IVPs by means of a direct implementation of the interval Picard operator of Edalat and Pattinson. Such an elegant implementation of a rigorous ODE IVP solver seems impossible without the use of approximations that are more flexible than (the fixed-width) TMs. Moreover, a function approximation arithmetic also provides a conceptually simple way to extend Edalat and Pattinson's work to the case of uncertain initial conditions. This method has been included in the Acumen tool and language for modeling and rigorous simulation of hybrid dynamical systems (available freely from www.acumen-language.org). Second, the arithmetics can be used to automatically prove theorems that take the form of inclusions of non-linear interval expressions. To prove such inclusions, both outer and inner approximations of interval functions are needed. Our arithmetics address this need by supporting the computation of tight inner approximations. This approach is embedded in the PolyPaver numerical theorem prover (available freely from github.com/michalkonecny/polypaver).
Abstract: Graphics Processing Units (GPUs) have become popular for heavy computations because a GPU drastically improves performance and is relatively cheaper than cluster computing. Since many scientific and engineering problems are represented in sparse matrix format, several studies have consequently been carried out to improve the performance of sparse matrix vector multiplication and iterative solvers on the GPU. However, the performance of sparse matrix computation varies a lot depending on the matrix ordering method. Unfortunately, sparse matrix ordering on the GPU has not been studied, even though ordering methods have been studied by many researchers to improve performance and reduce fill-ins in cluster computing. In this paper, we study the effect of sparse matrix ordering on GPU computation and compare the performance of various sparse matrix ordering methods such as RCM, MMD, and ND on the GPU. In particular, we focus on the performance improvement of sparse matrix vector multiplication under different matrix orderings and show the improvement obtained by applying the orderings to a sparse iterative solver.
Abstract: swMATH is a novel information service for mathematical software. It offers open access to a comprehensive database with information on mathematical software and provides a systematic collection of references to software-relevant mathematical publications. The intention is to offer a list of all publications that refer to a software package recorded in swMATH. For each software package contained in swMATH, references to peer-reviewed mathematical publications which refer to this software are provided; in particular, all such articles referenced in Zentralblatt MATH (zbMATH) are listed. There are two kinds of software-relevant articles: - Standard articles: articles explicitly describing a certain software package (methods, implementation details, user manuals, etc.); - User articles: articles referring to a software package as a tool for achieving new results in mathematical research or applications. By systematically linking mathematical software and publications, swMATH provides the potential software user with information on the actual usage of a software package which is hard to find elsewhere. At the same time, software developers benefit from the systematic collection of literature references to their work by an independent institution and can gain interesting insights into the dissemination and usefulness of their software. Moreover, if software is cited in scientific publications, this is also an important quality criterion, which is used by swMATH for software selection. The swMATH database has been jointly developed by the Mathematisches Forschungsinstitut Oberwolfach and FIZ Karlsruhe. It contains information about more than 6,200 software packages (metadata, keywords, similar software) and links to about 61,000 reviews of articles in zbMATH. It is maintained by FIZ Karlsruhe and is continuously growing through its connection to zbMATH. swMATH sees itself as a service to the mathematical community. Please check it out at www.swmath.org.
Abstract: In the early days, symbolic computation was investigated in the fields of Artificial Intelligence and Physics. Macsyma and Reduce were the famous products of the first generation of computer algebra systems. Over the last 20 years, a lot of mathematical research software systems have been developed by mathematicians: for example, KANT and PARI/GP for number theory, GAP for finite groups, Singular and Macaulay2 for commutative algebra. More recently there is the SAGE system, a very famous project led by a community of mathematicians. A characteristic property is that many of these systems are published under open source software licenses. Today we can use Maxima (the direct descendant of Macsyma) and Reduce freely. In mathematical investigations these professional systems are becoming more important, but installing them is bothersome for many people. When we want to introduce these systems to our colleagues and students, we may run into non-essential troubles. MathLibre is a project to archive open source mathematical software and documents and offer them as a Live Linux system. It is a direct descendant of the KNOPPIX/Math project. MathLibre provides a desktop for mathematics that can be set up easily and quickly. The KNOPPIX/Math project began in February 2003; we changed the project name in 2012, and the newest product is MathLibre 2014. It supports virtual machines, USB bootable sticks and hard disk installation. There is a great difference between MathLibre and KNOPPIX/Math: MathLibre uses the collaborative revision control system GitHub and the Live Systems Project for development, so anyone can make their personal version of the MathLibre Live system. We provide an easy mechanism for configuring the language environment, so this personalized and localized system can be freely distributed to colleagues and students.
In the last few years we have been making localized versions of MathLibre and introduced these systems at congresses held in China, Hungary, Japan, Korea and Taiwan. We have distributed over 10,000 DVDs in these few years. We will describe the practices of MathLibre and how to modify the system.
Abstract: Normaliz is a software package for computations with rational cones and affine monoids. It pursues two main computational goals: finding the Hilbert basis, a minimal generating system of the monoid of lattice points of a cone; and counting elements degree-wise in a generating function, the Hilbert series. In this talk we present recent developments in the functional range and the underlying algorithms. The functionality has been extended in several ways: Normaliz can now handle semi-open cones and also has direct support for (unbounded) polyhedra, and there are two new methods for computing the lattice points of a rational polytope. The algorithms in Normaliz are based on a triangulation of the cone (except for the dual algorithm). The newly developed pyramid decomposition method for computing a triangulation gives formidable improvements for larger examples, both in computation time and in memory usage. A triangulation is a non-disjoint decomposition of the cone; especially for Hilbert series computations, an exact (disjoint) decomposition is needed, and we have implemented a new way to obtain it from the triangulation. For Hilbert basis computations of combinatorial examples we had introduced a partial triangulation in version 2.5; it has now been tuned to check the normality of even larger monoids. Together with the parallelization of the algorithms, these improvements enable us to compute significantly larger examples.
Abstract: libnormaliz is a C++ library for computations with rational cones and affine monoids; Normaliz is a simple, stand-alone system built on top of libnormaliz. CoCoALib is a C++ library designed to offer an easy-to-use, general environment for efficient computations in Commutative Algebra; CoCoA-5 is an interactive system built on top of CoCoALib. For mutual benefit we have developed a simple and fast interface between the two libraries; thus, for instance, in the presence of libnormaliz, CoCoALib acquires new data-structures and operations giving almost direct access to libnormaliz capabilities. These extensions also appear in CoCoA-5, so libnormaliz operations become accessible when working on rings and monoid algebras in CoCoA-5; in this way CoCoA-5 becomes a convenient, sophisticated front-end for libnormaliz. Firstly we present how this integration was designed and how the same simple design techniques can be used for integrating other external libraries into CoCoALib (and also subsequently into CoCoA-5); indeed we have applied the same techniques for integrating the software libraries Frobby and GSL (in part). The interfacing is direct in C++ rather than via some form of remote invocation. The direct approach is simpler and faster (e.g. much lower communication costs), and also facilitates a "symbiotic integration": for instance the newly developed extension to Normaliz, called nmzIntegrate, uses features from CoCoALib. An object oriented design allows for efficient management of transported values and the corresponding representation transformations. Secondly we describe in detail the Normaliz functions we have made available in CoCoALib (and also in CoCoA-5). We offer specific ring theoretic functions applied to monomial subalgebras, monomial ideals and binomial ideals in polynomial rings. 
Additionally, we provide direct access to libnormaliz via a general function "NmzComputation" which faithfully reflects the internal structure of the libnormaliz design.
Abstract: Sparse matrix vector multiplication (SpMV) is one of the most important operations in exact sparse linear algebra, in particular for iterative methods such as Wiedemann or Lanczos. A lot of research has been done by the numerical community to provide efficient sparse matrix formats for floating point computation. However, when computing over finite fields, one needs to deal with multi-precision values and with more complex operations, e.g. modular reduction. In order to provide highly efficient SpMV kernels over finite fields, we propose a code generation tool that uses heuristics to automatically choose the underlying matrix representation and the corresponding arithmetic. Our generator uses a combination of different formats among the well-known CSR, SELL and ELL to better suit the matrix properties and the architecture. For instance, our generator is able to adapt the data types used in the matrix format to reduce its memory footprint. As an example, it can replace the row pointers in the CSR format, usually 4 or 8 bytes, by storing only the number of non-zero elements per row when the latter is small, e.g. only 1 byte for fewer than 256 elements. It can also isolate ones from the matrix entries to further compress the data structure and avoid superfluous multiplications. This is of great interest for matrices arising in discrete logarithm computations, which have a majority of ones. When dealing with small finite fields, e.g. smaller than a word size, one can delay modular reductions until an overflow occurs. Computation is therefore relaxed over the integers and reductions are done only when necessary; our generator finds the best strategy to delay this modular reduction as much as possible. When dealing with multi-precision entries, our generator uses conversions to a residue number system (RNS) in order to benefit from SIMD vectorization. This approach avoids memory jumps while accessing elements, thus reducing the volume of data transferred.
Abstract: We develop in this paper design techniques used in the C++ exact linear algebra library LinBox. They are intended to make the library safer and easier to use, while keeping it generic and efficient. First, we review the new simplified structure of the containers, based on our founding scope allocation model (cf. ). Namely, vector and matrix containers are all templated by a field and a storage type. Matrix interfaces all agree with the same minimal blackbox interface. This allows, e.g., for a unification of our dense and sparse matrices, as well as a clearer model for matrices and submatrices. We explain the design choices and their impact on coding. We describe several of the new containers, especially our sparse and dense matrix storages as well as their apply (blackbox) method, and compare them to previous implementations. Then we present a variation of the strategy design pattern that is comprised of a controller-plugin system: the controller (solution) chooses among plugins (algorithms) and the plugins always call back the solution so a new choice can be made by the controller. We give examples using the solution mul, and generalise this design pattern to the library. We also show performance comparisons with former LinBox versions. Finally we present a benchmark architecture that serves two purposes. The first consists in providing the user with an easy way to produce graphs using C++. The second goal is to create a framework for automatically tuning the library (determining thresholds, choosing algorithms) and providing a regression testing scheme.
Abstract: The CUMODP library provides arithmetic operations for dense matrices and dense polynomials, primarily with modular integer coefficients, targeting many-core GPUs. Some operations are available for integer or floating point coefficients as well. Typical CUMODP operations are matrix determinant computation, polynomial multiplication (both plain and FFT-based), univariate polynomial division, the Euclidean algorithm for univariate polynomial GCD, subproduct tree techniques for multi-point evaluation and interpolation, subresultant chain computation for multivariate polynomials, and bivariate system solving. The CUMODP library is written in CUDA and its source code is publicly available at www.cumodp.org. In this talk, after an overview of the functionalities of the CUMODP library, we present its main implementation techniques and discuss its performance. First of all, we have developed a model of multithreaded computation, combining fork-join and single-instruction-multiple-data parallelism, with an emphasis on estimating the parallelism overheads of programs written for modern many-core architectures. For each core CUMODP routine, this model is used to minimize parallelism overheads by determining an appropriate value range for a given program parameter, e.g. the number of threads per block. Experimentation confirms the effectiveness of this model. Secondly, we demonstrate the importance of adaptive algorithms in the context of many-core GPUs, that is, algorithms that adapt their behavior according to the available computing resources. In particular, we combine parallel plain arithmetic and parallel fast arithmetic (based on FFT). We report on the first GPU implementation of subproduct tree techniques for multi-point evaluation and interpolation of univariate polynomials, and compare our code against probably the best serial C code (namely the FLINT library) for the same operations.
For sufficiently large input data, on an NVIDIA Tesla C2050 our code outperforms its serial counterpart by a factor ranging from 20 to 30.
Abstract: The Basic Polynomial Algebra Subprograms (BPAS) library provides arithmetic operations (multiplication, division, root isolation, etc.) for univariate and multivariate polynomials over prime fields or with integer coefficients. The current distribution focuses on dense polynomials; the sparse case is work in progress. The code is mainly written in CilkPlus, targeting multicores. A strong emphasis is put on adaptive algorithms, as the library aims at supporting a wide variety of situations in terms of problem sizes and available computing resources. One of the purposes of the BPAS project is to take advantage of hardware accelerators in the development of polynomial system solvers. The BPAS library is publicly available in source form at www.bpaslib.org. Inspired by the Basic Linear Algebra Subprograms (BLAS), BPAS functionalities are organized into three levels. At Level 1 one finds basic arithmetic operations that are specific to a polynomial representation or a coefficient ring, e.g. multi-dimensional FFTs/TFTs and univariate real root isolation. At Level 2, arithmetic operations are implemented for all types of coefficient rings supported by BPAS (prime fields, the ring of integers, the field of rational numbers). Level 3 gathers advanced arithmetic operations taking as input a zero-dimensional regular chain, e.g. the normal form of a polynomial and multivariate real root isolation. In this talk, after an overview of the functionalities of the BPAS library, we discuss its implementation techniques. Level 1 functions are highly optimized in terms of locality and parallelism; in particular, the underlying algorithms are nearly optimal in terms of cache complexity. At Level 2, the user can choose between algorithms minimizing work (at the expense of decreasing parallelism) and algorithms maximizing parallelism (at the expense of increasing work).
This leads, at Level 3, to adaptive algorithms which select appropriate Level 2 functions depending on available resources (number of cores, input data size).