Metaprogramming for math library Jiří Chmelík, Marek Trtík PA199 2  Metaprogram is a code whose compilation generates a new code whose functionality we really want.  First metaprogram:  Erwin Unruh, 1994.  During compilation of his program the compiler computed prime numbers.  Our first metaprogram will be much simpler… Metaprogramming 3  Let’s make a C++ compiler to compute 3 𝑁 during compilation.  Our metaprogram: template struct Pow3 { enum { result = 3 * Pow3::result }; // Recursion rule }; template<> struct Pow3<0> { enum { result = 1 }; // Base case }; std::cout << Pow3<4>::result;  Resulting program: std::cout << 3*3*3*3*1; Our first metaprogram 4  Let’s consider this implementation of the dot product: float dot_product(int dim, float* u, float* v) { float result = 0.0f; for (int i=0; i struct DotProduct { // Recursion rule static inline float result(float* u, float* v) { return (*u) * (*v) + DotProduct::result(u+1, v+1); } }; template<> struct DotProduct<1> { // Base case static inline float result(float* u, float* v) { return (*u) * (*v); } }; Unrolling loops: Solution 6 template inline float dot_product(float* u, float* v) { return DotProduct::result(u, v); }  The compiler then converts the call dot_product<3>(u,v) into code:  DotProduct<3>::result(u,v);  (*u) * (*v) + DotProduct<2>::result(u+1,v+1);  (*u) * (*v) + ((*(u+1)) * (*(v+1)) + DotProduct<1>::result(u+2,v+2));  (*u) * (*v) + ((*(u+1)) * (*(v+1)) + (*(u+2)) * (*(v+2))); Unrolling loops: Solution 7  When designing a math library, it is desirable to provide this syntax: Vector u(1000), v(1000); // Define vectors … // Initialise them u = 1.2f * u + u * v; // Perform operations.  In C++ we can achieve this syntax via operator overloading.  Question: Could there be a performance issue with such approach?  Let’s start with a naïve solution… Expression templates: Motivation 8 class SVector { // “Simple” Vector of any dimension. int size_; float* data_; void copy(SVector const& o) { for (int i=0; i Mul Add, Mul > 14 class Scalar { float const& s_; public: Scalar(float const& s) : s_(s) {} int size() const { return 0; } float operator[](int const) const { return s_; } }; Expression templates: Scalar 15  Inside operator classes Add and Mul we must distinguish how we reference operands:  Scalar operand – by value.  Other operands – by reference. template struct ArgType { typedef T const& result; }; template<> struct ArgType { typedef Scalar result; }; Expression templates: ArgType 16 template class Mul { typename ArgType::result op1_; typename ArgType::result op2_; public: Mul(OP1 const& op1, OP2 const& op2) : op1_(op1), op2_(op2) {} int size() const { return op1.size()!=0? op1.size() : op2.size(); } float operator[](int const i) const { return op1_[i] * op2_[i]; } }; Expression templates: Mul 17 template class Add { typename ArgType::result op1_; typename ArgType::result op2_; public: Add(OP1 const& op1, OP2 const& op2) : op1_(op1), op2_(op2) {} int size() const { return op1.size()!=0? op1.size() : op2.size(); } float operator[](int const i) const { return op1_[i] + op2_[i]; } }; Expression templates: Add 18  How do we define operators? How about this: template Mul operator*(T1 const& u, T2 const& v) { … } // (1)  Not good, parameters are too general, e.g.: struct A {} a; struct B : public A {} b; template A operator*(T const& s, A const& a) { … } // (2) 3 * a; // Always calls (2) 3 * b; // Calls (2) only when (1) is not defined. Expression templates: Expression class 19  We also do not want define all possible versions (too much work): Mul< SVector, SVector > operator*(SVector const& u, SVector const& v); template Mul > operator*(SVector const& u, Mul const& v); template Mul, SVector> operator*(Mul const& u, SVector const& v); template Mul< SVector,Add > operator*(SVector const& u, Add const& v); …  Solution: We introduce a class “Expr” for wrapping our expressions: template class Expr;  We can safely declare the operator as: template Expr > operator*( Expr const& u, Expr const& v); Expression templates: Expression class 20 template class Expr { // Expression Rep rep_; public: explicit Expr(int size) : rep_(size) {} Expr(Rep const& rep) : rep_(rep) {} Rep const& rep() const { return rep_; } // Unwrap the instance. int size() const { return rep_.size(); } float operator[](int const i) const { return rep_[i]; } // Defined later: template Expr& operator=(Expr const& o); }; Expression templates: Expression 21 template Expr > operator*(Expr const& u, Expr const& v) { return Expr >(Mul(u.rep(), v.rep())); } template Expr > operator*(float const& s, Expr const& u) { return Expr >(Mul(Scalar(s), u.rep())); } template Expr > operator+(Expr const& u, Expr const& v) { return Expr >(Add(u.rep(), v.rep())); } Expression templates: Operators Wrap the result Actual operation Unwrap operands 22 template template Expr& Expr::operator=(Expr const& o) { for (int i=0; i Add[i] (Expr > forwards operator[] to ‘rep’) ==> Add::op1[i] + Add::op2[i] ==> Mul::op1[i] * Mul::op2[i] + Mul::op1[i] * Mul::op2[i] ==> Scalar[i] * SVector[i] + SVector[i] * SVector[i] ==> 1.2f * u[i] + u[i] * v[i] Expression templates: Expr::operator= Add, Mul > Compiler converts ‘o[i]’ to ‘1.2f*u[i] + u[i]*v[i]’ => we get the efficient version! 23  We must declare our vector variables as: Expr u(1000), v(1000); // Not nice!  Therefore, we usually define: typedef Expr Vector;  Limitation: Does not work for operations where RAM temporaries are really needed, e.g., ‘x = A*x’, where x is a vector and A is a matrix.  But ‘y = A*x’ would work just fine (assuming x,y are different vectors).  Usage rule of thumb:  For small fixed-size vectors use metaprogramming, e.g., unrolling loops.  For large dynamic-size vectors use expression templates. Expression templates: Notes & Limitations 24 [1] D.Vandevoorde, N.M.Josuttis; C++ Temlates, The Complete Guide; Addison-Wesley,2006. References