F# and Data Mining: March 2010

Sunday, March 28, 2010

Matrix and linear algebra in F#, Part II: doing linear algebra via math providers

Long story short

Suppose we are using the latest release, 1.9.9.9, which does not support linear algebra anymore in its PowerPack. But linear algebra is supported before in 1.9.7.8. Under its bin\ folder, there is a DLL named FSharp.PowerPack.Math.Providers.dll. “Math providers” means F# calls other math libraries, to be specific, an incomplete managed implementation in F#, Netlib’s Lapack+Blas and Intel MKL. The F# one is immature, while the last two are really really good! Plus Netlib’s implementation is free.

However, we cannot just use that provider dll because it has conflicts with latest PowerPack. We need to build a new one! Here are the steps:

1. Compile Netlib’s Lapack to get lapack.dll and blas.dll. Refer to this post of mine. You can also search online to download these two dlls, e.g. http://www.stanford.edu/~vkl/code/libs.html (I haven't tested this).

2. Get the source code of 1.9.7.8. Go to folder FSharp-1.9.7.8\source\fsppack\FSharp.PowerPack. Remove the old reference to FSharp.PowerPack and add two 1.9.9.9 DLLs to the reference:

FSharp.PowerPack.dll and FSharp.PowerPack.Compatibility (Because in 1.9.9.9, Compatibility module is separated out.)

Compile and you get only a few errors regarding “permutation”:

you could just define it explicitly: type permutation = int –> int

Done! Now we move on to

Use Lapack library

First make sure your project

1. references to

FSharp.PowerPack.Compatibility, FSharp.PowerPack and FSharp.PowerPack.Math.Providers.dll.

2. has lapack.dll and blas.dll in the binary output folder. E.g. Debug\ or Release\. (Technically we only need to put them under a folder .Net platform is searchable, similar to Java’s CLASSPATH.)

Finally, let’s write a simple program:

open Microsoft.FSharp.Math
let r = new System.Random()
let A = Matrix.init 4 4 (fun i j -> r.NextDouble() * 1000.0)
let Ainv = Experimental.LinearAlgebra.Inverse A
let eigen = Experimental.LinearAlgebra.EigenValues A

Although this program compiles, it gives a Not-Implemented exception on EigenValues function. (It can compute Inverse of A correctly.) This is because functions in LinearAlgebra module use managed implementation (see linear_algebra_managed.fs in lapack folder) on default. However, the managed implementation is incomplete, and not thoroughly tested.

We need to first open the lapack service:

let isSucc = Experimental.LinearAlgebra.Lapack.Start()

Put this before any computation. (Because there’s no documentation at all, I’ve read all the code in lapack folder to find this line…)

The returned bool value isSucc indicates whether the Lapack service providers successfully loads dlls for Netlib or not.

If you have Intel MKL

Put mkl_def.dll and mkl_lapack.dll under a .Net searchable folder. Notice that the wrapper is for MKL 9.1 not for 10 series.

Btw, the Lapack service provider will choose MKL if both MKL and Netlib are presented.

The Long story

.Net do have some good math libraries. They are efficient (native performance), easy to use. However, they are not free. For the free ones, there’s a Math.Net open source project, which is still under its alpha-testing phase. These open source projects seem to tend to implement all algorithms from starch in C#, rather than link to some mature ones, like Netlib’s lapack. This certainly has some advantages, like more safe code, better memory management in a .Net sense and exception handling. However, this strategy would have a long development time.

I also suspect their efficiency. I have a L-BFGS based logistic regression solver written C++, it runs 3~4 times slower if compiled into .Net using C++/CLI. Drawing a conclusion from this single case is too assertive. (I used STL and STL is slow in C++/CLI, in C++/CLI we’d better use CLI STL.) However, I still trust native libraries to perform basic linear algebra for the performance's seek.

I started to use F# from its 1.9.7.8 (The October CTP, 2009). In that release, the math provider library is already spitted out from the main PowerPack as a standalone DLL. But there’s no documentation on it, except the source code. Other math types, e.g. matrix, vector, are documented. So I didn’t know there exists such a library at first. By occasion, I found the lapack folder in the F# source. Today, I finally have some time to dig into it, get it compiled and know how to start the linear algebra service provider. Fortunately, once started, the service provider is able to automatically find the DLLs of Netlib or MKL, which is really nice.

The code is still compliable using 1.9.9.9, but I am not sure it will in the future as there are a lot of warnings mentioning deprecated language usages, e.g. use OCaml-style ‘^’to connect two strings.

As this code is under MS open source license, there would be license problems if we maintain and distribute it. So a possible solution is to rewrite the service provider and make it open source (e.g. MIT or BSD license), or, persuade MSR Cambridge or MS VS Team to continue the development.

Compiling Lapack for .Net usage

The aim: we need lapack.dll and blas.dll

A stable and efficient implementation of the common linear algebra routines is essential to engineering. Lapack is the one. Lapack are blas are kind of standard, there are different vendors implementing them, e.g. Intel and AMD both have tuned versions targeted for their hardware.

There are two major implementations available on Windows-Intel platform: Netlib and Intel Math Kernel Library (MKL). Although MKL is very good (Matlab uses it for its basic linear algebra computation), it costs money.Today I’d like to write a post on how to build Netlib’s implementation for .Net usage.

There are two tutorials online for the same purpose:

http://www.codingday.com/compile-lapack-and-blas-as-dll-on-windows

http://icl.cs.utk.edu/lapack-for-windows/lapack/index.html

The first one is quite good. we may encounter some minor problems if we directly follow that guide. I’d like to give a new summary here:

1. Download Lapack+Blas(a single package) from Netlib. Use lapack-3.1.1.tgz.

2. Install MinGw, we only need to install the basic tools + g77 compiler. Because this free compiler only supports Fortran 77 and the latest Lapack is using Fortran 90 now, so in the first step, we use an old version (2007) of Lapack.

Set the PATH for MinGw. And make sure to successfully compile a hello world Fortran program.

3. Copy dlamch.f and slamch.f from INSTALL directory SRC directory

4. Compile BLAS:

g77 --shared -o blas.dll BLAS\SRC\*.f -O3

5. Compile Lapack:

cd SRC
g77 -c *.f -O3
cd ..
g77 -shared -o lapack.dll SRC/*.o blas.dll -O3

If you see some “collect2.exe” error, then change TMP environment variable(originally %USERPROFILE%\AppData\Local\Temp) to a short one, e.g. “c:\temp”

We should have lapack.dll and blas.dll in the Lapack folder now.

Once we have the two native DLLs, we can use P/Invoke in .Net to call linear algebra operations in Lapack. If you use F#, then we don’t even need to know P/Invoke because the F# team already did that for us.

A Note: P/Invoke in Mono platform is also quite convenient, compile lapack and blas into .so files. http://www.mono-project.com/Interop_with_Native_Libraries.

Matrix and linear algebra in F#, Part I: the F# Matrix type

Every language has libraries, besides the big .Net libraries, F# has two own: the Core, which is shipped with Visual Studio 2010, and the PowerPack, which is an external library developed by MSR Cambridge and Visual Studio Team. Notice that the code quality in PowerPack is actually quite high, it is put outside the Core library because it is evolving fast. Once stable, they may be put into the Core.

Our concern is matrix and linear algebra operations. There is a matrix class in F# PowerPack. However, Microsoft didn’t officially put the documentation online. For F# 1.9.6, there is a outdated page on MSR’s website. But it doe not matter we use the old documentation, since the interface for Math haven’t change much since then.

The namespace for Math is:

Namespace Microsoft.FSharp.Math

In this namespace, we have:

1. complex numbers

2. Big rational numbers

3. vector and row-vector

4. matrix

In this post, I focus on matrix.

The real matrix:

First, Names! There is a type called matrix, which is a matrix holding double or 32-bit long float values.

There is a module called Matrix, inside which there are lots of functions to operate on an F# matrix.

There is also a function/val called matrix, which is used like a constructor to construct a new matrix from lists or arrays.

We can easily create two 3-by-3 matrices using the matrix function:

let A = matrix [ [ 1.0; 7.0; 2.0 ];
               [ 1.0; 3.0; 1.0 ];
               [ 2.0; 9.0; 1.0 ]; ]

let B = matrix [ [ 10.0; 70.0; 20.0 ];
               [ 10.0; 30.0; 10.0 ];
               [ 20.0; 90.0; 10.0 ]; ]

All the member functions and operators associated with matrix type are documented here. Here are some examples:

A+B
A-B
A*B // matrix product
A.*B  // element-wise product
A * 2.0 // scalar product
2.0 * A // this is also ok
-A // negation of a matrix

let b = vector [5.;8.;9.]; // defines a vector
A*b // matrix-vector product

You can get the properties using member functions:

let dim = A.Dimensions
// val dim : int * int = (3, 3), the dimension is a tuple
let A' = A.Transpose // you can use ' in a variable name!
let nrow = A.NumRows
let ncol = A.NumCols
let Anew = A.Copy() // get a new matrix
let Aarr = A.ToArray2D() // convert to a Array2D type
let Avec = A.ToVector() // take the first column of A
let Arvec = A.ToRowVector() // take the fisrt row of A
// ToVector and ToRowVector is usually used
// when you know your matrix is actually a vector

Accessing a matrix

We can have Matlab like access to an F# matrix. One different thing is that the index starts from 0, not 1. Mathematicians like the index to start with 1, e.g. in R and Matlab. While programs like 0-based index, e.g. Numpy for Python.

// notice that the index starts at 0
A.[2,2] //The operator [,] allows access a specific
      //element in the matrix, shorthand for A.Item
A.[2,2] <- 100.0 // change a value
A.[1..2,1..2] // get a sub matrix, shorthand for A.GetSlice
A.[1..2,1..2] <- matrix [[2.;3.]; [8.;9.;]] // set a sub matrix, shorthand for A.SetSlice

We also have 4 member functions: Column, Columns, Row and Rows:

A.Column 2 // Vector<float> = vector [|2.0; 1.0; 1.0|]
A.Row 2 // RowVector<float> = rowvec [|2.0; 9.0; 1.0|]
A.Columns (1,2) // starts at column 1, take 2 columns
//val it : Matrix<float> = matrix [[7.0; 2.0]
//                                 [3.0; 1.0]
//                                 [9.0; 1.0]]
A.Rows (1,2) // starts at row 1, take 2 columns

The Matrix module

Similar to that F# list type has a List module containing handy functions like map, fold and etc, the real matrix type matrix also has a module.

let Asum = Matrix.sum A // sum of all elements in A
let Aprod = Matrix.prod A // product of all elements in A
let C = Matrix.create 10 10 1.0 // create a matrix with 1s
let table = Matrix.init 9 9 (fun i j -> (float i + 1.) * (float j + 1.))
// create a matrix with a function
let I10 = Matrix.identity 10 // 10 1s one diagnal
let Atrace = Matrix.trace A // trace sum
let Asqr = Matrix.map (fun x -> x*x) A // A^2

And there are some repetitions on the member function/operators of matrix type. E.g. Matrix.add, Matrix.set, Matrix.get, Matrix.toVector, Matrix.toRowVector, Matrix.transpose, etc.

Sparse matrix

let D = Matrix.initSparse 3 3 [ (0,0,1.0); (1,1,2.0); (2,2,3.0); ]

// init a sparse 3-by-3 matrix
//val it : matrix = matrix [[1.0; 0.0; 0.0]
//                          [0.0; 2.0; 0.0]
//                          [0.0; 0.0; 3.0]]
D.IsSparse // sparse test

let E = Matrix.initSparse 100000 100000 [ (0,0,1.0); (1,1,2.0); (2,2,3.0); ]

let Esum = Matrix.sum E

However, map, fold, exists, .. are not supported on sparse matrix.

Int Matrix, BigNum Matrix, and others

To know about Generic matrix, you may want to read another post of mine:

http://fdatamining.blogspot.com/2010/03/f-inumerics-interface-and-matrix-class.html

which also discusses some implementation details of the Matrix class, and how to define your own matrix, e.g. a Pixel matrix.

Saturday, March 27, 2010

F# INumerics Interface, and Matrix class

The blog post is inspired by a question on StackOverflow. It asks whether the Matrix class in F# PowerPack supports a user defined type.

First the answer is: YES.

But supporting that is non-trivial. I found very few material online on this subject, so I decided to to write a blog post on it. Since matrix is the very core concept in data mining and machine learning, our digging into the F# Matrix and relevant interfaces would also be valuable.

In this post, I will lead to read a part of the F# PowerPack source code, which is in source\fsppack\FSharp.PowerPack\math folder.

Although the latest version is 1.9.9.9, I would suggest to use the source code of 1.9.7.8, which contains a lapack subfolder in the math folder. What’s this? The researchers in MSR did a great wrapper to enable F# to call common linear algebra libraries, e.g. the free lapack and the commercial IMK (Intel Math Kernel Library). However, this part of code is removed from the 1.9.9.9 release. These code will be merged into Math.Net project, which is still under development. This is just a kind notice, we don’t need to know lapack wrapper today.

Let’s get back to that question. In F#, we can use matrices similar to Matlab:

let B = matrix [ [ 1.0; 7.0 ];
               [ 1.0; 3.0 ] ]

where matrix is a actually a function, its declaration/signature is

val matrix : seq<#seq<float>> –> matrix

and the implementation is:

let matrix ll = Matrix.ofSeq ll

As a list implements the IENumerable interface, so we can use a list to initialize a matrix, so is array.

Let see the type of B:

val B : matrix = matrix [[1.0; 7.0]
                       [1.0; 3.0]]

So we can see that matrix is also a type name, it is ok to have the same name for a type and a function. In the source file Matrix.fs, we can find the definition of type matrix:

type matrix = Matrix<float>

It is defined as Matrix<float>, where Matrix<_> is called a meta type or polymorphic type. Meta types are used to create other type by putting a element type in between <>. You can think it as an equivalent to C++ templates, or Java/.Net Generics. (There are differences, however that would be a long story.)

Also notice that Matrix in Matrix<_> is a meta type, while Matrix in Matrix.ofSeq is a module name. So they also can have the same name as they are different things.

One thing we can come to mind is that, we can define our integer matrix type as

type intmatrix = Matrix<int>

and we also need a constructor for it

let intmatrix ll = Matrix.Generic.ofSeq  ll

Matrix.Generic module contains functions for generic type matrices, while the functions in Matrix module are only for float matrices.

We can happily do some operations on integer matrices:

let C = intmatrix [ [1;2]; [3;4] ];
let D = intmatrix [ [1;1]; [1;1] ];
let E = Matrix.Generic.inplaceAdd C D
C+D
C * 10

So far so good. So we can put float, int, etc into Matrix<_>, can we put:

type Pixel = 
  | P of int*int*int

into it? Kind of yes:

let C = pmatrix [ [P(2,2,2);P(2,2,2)]; [P(2,2,2);P(2,2,2)] ];

But when adding two such matrices, of course error occurs!

We must lack something, e.g. at least we need to define the behavior of the add operator for Pixel type.

The idea is that every type that could put into Matrix<_> is associated with an instance of INumeric interface. Notice that I use the word ‘associate’. It means that for each number type T, we associate an object with it, which tells the matrix class how to add, subtract, multiply, etc. on T. E.g. we have Int32Numerics defined in INumeric.fs, which is associated with int type:

let Int32Numerics =
{ new IIntegral<int32> with
     member __.Zero = 0
     member __.One = 1
     member __.Add(a,b) = a + b
     member __.Subtract(a,b) = a - b
     member __.Multiply(a,b) = a * b
     member __.Equals(a,b) = (a = b)
     member __.Compare(a,b) = compare a b
     member __.Negate(a) = - a
     member __.Abs(a) = a
     member __.ToBigInt(a) = new BigInteger(a)
     member __.OfBigInt(a) = int32 a
     member __.Sign(a) = Math.Sign(a)
     member __.Modulus(a,b) = a % b
     member __.Divide(a,b) = a / b
     member __.DivRem(a,b) = (a / b, a % b)
     member __.ToString((x:int32),fmt,fmtprovider) =
            x.ToString(fmt,fmtprovider)
     member __.Parse(s,numstyle,fmtprovider) =
            System.Int32.Parse(s,numstyle,fmtprovider)
  interface INormFloat<int32> with
     member __.Norm(x) = float (abs x)
}

where IIntegral<> is an interface inherited from INumeric<>.

Let’s move on to user defined types. We know there is a Big Rational type defined in F# PowerPack. We can easily use it as:

let rmatrix ll = Matrix.Generic.ofSeq  ll
let C = rmatrix [ [1N;2N]; [3N;4N] ];
let D = rmatrix [ [1N;1N]; [1N;1N] ];
C+D

The rational number is defined in q.fs. However, the code to enable it to be used in Matrix<_> are in other places. First, In INumerics.fs, we find:

let BigNumNumerics =
{ new IFractional<bignum> with
     member __.Zero = BigNum.Zero
     member __.One = BigNum.One
     member __.Add(a,b)      = a + b
     member __.Subtract(a,b) = a - b
     member __.Multiply(a,b) = a * b
     member __.Equals(a,b) = (a = b)

How will Matrix know a number type is associated with its numeric? The code is in associations.fs:

let ht =
  let ht = new System.Collections.Generic.Dictionary<Type,obj>()
  let optab =
      [ typeof<float>,   (Some(FloatNumerics    :> INumeric<float>) :> obj);
        typeof<int32>,   (Some(Int32Numerics    :> INumeric<int32>) :> obj);
        typeof<int64>,   (Some(Int64Numerics    :> INumeric<int64>) :> obj);
        typeof<BigInteger>,  (Some(BigIntNumerics   :> INumeric<BigInteger>) :> obj);
        typeof<float32>, (Some(Float32Numerics  :> INumeric<float32>) :> obj);
        typeof<Microsoft.FSharp.Math.Complex>, (Some(ComplexNumerics :> INumeric<Microsoft.FSharp.Math.Complex>) :> obj);
        typeof<bignum>,  (Some(BigNumNumerics   :> INumeric<bignum>) :> obj); ]
   
  List.iter (fun (ty,ops) -> ht.Add(ty,ops)) optab;
  ht

The types above are automatically supported. If we define a new type, we need to add the association of the type and its numeric into the variable ht.

By now, we know all the secrets. We can code our Pixel type now:

type Pixel =

    | P of int*int*int
  static member makePixel (a,b,c) = P (a,b,c)

  override p.ToString() =
      let (P(x,y,z)) = p
      "("+x.ToString()+", "+y.ToString()+", "+z.ToString()+")"
    
  static member (+) (P(a,b,c), P(x,y,z)) = P (a+x, b+y, c+z)

  static member (-) (P(a,b,c), P(x,y,z)) = P (a-x, b-y, c-z)
  static member (*) (P(a,b,c), P(x,y,z)) = P (0, 0, 0)
  static member (/) (P(a,b,c), P(x,y,z)) = P (0, 0, 0)
  static member (~+) (P(a,b,c), P(x,y,z)) = P (0, 0, 0)
  static member (~-) (P(a,b,c), P(x,y,z)) = P (0, 0, 0)

Its numerics:

let PixelNumerics =
  { new INumeric<Pixel> with
      member op.Zero = P(0,0,0)
      member op.One = P(1,1,1)
      member op.Add(a,b) = a + b
      member op.Subtract(a,b) = a - b
      member op.Multiply(a,b) = a * b
      member ops.Negate(a) = a
      member ops.Abs(a) = a
      member ops.Equals(a, b) = false
      member ops.Compare(a, b) = 0
      member ops.Sign(a) = failwith "not defined."
      member ops.ToString(x,fmt,fmtprovider) =  failwith "not implemented"
      member ops.Parse(s,numstyle,fmtprovider) = failwith "not implemented"
  }

add the association:

GlobalAssociations.RegisterNumericAssociation(PixelNumerics)

Some Sort Algorithms in F#

// insertSort

let rec insert x lst =
   match lst with
   | [] -> [x]
   | y::rest -> if x < y then x::y::rest else y::(insert x rest)

let rec insertSort = function
   | [] -> []
   | x::rest -> insert x (insertSort rest)
  
  
// mergeSort
let rec split = function
   | [] -> ([], [])
   | a::[] -> ([a], [])
   | a::b::rest -> let p, q = split rest in (a::p, b::q)

let rec merge p q =
   match p, q with
   | [], [] -> []
   | a, [] -> a
   | [], a -> a
   | x::xs, y::ys when x<=y -> x::merge xs q
   | x::xs, y::ys -> y::merge p ys
  
let rec mergeSort = function
   | [] -> []
   | a::[] -> [a]
   | lst ->
       let p, q = split lst
       let ps = mergeSort p
       let qs = mergeSort q
       merge ps qs
      
// quickSort
let rec quickSort = function
   | [] -> []
   | [a] -> [a]
   | x::xs ->
       quickSort [for y in xs do if y<=x then yield y]
       @ [x]
       @ quickSort [for y in xs do if y>x then yield y]

Wednesday, March 17, 2010

test from live writer

seems to be good!