Saturday, May 16, 2015

Is Rust good for data mining?

 

Rust 1.0 is just released. This is great achievement for Rust team! I have watched Rust for some time. One phase on its website summaries this language rather well: “zero-cost abstraction”. I was attracted to Rust by Poss’s article: Rust for functional programmers. But the more I know the language, the more I find that its syntax-level similarity to Haskell and ML is only superficial, what Rust really wants to be is a system programming language with safe and fine-granularity memory management.

C has “zero-cost”, however it is hard to build high-level abstractions in it. If we program concurrency in C, we shall explicitly use a thread library (e.g. OpenMP, PThreads) or we need to go lower to implement a thread using more primitive functions at the OS level.

On the other extreme, languages such as Haskell and Scala have “abstractions”. They can build beautiful APIs that others can use and don’t care about how they are implemented. There is a cost in abstraction. The memory usage of Haskell and Scala is harder to predict than C programs. When we chain functions such as map/filter, we don’t know exactly how many objects are created as immediate objects; it depends the underlying library implementation and the optimization ability of the compiler and these two are not common knowledge to the applications programmer . The extra cost also comes from the constant hidden inside the big O – using the iterator interface would has extra cost on each .next() function call while iterating a bare array in C is much faster.

How is it possible for Rust to have the zero-cost abstraction? Basically it achieves it through memory management system. This is the real innovative part of Rust, though some theory is built in academic papers and previous small languages, Rust is the first to provide a best engineering on it. In F#, we can easily write

col |> Seq.map (fun e -> e*e) |> Seq.filter (fun e -> e%3==0) |> Seq.sum

and we don’t know how memory is managed. Will map and filter create many small objects to be GCed? The F# program can ignore these questions. He can of course dig into Seq module’s source code and the final compiled IL code to know the details. But by default, the F# programmer need not care.

If the same program is written in Rust, the programmer has to control exactly how each object is created. He knows it by writing extra annotations to the code. This adds burden to the programmer’s mind!

In data mining, writing correct code for the numerics and the algorithmic logics is already hard, how would a data miner want to put the programming issues in his mind? I would not. This is of course because I am not familiar with Rust’s borrow system. I believe after enough training, my skill can reach a state of caring less and less about memory when programming data mining applications. But in the first place, why should I? Fine memory control is not the primary issue of data mining applications. If performance is not that critical, any static language such as F# and Scala would have a fine performance. Need more performance? Code in C++, allocate all the memory deterministically, and avoid big-object copy and dynamic heap allocation when critical components are running!

11 comments:

  1. Great Post. Keep Sharing. Visit Naxa Solutions for read more about Data Mining.

    ReplyDelete
  2. Thanks a lot for all your valuable article! We are really happy about the your thoughts... SAP Training in chennai

    ReplyDelete
  3. I have been following you for a couple of months now but this is my first time commenting on a blog post. Thank you for sharing your knowledge and experience with us. Keep up the good work. Already bookmarked for future reference.

    SAP training in Chennai

    ReplyDelete
  4. All are saying the same thing repeatedly, but in your blog I had a chance to get some useful and unique information, I love your writing style very much, I would like to suggest your blog in my dude circle, so keep on updates.


    Digital Marketing Company in Chennai

    ReplyDelete
  5. This blog is having the general information. Got a creative work and this is very different one. We have to develop our creativity mind. This blog helps for this. Thank you for this blog. this is very interesting and useful.
    Web Designing Training

    ReplyDelete
  6. Funny conclusion. Not because I am not familiar also
    >I am not familiar with Rust’s borrow system
    >Need more performance? Code in C++

    ReplyDelete
  7. I really appreciate information shared above. It’s of great help. If someone want to learn Online (Virtual) instructor lead live training in Data Mining, kindly contact us http://www.maxmunus.com/contact
    MaxMunus Offer World Class Virtual Instructor led training on Data Mining. We have industry expert trainer. We provide Training Material and Software Support. MaxMunus has successfully conducted 100000+ trainings in India, USA, UK, Australlia, Switzerland, Qatar, Saudi Arabia, Bangladesh, Bahrain and UAE etc.

    For Free Demo Contact us:
    Name : Arunkumar U
    Email : arun@maxmunus.com
    Skype id: training_maxmunus
    Contact No.-+91-9738507310
    Company Website –http://www.maxmunus.com


    ReplyDelete
  8. Its fantatic explaintion lot of information gather it...nice article....
    seo company in Chennai

    ReplyDelete
  9. A very nice guide. I will definitely follow these tips. Thank you for sharing such detailed article. I am learning a lot from you.Nice Article!
    java training in chennai

    ReplyDelete

  10. Its a wonderful post and very helpful, thanks for all this information. You are including better information regarding this topic in an effective way.Thank you so much

    Personal Installment Loans
    Payday Cash Advance loan
    Title Car loan
    Cash Advance Loan

    ReplyDelete