Monday, March 2, 2015

Node.js and Big Data


Node.js is a platform built on Chrome’s V8 JavaScript runtime, and it is well suited to building high-performance, scalable, concurrent applications. Its main feature is an evented, non-blocking I/O model: I/O operations are asynchronous by default, and an event loop runs their callbacks once the results are ready, so the main thread is not held up while waiting.
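
To make the non-blocking behaviour concrete, here is a minimal sketch using the built-in fs module. The function name readWithoutBlocking and the sample file name are made up for illustration. It shows that the call to fs.readFile returns immediately and that the callback only runs later, from the event loop.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// Starts an asynchronous read and returns immediately.
// The returned array records the order in which the steps ran.
// (readWithoutBlocking is a hypothetical helper, not part of Node.js.)
function readWithoutBlocking(filePath, onDone) {
  const order = [];
  order.push('before readFile');
  fs.readFile(filePath, 'utf8', (err, contents) => {
    // Runs later, from the event loop, once the read has finished.
    order.push('readFile callback');
    onDone(err, contents, order);
  });
  // Reached before the callback above, because readFile does not block.
  order.push('after readFile');
  return order;
}

// Usage: write a small sample file (hypothetical name), then read it back.
const samplePath = path.join(os.tmpdir(), 'event-loop-demo.txt');
fs.writeFileSync(samplePath, 'hello from the event loop\n');

readWithoutBlocking(samplePath, (err, contents, order) => {
  if (err) throw err;
  // prints: before readFile -> after readFile -> readFile callback
  console.log(order.join(' -> '));
  fs.unlinkSync(samplePath);
});
```

The line after fs.readFile runs before the callback does. This is what lets a single thread keep accepting other work while a read is in flight.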

[Figure: Node.js architecture]



Several npm modules are available for integrating with the Hadoop ecosystem. The table here shows the modules available for different components such as HBase, HDFS, Hive, Solr and ZooKeeper. Most of the modules are built on top of the Thrift or REST gateway exposed by each component.
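
As a rough illustration of what "built on top of a REST gateway" means, the sketch below talks to HDFS through its WebHDFS REST API using only Node's built-in http module. It is not the code of any of the modules in the table. The helper names buildWebHdfsUrl and listStatus, the host name, the port and the user are all assumptions made for the example, and it presumes WebHDFS is enabled on the NameNode.

```javascript
const http = require('http');

// Builds a WebHDFS URL of the form
//   http://<host>:<port>/webhdfs/v1/<path>?op=<OP>&user.name=<user>
// (buildWebHdfsUrl is a hypothetical helper; host, port and user are placeholders.)
function buildWebHdfsUrl({ host, port, hdfsPath, op, user }) {
  const url = new URL(`http://${host}:${port}/webhdfs/v1${hdfsPath}`);
  url.searchParams.set('op', op);
  if (user) url.searchParams.set('user.name', user);
  return url.toString();
}

// Lists a directory through the LISTSTATUS operation.
// The request is non-blocking; the promise settles once the response arrives.
function listStatus(options) {
  const url = buildWebHdfsUrl({ ...options, op: 'LISTSTATUS' });
  return new Promise((resolve, reject) => {
    http.get(url, (res) => {
      let body = '';
      res.setEncoding('utf8');
      res.on('data', (chunk) => { body += chunk; });
      res.on('end', () => {
        if (res.statusCode !== 200) {
          reject(new Error(`WebHDFS returned HTTP ${res.statusCode}`));
          return;
        }
        try {
          // LISTSTATUS responds with { FileStatuses: { FileStatus: [...] } }.
          resolve(JSON.parse(body).FileStatuses.FileStatus);
        } catch (parseError) {
          reject(parseError);
        }
      });
    }).on('error', reject);
  });
}

// Example call (needs a reachable cluster, so it is left commented out):
// listStatus({ host: 'namenode.example.com', port: 50070, hdfsPath: '/user/data', user: 'hdfs' })
//   .then((entries) => entries.forEach((e) => console.log(e.type, e.pathSuffix)))
//   .catch((err) => console.error(err.message));
```

The NameNode HTTP port depends on the Hadoop version and configuration, so treat 50070 as a placeholder. A Thrift-based module would follow the same asynchronous pattern, but over a Thrift connection rather than HTTP.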

[Table: Node.js Hadoop Big Data Modules]

Node.js is well suited to I/O-bound processes that must handle a large number of concurrent requests. I have used the modules highlighted in orange in the table above in several use cases.