MarkLogic Connector for Hadoop
Download
| Package | Size |
|---|---|
| Release 1.0-2 zip package | 1.3 MB |
The MarkLogic Connector for Hadoop enables you to run Hadoop MapReduce jobs on data in a MarkLogic Server cluster. You can:
- Leverage existing MapReduce and Java libraries to process MarkLogic data
- Operate on data as Documents, Nodes, or Values
- Access MarkLogic text, geospatial, value, and document structure indexes to send only the most relevant data to Hadoop for processing
- Send Hadoop Reduce results to multiple MarkLogic forests in parallel
- Rely on the connector to optimize data access (for both locality and streaming IO) across MarkLogic forests
The connector includes:
- MarkLogic-specific implementations of:
  - the Hadoop `InputFormat` class, for reading data from MarkLogic
  - the Hadoop `OutputFormat` class, for writing data to MarkLogic
- Sample code for a variety of use cases
The Apache Hadoop software library is a framework that allows for the distributed processing of large data sets across clusters of computers using a simple programming model. It is designed to scale up from single servers to thousands of machines, each offering local computation and storage. Rather than rely on hardware to deliver high availability, the library itself is designed to detect and handle failures at the application layer, so delivering a highly available service on top of a cluster of computers, each of which may be prone to failures.
Hadoop is often used for computationally complex bulk processing and cheap offline storage of long-tail data. It provides services that are complementary to MarkLogic's real-time analytics, full-text search, delivery, and updates.
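As a rough illustration of the MapReduce programming model that Hadoop jobs follow, the plain-Java sketch below mimics the map, group, and reduce steps of a word count inside a single process. It does not use the Hadoop API or the connector. The class name `MapReduceSketch` and its methods are hypothetical names chosen for this example, and the input strings stand in for records that a real job would read through an `InputFormat`.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical, single-process illustration of the map/group/reduce model.
// Not Hadoop code: a real job would extend Hadoop's Mapper and Reducer classes.
public class MapReduceSketch {

    // Map step: emit a (word, 1) pair for every word in one input record.
    static List<Map.Entry<String, Integer>> map(String record) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String word : record.toLowerCase().split("\\W+")) {
            if (!word.isEmpty()) {
                pairs.add(new AbstractMap.SimpleEntry<>(word, 1));
            }
        }
        return pairs;
    }

    // Reduce step: combine all values that were emitted for one key.
    static int reduce(String key, List<Integer> values) {
        int sum = 0;
        for (int value : values) {
            sum += value;
        }
        return sum;
    }

    // Driver: map every record, group the pairs by key, then reduce each group.
    static Map<String, Integer> run(List<String> records) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (String record : records) {
            for (Map.Entry<String, Integer> pair : map(record)) {
                grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>())
                       .add(pair.getValue());
            }
        }
        Map<String, Integer> result = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> group : grouped.entrySet()) {
            result.put(group.getKey(), reduce(group.getKey(), group.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> records = Arrays.asList(
            "Hadoop reads documents",
            "MarkLogic stores documents");
        // Prints {documents=2, hadoop=1, marklogic=1, reads=1, stores=1}
        System.out.println(run(records));
    }
}
```

In a real job, Hadoop distributes the map and reduce calls across the cluster and performs the grouping step itself; the sketch only shows the shape of the computation.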
Documentation
MarkLogic Connector for Hadoop Javadoc (online)
MarkLogic Connector for Hadoop Developer's Guide