mlcp User Guide

MarkLogic Server Exporting Content from MarkLogic Server

5.5.6 Extracting a Consistent Database Snapshot

By default, when you export or copy database contents, content is extracted from the source database at multiple points in time. You get whatever is in the database when mlcp accesses a given document. If the database contents are changing while the job runs, the results are not deterministic relative to the starting time of the job. For example, if a new document is inserted into the database while an export job is running, it might or might not be included in the export.

If you require a consistent snapshot of the database contents during an export or copy, use the -snapshot option to force all documents to be read from the database at a consistent point in time.

The submission time of the job is used as the timestamp. Any changes to the database occurring after this time are not reflected in the output.
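As a minimal sketch of how the option described above might appear on the command line, the following script assembles an export command with -snapshot set to true. The host, port, credentials, and output directory are placeholder assumptions, not values taken from this guide. The script only prints the command unless RUN_MLCP=1 is set, so it can be reviewed without a MarkLogic installation.

```shell
# Sketch only: every connection value below is a placeholder (assumption).
ML_HOST="${ML_HOST:-localhost}"
ML_PORT="${ML_PORT:-8000}"
ML_USER="${ML_USER:-user}"
ML_PASS="${ML_PASS:-password}"
OUT_DIR="${OUT_DIR:-/tmp/mlcp-export}"

# Assemble the export command as positional parameters.
# -snapshot true asks mlcp to read all documents at one point in time.
set -- mlcp.sh export \
  -host "$ML_HOST" -port "$ML_PORT" \
  -username "$ML_USER" -password "$ML_PASS" \
  -output_file_path "$OUT_DIR" \
  -snapshot true

# Keep a printable copy of the command and show it for review.
SNAPSHOT_CMD="$*"
echo "$SNAPSHOT_CMD"

# Run the command only when explicitly requested.
if [ "${RUN_MLCP:-0}" = "1" ]; then
  "$@"
fi
```

Setting RUN_MLCP=1 in the environment would execute the printed command instead of only displaying it.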

If a merge occurs while exporting or copying a consistent snapshot, and the merge eliminates a fragment that is subsequently accessed by the mlcp job, you may get an XDMP-OLDSTAMP error. If this occurs, the documents included in the same batch or task may not be included in the export/copy result. If the source database is on MarkLogic Server 7 or later, you may be able to work around this problem by setting the merge timestamp to retain fragments for a time period longer than the expected running time of the job; for details, see Understanding and Controlling Database Merges in the Administrator’s Guide.

5.6 Transforming Content During Export

This section describes how to perform content transformations on MarkLogic Server when exporting documents from the database with mlcp. To fully utilize this technique, you should understand how to use the MarkLogic Connector for Hadoop. For details, see Advanced Input Mode in the MarkLogic Connector for Hadoop Developer’s Guide.

Note: You can only use this technique for exporting documents.

The mlcp tool uses the MarkLogic Connector for Hadoop to distribute work across your MarkLogic cluster. You can leverage the customizability of the connector to enable server-side content transformations during export by passing a connector configuration file through mlcp with the -conf option. Use the configuration file to put the connector in advanced mode and supply an input query and split query. Your queries can be in XQuery or Server-Side JavaScript.
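As a rough sketch of passing a connector configuration file through mlcp, the following script assembles an export command that includes -conf. The file name marklogic-advanced.xml, the connection values, and the output directory are hypothetical placeholders. Placing -conf directly after the command name is an assumption made here to keep it ahead of the other options. The script only prints the command unless RUN_MLCP=1 is set.

```shell
# Sketch only: the file name and connection values are placeholders (assumptions).
CONF_FILE="${CONF_FILE:-marklogic-advanced.xml}"
ML_HOST="${ML_HOST:-localhost}"
ML_PORT="${ML_PORT:-8000}"
ML_USER="${ML_USER:-user}"
ML_PASS="${ML_PASS:-password}"
OUT_DIR="${OUT_DIR:-/tmp/mlcp-transformed}"

# Assemble the export command; -conf names the connector configuration file
# that is expected to hold the advanced-mode input query and split query.
set -- mlcp.sh export \
  -conf "$CONF_FILE" \
  -host "$ML_HOST" -port "$ML_PORT" \
  -username "$ML_USER" -password "$ML_PASS" \
  -output_file_path "$OUT_DIR"

# Keep a printable copy of the command and show it for review.
TRANSFORM_CMD="$*"
echo "$TRANSFORM_CMD"

# Run only when requested, and only if the configuration file exists.
if [ "${RUN_MLCP:-0}" = "1" ]; then
  if [ ! -f "$CONF_FILE" ]; then
    echo "configuration file not found: $CONF_FILE" >&2
    exit 1
  fi
  "$@"
fi
```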

For example, the following Hadoop connector configuration file uses an XQuery split query (mapreduce.marklogic.input.splitquery) to distribute the documents across export tasks, and an XQuery transformation query (mapreduce.marklogic.input.query) that returns just the first 1000 bytes of each selected binary document.

<property>
  <name>mapreduce.marklogic.input.query</name>
  <value><![CDATA[
    xquery version "1.0-ml";
    declare namespace mlmr="http://marklogic.com/hadoop";
    declare variable $mlmr:splitstart as xs:integer external;
    declare variable $mlmr:splitend as xs:integer external;

MarkLogic 8—February, 2015 mlcp User Guide—Page 94

Key Features

  • Import documents from flat files, compressed ZIP and GZIP files
  • Export the contents of a MarkLogic Server database to flat files
  • Copy content and metadata from one MarkLogic Server database to another
  • Import or copy content into a MarkLogic Server database, applying a custom server-side transformation
  • Extract documents from an archived forest to flat files
  • Import documents from an archived forest into a live database

Frequently Asked Questions

What is MarkLogic Server mlcp?
It is a command line tool for getting data into and out of a MarkLogic Server database.
What can I do with MarkLogic Server mlcp?
You can import documents and metadata to a database, export documents and metadata from a database, or copy documents and metadata from one database to another.
What are the modes of operation for MarkLogic Server mlcp?
It can operate in two modes: local and distributed. Local mode is the default unless you configure your environment or the mlcp command line for distributed mode. Distributed mode requires a Hadoop installation.
How do I install MarkLogic Server mlcp?
Download mlcp from http://developer.marklogic.com/products/mlcp, unpack the mlcp distribution to a location of your choice, and optionally put the mlcp bin directory on your path.
What platforms are supported by MarkLogic Server mlcp?
In local mode, mlcp is supported on the same platforms as MarkLogic Server, including 64-bit Linux, 64-bit Windows, and Macintosh OS X. Distributed mode is only supported on 64-bit Linux.
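
For the installation answer above, putting the mlcp bin directory on your path might look like the following sketch. MLCP_HOME is a hypothetical variable name, and its default value is an assumed unpack location rather than anything mlcp requires.

```shell
# Sketch only: MLCP_HOME and its default are assumptions for illustration.
MLCP_HOME="${MLCP_HOME:-/opt/mlcp}"

# Prepend the mlcp bin directory so mlcp.sh can be found by name.
PATH="$MLCP_HOME/bin:$PATH"
export PATH

# Report whether the launcher script is now reachable.
if command -v mlcp.sh >/dev/null 2>&1; then
  echo "mlcp.sh found at: $(command -v mlcp.sh)"
else
  echo "mlcp.sh not found under $MLCP_HOME/bin"
fi
```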