mlcp User Guide

MarkLogic Server Exporting Content from MarkLogic Server

5.5.6 Extracting a Consistent Database Snapshot

By default, when you export or copy database contents, content is extracted from the source database at multiple points in time. You get whatever is in the database when mlcp accesses a given document. If the database contents are changing while the job runs, the results are not deterministic relative to the starting time of the job. For example, if a new document is inserted into the database while an export job is running, it might or might not be included in the export.

If you require a consistent snapshot of the database contents during an export or copy, use the -snapshot option to force all documents to be read from the database at a consistent point in time.

The submission time of the job is used as the timestamp. Any changes to the database occurring after this time are not reflected in the output.
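As a rough illustration of the option described above, the following sketch runs an export with -snapshot enabled. The host, port, credentials, and output path are placeholders, and the boolean form "-snapshot true" is an assumption about the option syntax; check the export option reference for your mlcp version. The script runs the job only if mlcp.sh is on your PATH; otherwise it prints the command it would have run.

```shell
#!/bin/sh
# Hypothetical connection details and output path; adjust for your environment.
set -- export \
    -host localhost -port 8000 \
    -username user -password password \
    -mode local \
    -output_file_path /space/mlcp/export \
    -snapshot true   # assumed boolean form: read all documents at the job submission time

# Run the job if mlcp.sh is available; otherwise just print the command.
if command -v mlcp.sh >/dev/null 2>&1; then
    mlcp.sh "$@"
else
    echo "mlcp.sh $*"
fi
```

With the option enabled, documents inserted or updated after the job is submitted should not appear in the exported files.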

If a merge occurs while exporting or copying a consistent snapshot, and the merge eliminates a fragment that is subsequently accessed by the mlcp job, you may get an XDMP-OLDSTAMP error. If this occurs, the documents included in the same batch or task may not be included in the export/copy result. If the source database is on MarkLogic Server 7 or later, you may be able to work around this problem by setting the merge timestamp to retain fragments for a time period longer than the expected running time of the job; for details, see Understanding and Controlling Database Merges in the Administrator’s Guide.

5.6 Transforming Content During Export

This section describes how to perform content transformations on MarkLogic Server when exporting documents from the database with mlcp. To fully utilize this technique, you should understand how to use the MarkLogic Connector for Hadoop. For details, see Advanced Input Mode in the MarkLogic Connector for Hadoop Developer’s Guide.

Note: You can only use this technique for exporting documents.

The mlcp tool uses the MarkLogic Connector for Hadoop to distribute work across your MarkLogic cluster. You can leverage the customizability of the connector to enable server-side content transformations during export by passing a connector configuration file through mlcp with the -conf option. Use the configuration file to put the connector in advanced mode and supply an input query and split query. Your queries can be in XQuery or Server-Side JavaScript.
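As a minimal sketch of passing a connector configuration file to an export job, the following script supplies a file through -conf. The file name conf.xml, the connection details, and the output path are placeholders. Placing -conf directly after the command name reflects an assumption that it must precede the other mlcp options; confirm this against the option reference for your version. The script runs the job only if mlcp.sh is on your PATH; otherwise it prints the command it would have run.

```shell
#!/bin/sh
# Hypothetical configuration file name, connection details, and output path.
# -conf is placed right after the command name on the assumption that it
# must appear before the other mlcp options.
set -- export -conf conf.xml \
    -host localhost -port 8000 \
    -username user -password password \
    -mode local \
    -output_file_path /space/mlcp/export/files

# Run the job if mlcp.sh is available; otherwise just print the command.
if command -v mlcp.sh >/dev/null 2>&1; then
    mlcp.sh "$@"
else
    echo "mlcp.sh $*"
fi
```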

For example, the following Hadoop connector configuration file uses an XQuery split query (mapreduce.marklogic.input.splitquery) to distribute the documents across export tasks, and an XQuery transformation query (mapreduce.marklogic.input.query) that returns just the first 1000 bytes of each selected binary document.

<property>
  <name>mapreduce.marklogic.input.query</name>
  <value><![CDATA[
xquery version "1.0-ml";
declare namespace mlmr="http://marklogic.com/hadoop";
declare variable $mlmr:splitstart as xs:integer external;
declare variable $mlmr:splitend as xs:integer external;

MarkLogic 8—February, 2015 mlcp User Guide—Page 94
