The Neo4j Operations Manual v3.1

Table of Contents
1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
2. Installation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
2.1. System requirements . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
2.2. Linux installation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
2.2.1. Debian . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
Installation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
Upgrade . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
File locations. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
Operation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
2.2.2. Linux tarball installation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
Unix console application . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
Linux service. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
Setting the number of open files . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
2.3. OS X installation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
2.3.1. Mac OS X installer . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
2.3.2. Unix console application . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
2.3.3. OS X service . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
2.4. Windows installation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
2.4.1. Windows installer . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
2.4.2. Windows console application . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
2.4.3. Windows service . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
2.4.4. Windows PowerShell module . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
System requirements . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
Managing Neo4j on Windows . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
How do I import the module? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
How do I get help about the module? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
Example usage . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
Common PowerShell parameters . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
2.5. Docker . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
2.5.1. Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
2.5.2. Neo4j editions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
2.5.3. Docker configuration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
File descriptor limit . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
2.5.4. Neo4j configuration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
Environment variables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
/conf volume. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
Build a new image . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
2.5.5. Neo4j Causal Cluster mode . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
2.5.6. Neo4j Highly Available mode . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
2.5.7. User-defined procedures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
2.5.8. Cypher shell . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
2.5.9. Encryption . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
2.6. CAPI Flash . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
2.6.1. Configuring Neo4j to run on CAPI Flash . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
Power8 System & CAPI Flash configuration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
Neo4j Block Device Integration Library . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
Neo4j Block Device configuration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
2.6.2. Limitations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
Store format upgrades . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
Dump and load facilities of the neo4j-admin command . . . . . . . . . . . . . . . . . . . . . 22
Changing the dbms.memory.pagecache.swapper parameters . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
2.6.3. Admin commands for CAPI Flash . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
The format command . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24
The ls command . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24
The fsck command . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24
The import command . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
The dump command . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
The rename command . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
3. Configuration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
3.1. File locations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
3.1.1. Log files. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
3.1.2. Configuration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
3.1.3. Permissions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
3.2. Ports . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
3.3. Set an initial password . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29
3.4. Wait for Neo4j to start . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30
3.5. Usage Data Collector . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30
3.5.1. Technical Information . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
3.5.2. How to disable UDC . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
3.6. Configure Neo4j connectors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
3.6.1. Additional options for Neo4j connectors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32
3.6.2. Defaults for network interfaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33
3.7. Install certificates . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34
3.7.1. Certificates issued by a Certificate Authority . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34
3.7.2. Auto-generated certificates . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34
4. Clustering . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35
4.1. Causal Cluster . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37
4.1.1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38
Operational view . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38
Application view. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39
Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40
4.1.2. Causal Cluster lifecycle . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40
Discovery protocol . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41
Core membership . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41
Read replica membership . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42
Transacting via the Raft protocol . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43
Catchup protocol . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43
Backup protocol . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44
Read replica shutdown . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45
Core shutdown . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45
4.1.3. Create a new Causal Cluster . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45
Download and configure . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46
Start the Neo4j servers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47
Adding Core servers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47
Adding Read replicas . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47
4.1.4. Seed a Causal Cluster . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48
Overview of the process . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48
Upgrading from previous version . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49
Leaving a cluster . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49
4.1.5. Causal Cluster settings reference . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49
4.2. Highly Available cluster . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50
4.2.1. High Availability . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51
Arbiter instance . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51
Transaction propagation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51
Failover . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 52
Branching . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 52
Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 52
4.2.2. Setup and configuration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53
Important configuration settings . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53
4.2.3. Arbiter instances . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 54
4.2.4. Endpoints for status information . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55
Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55
The endpoints . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55
Examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 56
4.2.5. HAProxy for load balancing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 57
Configuring HAProxy for the Bolt Protocol . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 57
Configuring HAProxy for the HTTP API . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 58
Optimizing for reads and writes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 59
Cache-based sharding with HAProxy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 60
5. Upgrade . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 61
5.1. Upgrade planning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 61
5.1.1. Review supported upgrade paths . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 61
5.1.2. Review the Upgrade guide at neo4j.com . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 61
5.1.3. Apply configuration changes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 61
5.1.4. Upgrade application code . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 61
5.1.5. Upgrade custom plugins . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62
5.1.6. Plan disk space requirements . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62
5.1.7. Perform a test upgrade . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62
5.2. Single-instance upgrade . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62
5.2.1. Upgrade from 2.x . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62
5.2.2. Upgrade from 3.x . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 63
5.3. Neo4j HA cluster upgrade . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 63
5.3.1. Back up the Neo4j database . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 63
5.3.2. Shut down the cluster . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 63
5.3.3. Upgrade the master . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 64
5.3.4. Upgrade the slaves . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 64
5.3.5. Restart the cluster . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 64
6. Backup . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 65
6.1. Introducing backups . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 65
6.1.1. Enabling backups . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 65
6.1.2. Storage considerations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 65
6.2. Perform a backup . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 65
6.2.1. Backup commands . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 65
6.2.2. Incremental backups . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 66
6.3. Restore a backup . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67
6.3.1. Restore a single database . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67
6.3.2. Restore an HA cluster . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67
6.3.3. Restore a Causal Cluster . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67
6.4. Backup a Causal Cluster . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 68
6.4.1. Read replica backups . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 68
6.4.2. Core server backups . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 68
7. Security . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 69
7.1. Authentication and authorization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 69
7.1.1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 69
Native auth provider . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 70
LDAP auth provider . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 70
Custom-built plugin auth providers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 70
7.1.2. Terminology . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 70
7.1.3. Enabling authentication and authorization. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 71
7.1.4. Native user and role management . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 71
Native roles . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 72
Custom roles . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 72
Propagate users and roles . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 73
Procedures for native user and role management . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 73
7.1.5. Integration with LDAP . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 85
Configure the LDAP auth provider . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 85
Use 'ldapsearch' to verify the configuration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 86
The auth cache . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 87
Available methods of encryption . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 88
Use a self-signed certificate in a test environment . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 88
7.1.6. Subgraph access control . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 89
Manage the custom role . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 89
Configure procedure permissions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 90
7.2. Security checklist . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 90
8. Monitoring. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 92
8.1. Metrics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 92
8.1.1. Enable metrics logging . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 92
Graphite . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 93
CSV files . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 93
8.1.2. Available metrics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 93
General-purpose metrics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 93
Metrics specific to Causal Clustering . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 96
8.2. Logging . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 97
8.2.1. Query logging . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 97
Log configuration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 97
8.2.2. Security events logging. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 98
Log configuration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 98
8.3. Query management . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 99
8.3.1. Transaction timeout . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 100
8.3.2. Procedures for query management . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 100
Terminology . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 100
List all running queries . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 100
Terminate multiple queries . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 101
Terminate a single query . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 102
8.4. Procedures for monitoring a Causal Cluster . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 103
8.4.1. Find out the role of a cluster member . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 103
8.4.2. Gain an overview over the instances in the cluster . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 104
8.4.3. Get routing recommendations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 105
9. Performance . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 107
9.1. Memory tuning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 107
9.1.1. OS memory sizing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 107
9.1.2. Page cache sizing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 107
9.1.3. Heap sizing. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 108
9.1.4. Tuning of the garbage collector . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 108
9.2. Transaction logs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 110
9.3. Compressed storage . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 110
9.4. Linux file system tuning. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 112
9.5. Disks, RAM and other tips . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 113
10. Tools . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 114
10.1. Import . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 114
10.1.1. CSV file header format . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 114
Nodes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 115
Relationships . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 115
ID spaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 115
10.1.2. Command line usage . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 115
Linux . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 116
Windows . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 116
Options . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 116
Verbose error information . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 117
Output and statistics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 117
10.2. Cypher Shell . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 118
10.2.1. Invoking Cypher Shell . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 118
10.2.2. Query parameters . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 120
10.2.3. Transactions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 120
10.2.4. Procedures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 121
10.3. Dump and load databases . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 121
10.4. Consistency checker . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 122
10.4.1. Check database consistency . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 122
10.4.2. Provide additional configuration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 123
Appendix A: Reference . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 125
A.1. Configuration settings . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 125
A.2. Built-in procedures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 157
A.2.1. General-purpose procedures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 158
A.2.2. Procedures for native user and role management . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 158
A.3. User management for Community Edition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 159
A.3.1. List all users . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 159
A.3.2. Change the current user’s password . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 160
A.3.3. Show details for the current user . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 160
A.3.4. Add a user . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 161
A.3.5. Delete a user . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 162
Appendix B: Tutorial . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 163
B.1. Set up a local Causal Cluster . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 163
B.1.1. Download and configure . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 163
B.1.2. Configure the Core instances . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 164
Minimum configuration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 164
Additional configuration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 164
B.1.3. Start the Neo4j servers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 165
B.1.4. Check the status of the cluster . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 165
B.1.5. Test the cluster . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 166
B.1.6. Configure the Read replicas . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 166
Minimum configuration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 166
Additional configuration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 167
B.1.7. Test the cluster with Read replicas . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 167
B.2. Set up a Highly Available cluster . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 168
B.2.1. Download and configure . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 168
B.2.2. Start the Neo4j Servers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 170
B.3. Set up a local HA cluster . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 170
B.3.1. Download and configure . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 170
Start the Neo4j Servers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 173
B.4. Use the Import tool . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 173
B.4.1. Basic example . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 173
B.4.2. Customizing configuration options . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 174
B.4.3. Using separate header files . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 175
B.4.4. Multiple input files . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 176
B.4.5. Types and labels . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 177
Using the same label for every node . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 177
Using the same relationship type for every relationship . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 178
B.4.6. Property types . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 178
B.4.7. ID handling . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 179
Working with sequential or auto incrementing identifiers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 179
B.4.8. Bad input data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 180
Relationships referring to missing nodes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 180
Multiple nodes with same id within same id space . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 181
B.5. Scenarios for using role-based access control . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 182
B.5.1. Creating a user and managing roles . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 182
B.5.2. Suspending and reactivating a user . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 184
© 2017 Neo Technology
License: Creative Commons 3.0
This is the operations manual for Neo4j version 3.1, authored by the Neo4j Team.
The manual covers the following areas:
• Introduction — Introduction of Neo4j Community and Enterprise Editions.
• Installation — Instructions on how to install Neo4j in different deployment
contexts.
• Configuration — Instructions on how to configure certain parts of the product.
• Clustering — Introduction to the clustering solutions available within Neo4j,
followed by deployment details.
• Upgrade — Instructions on upgrading Neo4j.
• Backup — Instructions on setting up Neo4j backups.
• Monitoring — Instructions on setting up Neo4j monitoring.
• Security — Instructions on user management, role-based access control, and
server security.
• Performance — Instructions on how to go about performance tuning for Neo4j.
• Tools — Description of Neo4j tools.
• Reference — Listings of all Neo4j configuration parameters.
• Tutorial — Step-by-step instructions on various scenarios for setting up Neo4j.
Who should read this?
This manual is written for:
• the engineer performing the Neo4j production deployment.
• the operations engineer supporting and maintaining the Neo4j production
database.
• the enterprise architect investigating database options.
• the infrastructure architect planning the Neo4j production deployment.
Chapter 1. Introduction
This chapter introduces Neo4j.
Neo4j is the world’s leading graph database. It is built from the ground up to be a graph database,
meaning that its architecture is designed for optimizing fast management, storage, and traversal of
nodes and relationships. Therefore, relationships are said to be first class citizens in Neo4j. An
operation known in the relational database world as a join exhibits performance which degrades
exponentially with the number of relationships. The corresponding action in Neo4j is performed as
navigation from one node to another; an operation whose performance is linear.
This different approach to storing and querying connections between entities provides traversal
performance of up to four million hops per second per core. Since most graph searches are local to
the larger neighborhood of a node, the total amount of data stored in a database will not affect
operations runtime. Dedicated memory management, and highly scalable and memory efficient
operations, contribute to the benefits.
The property graph approach is whiteboard friendly. By this we mean that the schema optional model
of Neo4j provides for a consistent use of the same model throughout conception, design,
implementation, storage, and visualization. A major benefit of this is that it allows all business
stakeholders to participate throughout the development cycle. Also, the domain model can be evolved
continuously as requirements change, without the penalty of expensive schema changes and
migrations.
Cypher, the declarative graph query language, is designed to visually represent graph patterns of
nodes and relationships. This highly capable, yet easily readable, query language is centered around
the patterns that express concepts and questions from a specific domain. Cypher can also be
extended for narrow optimizations for specific use cases.
Neo4j can store hundreds of trillions of entities for the largest datasets imaginable while being
sensitive to compact storage. For production environments it can be deployed as a scalable, fault-tolerant cluster of machines. Due to its high scalability, Neo4j clusters require only tens of machines,
not hundreds or thousands, saving on cost and operational complexity. Other features for production
applications include hot backups and extensive monitoring.
There are two editions of Neo4j to choose from: Community Edition and Enterprise Edition:
Community Edition is a fully functional edition of Neo4j, suitable for single instance deployments. It
has full support for key Neo4j features, such as ACID compliance, Cypher, and programming APIs. It is
ideal for learning Neo4j, for do-it-yourself projects, and for applications in small workgroups.
Enterprise Edition extends the functionality of Community Edition to include key features for
performance and scalability, such as a clustering architecture for high availability and online backup
functionality. Additional security features include role-based access control and LDAP support; for
example, Active Directory. It is the choice for production systems with requirements for scale and
availability, such as commercial solutions and critical internal solutions.
Table 1. Features

Feature                                 Enterprise   Community
Property Graph Model                    X            X
Native Graph Processing & Storage       X            X
ACID                                    X            X
Cypher - Graph Query Language           X            X
Language Drivers                        X            X
Extensible REST API                     X            X
High-Performance Native API             X            X
HTTPS                                   X            X
Role-based access control               X            -
Subgraph access control                 X            -
LDAP support                            X            -
Listing/terminating running queries     X            -

Table 2. Performance & Scalability

Feature                                 Enterprise   Community
Enterprise Lock Manager                 X            -
Clustering                              X            -
Hot Backups                             X            -
Advanced Monitoring                     X            -
Chapter 2. Installation
This chapter describes installation of Neo4j in different deployment contexts, such as Linux,
Mac OS X, Windows, Debian, Docker, and with CAPI Flash.
Neo4j runs on Linux, Windows and Mac OS X. There are desktop installers for Community Edition
available for Mac OS X and Windows. There are also platform-specific packages and zip/tar archives of
both Community Edition and Enterprise editions.
The topics described are:
• System requirements — The system requirements for a production deployment of Neo4j.
• Linux — Installation instructions for Linux.
• Mac OS X — Installation instructions for Mac OS X.
• Windows — Installation instructions for Windows.
• Docker — Installation instructions for Docker.
• CAPI Flash — Installation instructions for CAPI Flash.
2.1. System requirements
This section provides an overview of the system requirements for running a Neo4j instance.
CPU
Performance is generally memory or I/O bound for large graphs, and compute bound for graphs that fit in memory.

    Minimum:      Intel Core i3
    Recommended:  Intel Core i7, IBM POWER8

Memory
More memory allows for larger graphs, but it needs to be configured properly to avoid disruptive garbage collection operations. See Memory tuning for suggestions.

    Minimum:      2 GB
    Recommended:  16-32 GB or more

Disk
Aside from capacity, the performance characteristics of the disk are the most important when selecting storage. Neo4j workloads tend significantly toward random reads. Select media with low average seek time: SSD over spinning disks. Consult Disks, RAM and other tips for more details.

    Minimum:      10 GB SATA
    Recommended:  SSD with SATA

Filesystem
For proper ACID behavior, the filesystem must support flush (fsync, fdatasync). See Linux file system tuning for a discussion on how to configure the filesystem in Linux for optimal performance.

    Minimum:      ext4 (or similar)
    Recommended:  ext4, ZFS

Software
Neo4j requires a Java Virtual Machine (JVM) to operate. Community Edition installers for Windows and Mac include a JVM for convenience. All other distributions, including all distributions of Neo4j Enterprise Edition, require a pre-installed JVM.

    Java:               OpenJDK 8 (http://openjdk.java.net/), Oracle Java 8 (http://www.oracle.com/technetwork/java/javase/downloads/index.html), or IBM Java 8 (http://www.ibm.com/developerworks/java/jdk/)
    Operating systems:  Linux (Ubuntu, Debian), Windows Server 2012
    Architectures:      x86, OpenPOWER (POWER8)
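
Before installing, it is worth confirming that a Java 8 runtime is available on the path. A minimal check (nothing here is Neo4j-specific):

java -version

The output should report a 1.8.x version of one of the JVMs listed above.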
2.2. Linux installation
Install Neo4j on Linux from a tarball or a package.
For installing Neo4j on Linux you can use the Debian package or install from a tarball.
• Debian
• Linux tarball installation
2.2.1. Debian
This article covers deploying Neo4j on Debian and Debian-based distributions like Ubuntu
using the Neo4j Debian package.
Installation
To install Neo4j on Debian you need to make sure of the following:
• A Java 8 runtime is installed.
• The repository containing the Neo4j Debian package is known to the package manager.
Prerequisites (Ubuntu 14.04 and Debian 8 only)
Neo4j 3.1 requires the Java 8 runtime. Java 8 is not included in Ubuntu 14.04 LTS or Debian 8 (jessie)
and will have to be installed manually prior to installing or upgrading to Neo4j 3.1. Debian users can
find OpenJDK 8 in backports (https://packages.debian.org/jessie-backports/openjdk-8-jdk).
Java 8 on Debian 8
Add the line deb http://httpredir.debian.org/debian jessie-backports main to a file with the ".list"
extension in /etc/apt/sources.list.d/. Then do apt-get update.
echo "deb http://httpredir.debian.org/debian jessie-backports main" | sudo tee -a
/etc/apt/sources.list.d/jessie-backports.list
sudo apt-get update
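
If you would rather install OpenJDK 8 from backports explicitly, instead of relying on the Neo4j package to pull in a Java 8 dependency, something along these lines should work; the package name openjdk-8-jdk matches the backports link above:

sudo apt-get -t jessie-backports install openjdk-8-jdk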
You are now ready to install Neo4j 3.1.2 (which will install Java 8 automatically if it is not already
installed). See Dealing with multiple installed Java versions to make sure you can start Neo4j after
install.
Java 8 on Ubuntu 14.04
Users on Ubuntu 14.04 can add Oracle Java 8 via webupd8. Note that when installing from webupd8 or
any other PPA, you must install Java 8 manually before installing Neo4j. Otherwise there is a risk that
Java 9 will be installed instead, which is not compatible with Neo4j.
sudo add-apt-repository ppa:webupd8team/java
sudo apt-get update
sudo apt-get install oracle-java8-installer
Once installed, see Dealing with multiple installed Java versions to make sure you can start Neo4j after
install.
Dealing with multiple installed Java versions
It is important that you configure your default Java version to point to Java 8, or Neo4j 3.1.2 will be
unable to start. Do so with the update-java-alternatives command.
First, list all your installed versions of Java with update-java-alternatives --list.
Your result may vary, but this is an example of the output:
java-1.7.0-openjdk-amd64 1071 /usr/lib/jvm/java-1.7.0-openjdk-amd64
java-1.8.0-openjdk-amd64 1069 /usr/lib/jvm/java-1.8.0-openjdk-amd64
Identify your Java 8 version, in this case it is java-1.8.0-openjdk-amd64. Then set it as the default with
(replacing <java8name> with the appropriate name from above)
sudo update-java-alternatives --jre --set <java8name>
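
For example, with the output listed above, the following would make the OpenJDK 8 runtime the default; substitute the name reported on your own system:

sudo update-java-alternatives --jre --set java-1.8.0-openjdk-amd64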
Add the repository
The Debian package is available from http://debian.neo4j.org. To use the repository follow these
steps:
wget -O - https://debian.neo4j.org/neotechnology.gpg.key | sudo apt-key add -
echo 'deb http://debian.neo4j.org/repo stable/' | sudo tee -a /etc/apt/sources.list.d/neo4j.list
sudo apt-get update
Installing
To install Neo4j Community Edition:
sudo apt-get install neo4j=3.1.2
To install Neo4j Enterprise Edition:
sudo apt-get install neo4j-enterprise=3.1.2
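
The Debian package registers Neo4j as a service. As a quick sanity check after installation (a sketch, using the same service name as the commands later in this section):

sudo service neo4j status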
Upgrade
For upgrade of any 3.x version of Neo4j to 3.1.2, follow instructions in Upgrade.
Below is a description of the steps necessary for upgrade of Neo4j 2.3 to 3.1.2.
Upgrade from Neo4j 2.3
There are three steps involved in upgrading a Neo4j Debian/Ubuntu installation. First, the
configuration files need to be migrated. Then the database must be imported. Finally, the database
store format must be upgraded.
Configuration files
The configuration files have changed in 3.1.2. If you have not edited the configuration files, the Debian
package will simply remove the files that are no longer necessary, and replace the old default files with
new default files.
If you have changed configuration values in your 2.3 installation, you can use the config migration tool
provided. Two arguments are provided to tell the config migrator where to find the "conf" directory for
the source and the destination. Both must be provided, due to the filesystem layout of the Debian
packages.
Because the Neo4j files and directories are owned by the neo4j user and adm group on Debian, it is
necessary to use sudo to make sure the permissions remain intact:
sudo -u neo4j -g adm java -jar /usr/share/neo4j/bin/tools/2.x-config-migrator.jar /var/lib/neo4j /var/lib/neo4j
Importing the 2.3 database to 3.1.2
The location of databases has changed in 3.1.2. 2.3 databases will need to be imported to 3.1.2. To do
this, the neo4j-admin import command can be used. To import a database called graph.db (the default
database name in 2.3) use the following command:
sudo -u neo4j neo4j-admin import --mode=database --database=graph.db --from=/var/lib/neo4j/data/graph.db
This command will import the database located in /var/lib/neo4j/data/graph.db into 3.1.2 and call it
graph.db.
Once a database has been imported, and the upgrade has completed successfully, the old database
can be removed safely.
Migrating the 2.3 database to 3.1.2
The previous import step moved the database from its old on-disk location to the new on-disk
location, but it did not upgrade the store format. To do this, you must start the database service with
the option to migrate the database format to the latest version.
In neo4j.conf, uncomment the option dbms.allow_format_migration=true. You can use the following
command to change the line in place if it is still commented out, as it is in the default
configuration:
sudo sed -i 's/#dbms.allow_format_migration=true/dbms.allow_format_migration=true/' /etc/neo4j/neo4j.conf
Start the database service with the format migration option enabled, and the format migration will
take place immediately.
sudo service neo4j start
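
To follow the migration as it runs, you can watch the server log. The path below assumes the Debian package's default log directory; see File locations for where logs live in your installation:

sudo tail -f /var/log/neo4j/neo4j.log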
File locations
File locations for all Neo4j packages are documented here.
Operation
Most Neo4j configuration goes into neo4j.conf. Some package-specific options are set in
/etc/default/neo4j.
NEO4J_SHUTDOWN_TIMEOUT (default: 120)
    Timeout in seconds when waiting for Neo4j to stop. If it takes longer than this then the shutdown is considered to have failed. This may need to be increased if the system serves long-running transactions.

NEO4J_ULIMIT_NOFILE (default: 60000)
    Maximum number of file handles that can be opened by the Neo4j process. See this page for details.
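
As an illustration, raising the shutdown timeout for a system that serves long-running transactions could look like this in /etc/default/neo4j (the value 300 is purely an example):

NEO4J_SHUTDOWN_TIMEOUT=300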
2.2.2. Linux tarball installation
Install Neo4j on Linux from a tarball and run as a console application or service.
Unix console application
1. Download the latest release from http://neo4j.com/download/.
• Select the appropriate tar.gz distribution for your platform.
2. Extract the contents of the archive, using: tar -xf <filename>
• Refer to the top-level extracted directory as: NEO4J_HOME
3. Change directory to: $NEO4J_HOME
• Run: ./bin/neo4j console
4. Stop the server by typing Ctrl-C in the console.
Linux service
The neo4j command can also be used with start, stop, restart or status instead of console. By using
these actions, you can create a Neo4j service.

Note: This approach to running Neo4j as a service is deprecated. We strongly advise you to run Neo4j from a package where feasible.
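
If you do manage the process this way, the basic actions look like the following (a quick sketch; $NEO4J_HOME is the extracted directory from the console instructions above):

$NEO4J_HOME/bin/neo4j start     # launch Neo4j in the background
$NEO4J_HOME/bin/neo4j status    # report whether the server is running
$NEO4J_HOME/bin/neo4j stop      # shut the server down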
You can build your own init.d script. See for instance the Linux Standard Base specification on
system initialization (http://refspecs.linuxfoundation.org/LSB_3.1.0/LSB-Core-generic/LSB-Core-generic/tocsysinit.html), or one of the many samples (https://gist.github.com/chrisvest/7673244) and tutorials
(http://www.linux.com/learn/tutorials/442412-managing-linux-daemons-with-init-scripts).
Setting the number of open files
Linux platforms impose an upper limit on the number of concurrent files a user may have open. This
number is reported for the current user and session with the ulimit -n command:
user@localhost:~$ ulimit -n
1024
The usual default of 1024 is often not enough. This is especially true when many indexes are used or a
server installation sees too many connections. Network sockets count against the limit as well. Users
are therefore encouraged to increase the limit to a healthy value of 40 000 or more, depending on
usage patterns. It is possible to set the limit with the ulimit command, but only for the root user, and
it only affects the current session. To set the value system wide, follow the instructions for your
platform.
What follows is the procedure to set the open file descriptor limit to 40 000 for user neo4j under
Ubuntu 10.04 and later.

If you opted to run the neo4j service as a different user, change the first field in step
2 accordingly.
1. Become root, since all operations that follow require editing protected system files.
user@localhost:~$ sudo su
Password:
root@localhost:~$
2. Edit /etc/security/limits.conf and add these two lines:
neo4j   soft    nofile  40000
neo4j   hard    nofile  40000
3. Edit /etc/pam.d/su and uncomment or add the following line:
session    required   pam_limits.so
4. A restart is required for the settings to take effect.
After the above procedure, the neo4j user will have a limit of 40 000 simultaneous open files. If
you continue experiencing exceptions on Too many open files or Could not stat() directory, you
may have to raise the limit further.
2.3. OS X installation
Install Neo4j on OS X with a desktop installer or from a tarball. Run it as a desktop or
console application, or as a service.
2.3.1. Mac OS X installer
1. Download the .dmg installer that you want from http://neo4j.com/download/.
2. Click the downloaded installer file.
3. Drag the Neo4j icon into the Applications folder.

If you install Neo4j using the Mac installer and already have an existing instance of
Neo4j the installer will ensure that both the old and new versions can co-exist on
your system.
2.3.2. Unix console application
1. Download the latest release from http://neo4j.com/download/.
• Select the appropriate tar.gz distribution for your platform.
2. Extract the contents of the archive, using: tar -xf <filename>
• Refer to the top-level extracted directory as: NEO4J_HOME
3. Change directory to: $NEO4J_HOME
• Run: ./bin/neo4j console
4. Stop the server by typing Ctrl-C in the console.
When Neo4j runs in console mode in the foreground, logs are printed to the Terminal.
2.3.3. OS X service
Use the standard OS X system tools to create a service based on the neo4j command.
2.4. Windows installation
Install Neo4j on Windows with a desktop installer or from a ZIP archive. Run it as a desktop
or console application, or as a Windows service.
2.4.1. Windows installer
1. Download the version that you want from http://neo4j.com/download/.
• Select the appropriate version and architecture for your platform.
2. Double-click the downloaded installer file.
3. Follow the prompts.


The installer will prompt to be granted Administrator privileges. Newer versions of
Windows come with a SmartScreen feature that may prevent the installer from
running — you can make it run anyway by clicking "More info" on the "Windows
protected your PC" screen.
If you install Neo4j using the windows installer and you already have an existing
instance of Neo4j the installer will select a new install directory by default. If you
specify the same directory it will ask if you want to upgrade. This should proceed
without issue although some users have reported a JRE is damaged error. If you see
this error simply install Neo4j into a different location.
2.4.2. Windows console application
1. Download the latest release from http://neo4j.com/download/.
• Select the appropriate Zip distribution.
2. Right-click the downloaded file, click Extract All.
3. Change directory to top-level extracted directory.
• Run bin\neo4j console
4. Stop the server by typing Ctrl-C in the console.
2.4.3. Windows service
Neo4j can also be run as a Windows service. Install the service with bin\neo4j install-service and
start it with bin\neo4j start. Other commands available are stop, restart, status and uninstall-service.
2.4.4. Windows PowerShell module
The Neo4j PowerShell module allows administrators to:
• install, start and stop Neo4j Windows® Services
• and start tools, such as Neo4j Shell and Neo4j Import.
The PowerShell module is installed as part of the ZIP file (http://neo4j.com/download/) distributions of
Neo4j.
System requirements
• Requires PowerShell v2.0 or above.
• Supported on either 32 or 64 bit operating systems.
Managing Neo4j on Windows
On Windows it is sometimes necessary to Unblock a downloaded zip file before you can import its
contents as a module. If you right-click on the zip file and choose "Properties" you will get a dialog.
Bottom-right on that dialog you will find an "Unblock" button. Click that. Then you should be able to
import the module.
Running scripts has to be enabled on the system. This can for example be achieved by executing the
following from an elevated PowerShell prompt:
Set-ExecutionPolicy -ExecutionPolicy RemoteSigned
For more information see About execution policies (https://technet.microsoft.com/en-us/library/hh847748.aspx).
The PowerShell module will display a warning if it detects that you do not have administrative rights.
How do I import the module?
The module file is located in the bin directory of your Neo4j installation, i.e. where you unzipped the
downloaded file. For example, if Neo4j was installed in C:\Neo4j then the module would be imported
like this:
Import-Module C:\Neo4j\bin\Neo4j-Management.psd1
This will add the module to the current session.
Once the module has been imported you can start an interactive console version of a Neo4j Server
like this:
Invoke-Neo4j console
To stop the server, issue Ctrl-C in the console window that was created by the command.
How do I get help about the module?
Once the module is imported you can query the available commands like this:
Get-Command -Module Neo4j-Management
The output should be similar to the following:
CommandType     Name                 Version    Source
-----------     ----                 -------    ------
Function        Invoke-Neo4j         3.0.0      Neo4j-Management
Function        Invoke-Neo4jAdmin    3.0.0      Neo4j-Management
Function        Invoke-Neo4jBackup   3.0.0      Neo4j-Management
Function        Invoke-Neo4jImport   3.0.0      Neo4j-Management
Function        Invoke-Neo4jShell    3.0.0      Neo4j-Management
The module also supports the standard PowerShell help commands.
Get-Help Invoke-Neo4j
To see examples for a command, run:
Get-Help Invoke-Neo4j -examples
Example usage
• List of available commands:
Invoke-Neo4j
• Current status of the Neo4j service:
Invoke-Neo4j status
• Install the service with verbose output:
Invoke-Neo4j install-service -Verbose
• Available commands for administrative tasks:
Invoke-Neo4jAdmin
Common PowerShell parameters
The module commands support the common PowerShell parameter of Verbose.
2.5. Docker
This article covers running Neo4j in a Docker container.
Docker does not run natively on OS X or Windows. For running Docker on OS X
(https://docs.docker.com/engine/installation/mac/) and Windows
(https://docs.docker.com/engine/installation/windows/) please consult the Docker documentation.
2.5.1. Overview
By default the Docker image exposes three ports for remote access:
• 7474 for HTTP.
• 7473 for HTTPS.
• 7687 for Bolt.
It also exposes two volumes:
• /data to allow the database to be persisted outside its container.
• /logs to allow access to Neo4j log files.
docker run \
--publish=7474:7474 --publish=7687:7687 \
--volume=$HOME/neo4j/data:/data \
--volume=$HOME/neo4j/logs:/logs \
neo4j:3.1
Point your browser at http://localhost:7474 on Linux or http://$(docker-machine ip default):7474
on OS X.
All the volumes in this documentation are stored under $HOME in order to work on OS X where $HOME is
automatically mounted into the machine VM. On Linux the volumes can be stored anywhere.

By default Neo4j requires authentication. You must log in with neo4j/neo4j at the first connection
and set a new password. You can set the password for the docker container directly by specifying
--env NEO4J_AUTH=neo4j/<password> in your run directive. Alternatively, you can disable
authentication by specifying --env NEO4J_AUTH=none instead.
2.5.2. Neo4j editions
Tags are available for both Neo4j Community and Enterprise editions. Version-specific Enterprise
Edition tags have an -enterprise suffix, for example neo4j:3.1.0-enterprise. Community Edition tags
have no suffix, for example neo4j:3.1.0. The latest Neo4j Enterprise Edition release is available as
neo4j:enterprise.
2.5.3. Docker configuration
File descriptor limit
Neo4j may use a large number of file descriptors if many indexes are in use or there is a large number
of simultaneous database connections.
Docker controls the number of open file descriptors in a container; the limit depends on the
configuration of your system. We recommend a limit of at least 40000 for running Neo4j.
To check the limit on your system, run this command:
docker run neo4j:3.1 \
bash -c 'echo Soft limit: $(ulimit -Sn); echo Hard limit: $(ulimit -Hn)'
To override the default configuration for a single container, use the --ulimit option like this:
docker run \
--detach \
--publish=7474:7474 --publish=7687:7687 \
--volume=$HOME/neo4j/data:/data \
--volume=$HOME/neo4j/logs:/logs \
--ulimit=nofile=40000:40000 \
neo4j:3.1
2.5.4. Neo4j configuration
The default configuration provided by this image is intended for learning about Neo4j, but must be
modified to make it suitable for production use. In particular, the memory assigned to Neo4j is very
limited (see NEO4J_dbms_memory_pagecache_size and NEO4J_dbms_memory_heap_maxSize below), to allow
multiple containers to be run on the same server. You can read more about configuring Neo4j in the
Configuration settings.
There are three ways to modify the configuration:
• Set environment variables.
• Mount a /conf volume.
• Build a new image.
Which one to choose depends on how much you need to customize the image.
Environment variables
Pass environment variables to the container when you run it.
docker run \
--detach \
--publish=7474:7474 --publish=7687:7687 \
--volume=$HOME/neo4j/data:/data \
--volume=$HOME/neo4j/logs:/logs \
--env=NEO4J_dbms_memory_pagecache_size=4G \
neo4j:3.1
The following environment variables are available:
• NEO4J_AUTH: controls authentication, set to none to disable authentication or neo4j/<password> to
override the default password (see Security for details).
• NEO4J_dbms_memory_pagecache_size: the size of Neo4j’s native-memory cache, defaults to 512M
• NEO4J_dbms_memory_heap_maxSize: the size of Neo4j’s heap, defaults to 512M
• NEO4J_dbms_txLog_rotation_retentionPolicy: the retention policy for logical logs, defaults to 100M
size
• NEO4J_dbms_allowFormatMigration: set to true to enable upgrades, defaults to false (see the Single-instance upgrade for details)
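As an illustration, several of these variables can be combined in one run directive. The values below are arbitrary examples rather than recommendations, and <password> is a placeholder:
docker run \
--detach \
--publish=7474:7474 --publish=7687:7687 \
--volume=$HOME/neo4j/data:/data \
--volume=$HOME/neo4j/logs:/logs \
--env=NEO4J_AUTH=neo4j/<password> \
--env=NEO4J_dbms_memory_pagecache_size=2G \
--env=NEO4J_dbms_memory_heap_maxSize=2G \
neo4j:3.1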
Neo4j Enterprise Edition
The following settings control features that are only available in the Enterprise Edition of Neo4j.
• NEO4J_dbms_mode: the database mode, defaults to SINGLE, set to CORE or READ_REPLICA for Causal
Clustering, set to HA for Highly Available clusters.
Causal Cluster settings
• NEO4J_causalClustering_expectedCoreClusterSize: the initial cluster size (number of Core
instances) at startup.
• NEO4J_causalClustering_initialDiscoveryMembers: the network addresses of an initial set of Core
cluster members.
• NEO4J_causalClustering_discoveryAdvertisedAddress: hostname/ip address and port to advertise
for member discovery management communication.
• NEO4J_causalClustering_transactionAdvertisedAddress: hostname/ip address and port to advertise for transaction handling.
• NEO4J_causalClustering_raftAdvertisedAddress: hostname/ip address and port to advertise for
cluster communication.
See below for examples of how to configure Causal Clustering.
Highly Available cluster settings
• NEO4J_ha_serverId: the id of the server, must be unique within a cluster
• NEO4J_ha_host_coordination: the address (including port) used for cluster coordination in HA
mode, this must be resolvable by all cluster members
• NEO4J_ha_host_data: the address (including port) used for data transfer in HA mode, this must be
resolvable by all cluster members
• NEO4J_ha_initialHosts: comma-separated list of other members of the cluster
See below for an example of how to configure HA clusters.
/conf volume
To make arbitrary modifications to the Neo4j configuration, provide the container with a /conf volume.
docker run \
--detach \
--publish=7474:7474 --publish=7687:7687 \
--volume=$HOME/neo4j/data:/data \
--volume=$HOME/neo4j/logs:/logs \
--volume=$HOME/neo4j/conf:/conf \
neo4j:3.1
Any configuration files in the /conf volume will override files provided by the image. This includes
values that may have been set in response to environment variables passed to the container by
Docker. So if you want to change one value in a file you must ensure that the rest of the file is
complete and correct.
To dump an initial set of configuration files, run the image with the dump-config command.
docker run --rm \
--volume=$HOME/neo4j/conf:/conf \
neo4j:3.1 dump-config
Build a new image
For more complex customization of the image you can create a new image based on this one.
FROM neo4j:3.1
If you need to make your own configuration changes, we provide a hook so you can do that in a script:
COPY extra_conf.sh /extra_conf.sh
Then you can pass in the EXTENSION_SCRIPT environment variable at runtime to source the script:
docker run -e "EXTENSION_SCRIPT=/extra_conf.sh" cafe12345678
When the extension script is sourced, the current working directory will be the root of the Neo4j
installation.
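Putting these pieces together, a derived image could be built and run as in the sketch below. The image tag my-neo4j and the script name extra_conf.sh are illustrative, and extra_conf.sh must exist in the build context:
# Sketch: write a Dockerfile that copies an extension script into the image
cat > Dockerfile <<'EOF'
FROM neo4j:3.1
COPY extra_conf.sh /extra_conf.sh
EOF
docker build --tag my-neo4j:3.1 .
# Source the script at startup via the EXTENSION_SCRIPT hook
docker run --env "EXTENSION_SCRIPT=/extra_conf.sh" my-neo4j:3.1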
2.5.5. Neo4j Causal Cluster mode
This feature is available in Neo4j Enterprise Edition.
In order to run Neo4j in CC mode under Docker you need to wire up the containers in the cluster so
that they can talk to each other. Each container must have a network route to each of the others and
the NEO4J_causalClustering_expectedCoreClusterSize and
NEO4J_causalClustering_initialDiscoveryMembers environment variables must be set for cores. Read
replicas only need to define NEO4J_causalClustering_initialDiscoveryMembers.
Within a single Docker host, this can be achieved as follows.
docker network create --driver=bridge cluster
docker run --name=core1 --detach --network=cluster \
--publish=7474:7474 --publish=7687:7687 \
--env=NEO4J_dbms_mode=CORE \
--env=NEO4J_causalClustering_expectedCoreClusterSize=3 \
--env=NEO4J_causalClustering_initialDiscoveryMembers=core1:5000,core2:5000,core3:5000 \
neo4j:3.1-enterprise
docker run --name=core2 --detach --network=cluster \
--env=NEO4J_dbms_mode=CORE \
--env=NEO4J_causalClustering_expectedCoreClusterSize=3 \
--env=NEO4J_causalClustering_initialDiscoveryMembers=core1:5000,core2:5000,core3:5000 \
neo4j:3.1-enterprise
docker run --name=core3 --detach --network=cluster \
--env=NEO4J_dbms_mode=CORE \
--env=NEO4J_causalClustering_expectedCoreClusterSize=3 \
--env=NEO4J_causalClustering_initialDiscoveryMembers=core1:5000,core2:5000,core3:5000 \
neo4j:3.1-enterprise
Additional instances can be added to the cluster in an ad-hoc fashion. A read replica can for example
be added with:
docker run --name=read_replica1 --detach --network=cluster \
--env=NEO4J_dbms_mode=READ_REPLICA \
--env=NEO4J_causalClustering_initialDiscoveryMembers=core1:5000,core2:5000,core3:5000 \
neo4j:3.1-enterprise
When each container is running on its own physical machine and docker network is not used, it is
necessary to define the advertised addresses to enable communication between the physical
machines. Each instance would then be invoked similar to:
docker run --name=neo4j-core --detach \
--publish=7474:7474 --publish=7687:7687 \
--publish=5000:5000 --publish=7000:7000 \
--env=NEO4J_dbms_mode=CORE \
--env=NEO4J_causalClustering_expectedCoreClusterSize=3 \
--env=NEO4J_causalClustering_initialDiscoveryMembers=<core1-public-address>:5000,<core2-public-address>:5000,<core3-public-address>:5000 \
--env=NEO4J_causalClustering_discoveryAdvertisedAddress=<public-address>:5000 \
--env=NEO4J_causalClustering_transactionAdvertisedAddress=<public-address>:6000 \
--env=NEO4J_causalClustering_raftAdvertisedAddress=<public-address>:7000 \
neo4j:3.1-enterprise
Where <public-address> is the public hostname or ip-address of the machine.
See Create a new Causal Cluster for more details of Neo4j Causal Clustering.
2.5.6. Neo4j Highly Available mode
This feature is available in Neo4j Enterprise Edition.
In order to run Neo4j in HA mode under Docker you need to wire up the containers in the cluster so
that they can talk to each other. Each container must have a network route to each of the others and
the NEO4J_ha_host_coordination, NEO4J_ha_host_data and NEO4J_ha_initialHosts environment
variables must be set accordingly (see above).
Within a single Docker host, this can be achieved as follows.
docker network create --driver=bridge cluster
docker run --name=instance1 --detach --publish=7474:7474 --publish=7687:7687 --net=cluster --hostname=instance1 \
--volume=$HOME/neo4j/logs1:/logs \
--env=NEO4J_dbms_mode=HA --env=NEO4J_ha_serverId=1 \
--env=NEO4J_ha_host_coordination=instance1:5001 --env=NEO4J_ha_host_data=instance1:6001 \
--env=NEO4J_ha_initialHosts=instance1:5001,instance2:5001,instance3:5001 \
neo4j:3.1-enterprise
docker run --name=instance2 --detach --publish 7475:7474 --publish=7688:7687 --net=cluster --hostname=instance2 \
--volume=$HOME/neo4j/logs2:/logs \
--env=NEO4J_dbms_mode=HA --env=NEO4J_ha_serverId=2 \
--env=NEO4J_ha_host_coordination=instance2:5001 --env=NEO4J_ha_host_data=instance2:6001 \
--env=NEO4J_ha_initialHosts=instance1:5001,instance2:5001,instance3:5001 \
neo4j:3.1-enterprise
docker run --name=instance3 --detach --publish 7476:7474 --publish=7689:7687 --net=cluster --hostname=instance3 \
--volume=$HOME/neo4j/logs3:/logs \
--env=NEO4J_dbms_mode=HA --env=NEO4J_ha_serverId=3 \
--env=NEO4J_ha_host_coordination=instance3:5001 --env=NEO4J_ha_host_data=instance3:6001 \
--env=NEO4J_ha_initialHosts=instance1:5001,instance2:5001,instance3:5001 \
neo4j:3.1-enterprise
See Set up a Highly Available cluster for more details of Neo4j Highly Available mode.
2.5.7. User-defined procedures
To install user-defined procedures, provide a /plugins volume containing the jars.
docker run --publish 7474:7474 --publish=7687:7687 --volume=$HOME/neo4j/plugins:/plugins neo4j:3.1
See Developer Manual → Procedures (http://neo4j.com/docs/developer-manual/3.1/extending-neo4j/procedures/)
for more details on procedures.
2.5.8. Cypher shell
The Neo4j shell can be run locally within a container using a command like this:
docker exec --interactive --tty <container> bin/cypher-shell
2.5.9. Encryption
The Docker image can expose Neo4j’s native TLS support. To use your own key and certificate, provide
an /ssl volume with the key and certificate inside. The files must be called neo4j.key and neo4j.cert. You
must also publish port 7473 to access the HTTPS endpoint.
docker run --publish 7473:7473 --publish=7687:7687 --volume $HOME/neo4j/ssl:/ssl neo4j:3.1
2.6. CAPI Flash
This section covers using CAPI Flash as storage for Neo4j.
Neo4j can be configured to use CAPI Flash as storage for its store files instead of the file system. CAPI
is the Coherent Accelerator Processor Interface technology from IBM, allowing an FPGA (Field
Programmable Gate Array) on a PCIe (Peripheral Component Interconnect Express) expansion card to
share a coherent view of memory with a Power8 CPU. CAPI Flash is an application of this technology to
access storage, either embedded on the CAPI card or via fiber channel to flash storage appliances.
The Neo4j CAPI Flash integration allows greater I/O throughput and better scaling for concurrent I/O
load. It also avoids double caching of the store files, which improves memory utilisation and avoids
block tearing. By extension it avoids the read-modify-write problem that can occur when file writes are
not aligned to the native block size of the underlying storage system. Together, these advantages
improve the performance of Neo4j, in particular for highly concurrent read workloads.
The Neo4j CAPI Flash integration is an extension that is fully compatible with Neo4j Enterprise Edition,
and is available on request from Neo Technology.
2.6.1. Configuring Neo4j to run on CAPI Flash
There are three main steps for configuring Neo4j to work with CAPI Flash. First, ensure that the
environment (the Power8 system and its configuration) is properly set up to give Neo4j access
to the CAPI Flash hardware. Second, the neo4j-blockdevice-*.jar file for the specific version of Neo4j
that will be run on CAPI Flash is required. Third, some configurations must be added to neo4j.conf in
order to enable CAPI Flash and specify how it should work.
Before beginning, ensure that Neo4j is not running. Ideally, the configuration should be carried out
on a clean installation. However, it is possible to migrate an existing Neo4j database onto CAPI
Flash storage using the neo4j-admin blockdev import command. Refer to Admin commands for CAPI
Flash for more information on how to do this.
Power8 System & CAPI Flash configuration
First, review the documentation for the CAPI Flash hardware to ensure that it is installed correctly and
that it is working. Also make sure that the CAPI Flash devices exposed through the operating system
(typically through a path like /dev/sgX where X is a number) are accessible, readable and writeable to
the user running the Neo4j database.
In a typical installation, the CAPI Flash devices will be read/write accessible to every user in the cxl
group. If Neo4j is going to run as a dedicated neo4j user, then this user can be added to the cxl group
by running a sudo usermod -a -G cxl neo4j command. Assuming the CAPI Flash software has been
installed in the /opt/ibm/capikv directory, a user in the cxl group will be able to inspect what devices
are available:
root:~$ /opt/ibm/capikv/bin/cxlfstatus
CXL Flash Device Status
Found 0601 0000:01:00.0 U78CB.001.WZS054X-P1-C7
  Device:  SCSI      Block  Mode        LUN WWID
  sg5:     1:0:0:0,  sdb,   superpipe,  60025380025382463300054000000000
  sg6:     1:1:0:0,  sdc,   superpipe,  60025380025382463300050000000000
Found 0601 0005:01:00.0 U78CB.001.WZS054X-P1-C3
  Device:  SCSI      Block  Mode        LUN WWID
  sg7:     2:0:0:0,  sdd,   superpipe,  60025380025382463300014000000000
  sg8:     2:1:0:0,  sde,   superpipe,  60025380025382463300160000000000
root:~$
In the output above, we have two CAPI FlashGT cards, each with two SSDs installed. The sgX devices
can be found in the /dev directory as /dev/sgX. The Mode field can be either legacy or superpipe, and in
order for Neo4j to work on the devices, Mode must be set to superpipe.
Note that Neo4j expects to have exclusive access to the CAPI Flash devices, and that only a single
Neo4j instance will be using the devices at any point in time. Furthermore, Neo4j expects access to the
physical LUNs (Logical Unit Numbers). This means that virtual LUNs are not supported. Virtual LUNs are
an operational mode in which CAPI Flash is used as an extension to RAM. This is not applicable
when CAPI Flash is used as storage, as is the case when using it with Neo4j.
Next, figure out the topology of the CAPI Flash hardware. Each of the /dev/sgX devices is in CAPI Flash
hardware terms known as a port. Each CAPI Flash card can have more than one port, and a system can
have more than one CAPI Flash card installed. Furthermore, when using fiber channel attached flash,
two different ports can refer to the same Logical Block Address (LBA) space. We will go through how to
make Neo4j take advantage of more than one device in Neo4j Block Device configuration. This feature
is a big contributor to the high concurrent I/O throughput that CAPI Flash offers.
Finally, the CAPI Flash block library needs to be installed on your system and available for use by
Neo4j. This is usually a file called libcflsh_block.so and is typically found in a /opt/ibm/capikv/lib
directory. The library may have to be readable and executable by the Neo4j user. The exact minimal
working set of permissions depends on your setup, but in most cases, it is safe to mark the library as
readable and executable to everyone.
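For example, assuming the typical installation path mentioned above:
sudo chmod a+rx /opt/ibm/capikv/lib/libcflsh_block.so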
Neo4j Block Device Integration Library
The Neo4j Block Device Integration Library is distributed as a neo4j-blockdevice-VERSION.jar file,
where the VERSION is composed of a target Neo4j version, e.g. 3.1.0, and a stitch version digit, such as
neo4j-blockdevice-3.1.0.0.jar. The stitch version allows more than one version of the block device
integration library to be released for a given version of Neo4j. The library is only compatible with the
given specific version of Neo4j. The integration library jar file is placed in the <neo4j-home>/lib
directory, and given the same access permissions as its sibling jar files. This will ensure that the library
is part of the classpath for Neo4j.
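For example, assuming a tarball installation under /opt/neo4j (an illustrative path) and that Neo4j runs as the neo4j user, the library could be installed like this:
# Copy the integration library next to the other Neo4j jars
sudo cp neo4j-blockdevice-3.1.0.0.jar /opt/neo4j/lib/
# Match the ownership of the sibling jar files
sudo chown neo4j:neo4j /opt/neo4j/lib/neo4j-blockdevice-3.1.0.0.jar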
Neo4j Block Device configuration
Three parameters must be configured in neo4j.conf in order for the Neo4j Block Device Integration to
work:
• Set dbms.memory.pagecache.swapper=capi to enable the CAPI Flash block device integration.
• Set dbms.memory.pagecache.swapper.capi.lib to the path of the libcflsh_block.so library.
• Set dbms.memory.pagecache.swapper.capi.device to a device specifier that references all relevant
CAPI Flash ports and describes their topology. See The device specifier and CAPI Flash device
topology.
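Taken together, the three settings might look like the following sketch in neo4j.conf; the library path and the device are illustrative and must match your system:
dbms.memory.pagecache.swapper=capi
dbms.memory.pagecache.swapper.capi.lib=/opt/ibm/capikv/lib/libcflsh_block.so
dbms.memory.pagecache.swapper.capi.device=/dev/sg5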
Once everything has been configured correctly, the CAPI Flash device needs to be formatted. This is
done with the neo4j-admin blockdev format command:
$neo4j-home> bin/neo4j-admin blockdev format
Neo4j stores its data in files, so the block device integration library comes with an embedded file
system called DBFS. Formatting the device writes the necessary metadata for DBFS to work. Note that
formatting the device will remove all data on the device. This cannot be undone.
After the device has been formatted, Neo4j can be started and will run with CAPI Flash as storage.
The device specifier and CAPI Flash device topology
The block device integration library exposes a virtual block device that can be composed of multiple
physical block devices. This is configured using the dbms.memory.pagecache.swapper.capi.device
setting in neo4j.conf.
The easiest device specifier configuration is one that is based on a single physical block device. In this
case, the device specifier is simply the path to that physical device. For example:
dbms.memory.pagecache.swapper.capi.device=/dev/sg1
If we have two devices exposed to us and they represent two different physical devices, then they will
have different LBA-spaces (Logical Block Addressing). This means that LBA 0 on one is a different block
than LBA 0 on the other. In this case we can bundle them up and use them as a single, larger device
that has a capacity that is the sum of the two devices. We do this by providing the paths to both
devices in the device specifier, separated by two path-separator characters. The path separator
character is semicolon ; on Windows, and colon : on all other platforms.
Below is an example where sg1 and sg2 have different LBA-spaces, and are combined into a single,
larger, logical device:
dbms.memory.pagecache.swapper.capi.device=/dev/sg1::/dev/sg2
The logical device consists of interleaving 16 MiB stripes from each of the underlying devices in the
order given by the device specifier. This effectively creates a software defined RAID-0 array of the
underlying devices.
By combining several devices (as many as required), a logical device can be created with a capacity
that is much larger than that provided by any individual device. Note that the participating underlying
devices must all have the same capacity. Below is an example where five devices are combined:
dbms.memory.pagecache.swapper.capi.device=/dev/sg6::/dev/sg7::/dev/sg8::/dev/sg9::/dev/sg10
Some devices can expose more than one port. They will look like multiple distinct devices to the
operating system, even though they have the same LBA-spaces. An example is the CAPI Flash cards
that connect to IBM's FlashSystem appliances through fiber channel. These cards have two FC (Fibre
Channel) ports that can both go to the same appliance and can be configured to expose the same
LBA-space. Assume that sg1 and sg2 represent these ports. In this case, it will not matter if LBA 0 is
accessed through sg1 or sg2; it will be the same physical block regardless of which port is used. We
describe such a setup in the device specifier by separating the paths to sg1 and sg2 with a single path
separator:
dbms.memory.pagecache.swapper.capi.device=/dev/sg1:/dev/sg2
These two features can be combined. Devices that share an LBA-space will be grouped together, each
separated by a single separator character, and each of the groups in turn separated by two separator
characters. For instance, below is an example where sg1 and sg2 share an LBA-space, while sg3 and
sg4 share a different LBA-space:
dbms.memory.pagecache.swapper.capi.device=/dev/sg1:/dev/sg2::/dev/sg3:/dev/sg4
(Diagram: /dev/sg1 and /dev/sg2 form one group, device a; /dev/sg3 and /dev/sg4 form a second group, device b.)
Again, note that all devices in such a setup must have exactly the same capacity.
2.6.2. Limitations
Not all features of Neo4j are yet fully compatible with CAPI Flash as storage. The following is a list of
features that are not supported:
Store format upgrades
When upgrading Neo4j to a version requiring a so-called store migration to take place, first retrieve
the store files from the CAPI Flash storage, perform the upgrade on the normal file system, and then
import the upgraded store back onto CAPI Flash.
Dump and load facilities of the neo4j-admin command
The admin commands neo4j-admin dump and neo4j-admin load currently do not work with databases
that are stored on CAPI Flash. The block device integration library provides other commands that can
be used instead, but they are admittedly not quite as convenient at this time.
Changing the dbms.memory.pagecache.swapper parameters
Once the database has been started with a particular configuration, the
dbms.memory.pagecache.swapper parameter cannot be changed. If you do so anyway, the database will
log an error, and refuse to start.
Additionally, the nature of the block device integration technology itself has the following limitations:
The dbms.memory.pagecache.swapper.* configurations describe where the storage is located and how it
is put together. They cannot be changed, not even the order of devices in the device specifier, without
formatting the devices afterwards (neo4j-admin blockdev format). Changing the configurations will
either cause the loss of all the data, or silently corrupt all the data.
All of the devices given in the device specifier must have exactly the same capacity. Specifically, they
must all have the same block size, and they must all have the same number of blocks. The easiest way
to ensure this is to use devices of the same make and model for all of them.
2.6.3. Admin commands for CAPI Flash
The Neo4j Block Device Integration Library adds the following commands to the neo4j-admin utility:
• neo4j-admin blockdev help prints a help message for all the block device specific admin
commands.
• neo4j-admin blockdev format formats the configured block device with DBFS. This removes all data
on the device.
• neo4j-admin blockdev ls gives a listing of the files stored on the configured block device.
• neo4j-admin blockdev fsck checks the consistency of the DBFS file system metadata.
• neo4j-admin blockdev import <from-database-path> imports the given existing database onto the
configured block device storage.
• neo4j-admin blockdev dump <file> dumps the binary contents of the given file on the block device
to standard out.
• neo4j-admin blockdev rename <source-path> <target-path> moves everything on the block device
from the given source path to the target path.
All of these commands require that the neo4j-blockdevice-*.jar file has been installed in the
<neo4j-install-dir>/lib directory, and that the neo4j.conf file has been properly configured to use CAPI
Flash.
If the block device integration has not been properly configured, then the following error message will
be shown:
$neo4j-home> bin/neo4j-admin blockdev help
neo4j-admin blockdev <sub-command> [options]
Configure, inspect and administrate the Neo4j block-device integration.
Use the 'help' sub-command for more information.
This database has not been configured to use custom block device storage
$neo4j-home>
Once the block device integration parameters have been configured in neo4j.conf, the help command
will be more useful:
$neo4j-home> bin/neo4j-admin blockdev help
neo4j-admin blockdev <sub-command> [options]
Configure, inspect and administrate the Neo4j block-device integration.
Use the 'help' sub-command for more information.
The following sub-commands are available:
* help
Print this help message.
* format
Format the block device with a file system, erasing all data on it.
* ls
List the files stored on the block device, and their size.
* fsck
Check the consistency of the file system metadata on the block device.
* import <from-database-path>
Import an existing Neo4j database from the given path on the local file
system, onto the block device at the same path.
* dump <file>
Dump the binary contents of the given file on the block device to standard
out.
* rename <source-path> <target-path>
Move everything on device from source path to target path. The 'move' is
effectively only a name change of the files. This is useful when performing
a database-wide file move operation.
$neo4j-home>
The format command
The format command formats the configured device with DBFS, the file system that is embedded with
the Neo4j block device integration library.
This command must be called after completing the configuration of dbms.memory.pagecache.swapper.*
in neo4j.conf, but before starting Neo4j. The command effectively removes all data on the configured
device and prepares a clean file system for the database.
Below is an example showing the output:
$neo4j-home> bin/neo4j-admin blockdev format
Done! Device has been formatted with DBFS: /dev/sg1::/dev/sg2
$neo4j-home>
The ls command
The ls command lists the files stored on the configured block device. After a format, the device will be
empty:
$neo4j-home> bin/neo4j-admin blockdev ls
Total: 0 files, 0 bytes
$neo4j-home>
After the database has been started we can see that some files have been created:
$neo4j-home> bin/neo4j-admin blockdev ls
/neo4j-home/data/databases/graph.db/neostore.nodestore.db.labels                4096 bytes
/neo4j-home/data/databases/graph.db/neostore.nodestore.db                       4096 bytes
/neo4j-home/data/databases/graph.db/neostore.propertystore.db.index.keys        4096 bytes
/neo4j-home/data/databases/graph.db/neostore.propertystore.db.index             4096 bytes
/neo4j-home/data/databases/graph.db/neostore.propertystore.db.strings           4096 bytes
/neo4j-home/data/databases/graph.db/neostore.propertystore.db.arrays           24576 bytes
/neo4j-home/data/databases/graph.db/neostore.propertystore.db                  16384 bytes
/neo4j-home/data/databases/graph.db/neostore.relationshipstore.db              12288 bytes
/neo4j-home/data/databases/graph.db/neostore.relationshiptypestore.db.names     4096 bytes
/neo4j-home/data/databases/graph.db/neostore.relationshiptypestore.db           4096 bytes
/neo4j-home/data/databases/graph.db/neostore.labeltokenstore.db.names           4096 bytes
/neo4j-home/data/databases/graph.db/neostore.labeltokenstore.db                 4096 bytes
/neo4j-home/data/databases/graph.db/neostore.schemastore.db                     4096 bytes
/neo4j-home/data/databases/graph.db/neostore.relationshipgroupstore.db          4096 bytes
/neo4j-home/data/databases/graph.db/neostore                                    4096 bytes
Total: 15 files, 102400 bytes
$neo4j-home>

Never use neo4j-admin commands on a running database unless they are explicitly
documented to support this.
The ls command will list the absolute paths of all the files on the device, without regard for your
current working directory. This is because it is operating on a file system that is unrelated to, and
disconnected from, your normal file system.
The fsck command
The fsck command checks the DBFS file system metadata to verify that it is consistent. If you
experience an apparent inconsistency with data in a CAPI Flash database installation, it is advisable to
run this command before doing a consistency check on the graph data itself. This fsck command is
very fast compared to neo4j-admin check-consistency, and the latter will not be meaningful if fsck
reports failures.
A passing fsck looks like this:
$neo4j-home> bin/neo4j-admin blockdev fsck
DBFS file system is consistent!
$neo4j-home>
If fsck reports any errors, then it is unfortunately not user-repairable. Instead, send the fsck output,
along with neo4j.conf and debug.log to Neo Technology support.
The import command
The import command imports an existing database onto the block device storage while keeping its
location the same. This means that the imported store files will have the same file names and paths
on the block device as they did on the normal file system. This allows Neo4j to start up with the
database immediately after the import.
Say for instance you already have some data in the default graph.db database, and would like to store
it on block device storage. Then you can import it — after having configured Neo4j to use block device
storage — with the bin/neo4j-admin blockdev import data/databases/graph.db command:
$neo4j-home> bin/neo4j-admin blockdev import data/databases/graph.db
2016-11-24 17:53:21.994+0000 INFO [o.n.i.p.PageCache] Configured dbms.memory.pagecache.swapper: capi
Importing...
neostore: 1/1. Done!
neostore.propertystore.db.arrays: 2/2. Done!
neostore.propertystore.db.index.keys: 1/1. Done!
neostore.labeltokenstore.db: 0/0. Done!
neostore.propertystore.db.strings: 1/1. Done!
neostore.nodestore.db: 2/2. Done!
neostore.relationshiptypestore.db.names: 1/1. Done!
neostore.propertystore.db.index: 0/0. Done!
neostore.labeltokenstore.db.names: 1/1. Done!
neostore.nodestore.db.labels: 0/0. Done!
neostore.relationshiptypestore.db: 0/0. Done!
neostore.schemastore.db: 0/0. Done!
neostore.relationshipgroupstore.db: 10/10. Done!
neostore.propertystore.db: 57/57. Done!
neostore.relationshipstore.db: 6074/6074. Done!
100%
Done, PT0.628S.
$neo4j-home>
The import command reports its progress as a percentage, and in terms of blocks.
The dump command
The dump command takes an absolute path to a file on the block device storage, and writes its binary
contents to the standard output stream. This can be used as a rudimentary export feature. The typical
usage is to pipe the output into a file. Any error messages — such as the file name being
mistyped — will be written to the standard error stream, so they will not be hidden when the standard
output is sent into a pipe.
In the example below, the relationship store file is dumped, piped through gzip to be compressed,
and then written to a file in a backup directory:
$neo4j-home> bin/neo4j-admin blockdev dump /neo4j-home/data/databases/graph.db/neostore.relationshipstore.db | gzip > /var/backup/neostore.relationshipstore.db.gz
$neo4j-home>
The rename command
The rename command can be used to rename or move files on the block device storage. It takes a
source-path parameter, which is used to match from the start of the absolute paths of files on the
block device, and a target-path parameter that will replace the source-path portion of all matching
paths. The matching is done on a path-element basis, so if we, for instance, want to rename the
/neo4j-home directory to /home, we have to spell out the whole neo4j-home path element, e.g. rename
/neo4j-home /home – just saying rename /neo4j- / will not work.
To illustrate, we can use rename to change the name of the database directory from the default
graph.db to, say, example.db. If we have the following files on the block device storage:
$neo4j-home> bin/neo4j-admin blockdev ls
/neo4j-home/data/databases/graph.db/neostore.nodestore.db.labels                4096 bytes
/neo4j-home/data/databases/graph.db/neostore.nodestore.db                       4096 bytes
/neo4j-home/data/databases/graph.db/neostore.propertystore.db.index.keys        4096 bytes
/neo4j-home/data/databases/graph.db/neostore.propertystore.db.index             4096 bytes
/neo4j-home/data/databases/graph.db/neostore.propertystore.db.strings           4096 bytes
/neo4j-home/data/databases/graph.db/neostore.propertystore.db.arrays           24576 bytes
/neo4j-home/data/databases/graph.db/neostore.propertystore.db                  16384 bytes
/neo4j-home/data/databases/graph.db/neostore.relationshipstore.db              12288 bytes
/neo4j-home/data/databases/graph.db/neostore.relationshiptypestore.db.names     4096 bytes
/neo4j-home/data/databases/graph.db/neostore.relationshiptypestore.db           4096 bytes
/neo4j-home/data/databases/graph.db/neostore.labeltokenstore.db.names           4096 bytes
/neo4j-home/data/databases/graph.db/neostore.labeltokenstore.db                 4096 bytes
/neo4j-home/data/databases/graph.db/neostore.schemastore.db                     4096 bytes
/neo4j-home/data/databases/graph.db/neostore.relationshipgroupstore.db          4096 bytes
/neo4j-home/data/databases/graph.db/neostore                                    4096 bytes
Total: 15 files, 102400 bytes
$neo4j-home>
Then our rename command can be given by:
$neo4j-home> bin/neo4j-admin blockdev rename /neo4j-home/data/databases/graph.db /neo4j-home/data/databases/example.db
rename from /neo4j-home/data/databases/graph.db/neostore.relationshiptypestore.db
rename to   /neo4j-home/data/databases/example.db/neostore.relationshiptypestore.db
rename from /neo4j-home/data/databases/graph.db/neostore.nodestore.db.labels
rename to   /neo4j-home/data/databases/example.db/neostore.nodestore.db.labels
rename from /neo4j-home/data/databases/graph.db/neostore.labeltokenstore.db.names
rename to   /neo4j-home/data/databases/example.db/neostore.labeltokenstore.db.names
rename from /neo4j-home/data/databases/graph.db/neostore.propertystore.db.arrays
rename to   /neo4j-home/data/databases/example.db/neostore.propertystore.db.arrays
rename from /neo4j-home/data/databases/graph.db/neostore.propertystore.db.strings
rename to   /neo4j-home/data/databases/example.db/neostore.propertystore.db.strings
rename from /neo4j-home/data/databases/graph.db/neostore.relationshiptypestore.db.names
rename to   /neo4j-home/data/databases/example.db/neostore.relationshiptypestore.db.names
rename from /neo4j-home/data/databases/graph.db/neostore.propertystore.db.index
rename to   /neo4j-home/data/databases/example.db/neostore.propertystore.db.index
rename from /neo4j-home/data/databases/graph.db/neostore.labeltokenstore.db
rename to   /neo4j-home/data/databases/example.db/neostore.labeltokenstore.db
rename from /neo4j-home/data/databases/graph.db/neostore.schemastore.db
rename to   /neo4j-home/data/databases/example.db/neostore.schemastore.db
rename from /neo4j-home/data/databases/graph.db/neostore.nodestore.db
rename to   /neo4j-home/data/databases/example.db/neostore.nodestore.db
rename from /neo4j-home/data/databases/graph.db/neostore.propertystore.db.index.keys
rename to   /neo4j-home/data/databases/example.db/neostore.propertystore.db.index.keys
rename from /neo4j-home/data/databases/graph.db/neostore
rename to   /neo4j-home/data/databases/example.db/neostore
rename from /neo4j-home/data/databases/graph.db/neostore.relationshipstore.db
rename to   /neo4j-home/data/databases/example.db/neostore.relationshipstore.db
rename from /neo4j-home/data/databases/graph.db/neostore.propertystore.db
rename to   /neo4j-home/data/databases/example.db/neostore.propertystore.db
rename from /neo4j-home/data/databases/graph.db/neostore.relationshipgroupstore.db
rename to   /neo4j-home/data/databases/example.db/neostore.relationshipgroupstore.db
$neo4j-home>
The rename command can also be used on individual files, by providing the complete absolute path for
the given file.
Chapter 3. Configuration
This chapter describes configuration of Neo4j components.
The topics described are:
• File locations — An overview of where files are stored in the different Neo4j distributions and the
necessary file permissions for running Neo4j.
• Ports — An overview of the ports relevant to a Neo4j installation.
• Set initial password — How to set an initial password.
• Wait for Neo4j to start — How to poll for Neo4j started status.
• Usage Data Collector — Information about the Usage Data Collector.
• Configure Neo4j connectors — How to configure Neo4j connectors.
• Install certificates — How to install certificates.
3.1. File locations
This section provides an overview of where files are stored in the different Neo4j distributions
and the necessary file permissions for running Neo4j.
Important files can be found in the following locations by default.
Linux or OS X tarball
  Configuration: <neo4j-home>/conf/neo4j.conf
  Data:          <neo4j-home>/data
  Logs:          <neo4j-home>/logs
  Metrics:       <neo4j-home>/metrics
  Import:        <neo4j-home>/import
  Bin:           <neo4j-home>/bin
  Lib:           <neo4j-home>/lib
  Plugins:       <neo4j-home>/plugins

Windows zip
  Configuration: <neo4j-home>\conf\neo4j.conf
  Data:          <neo4j-home>\data
  Logs:          <neo4j-home>\logs
  Metrics:       <neo4j-home>\metrics
  Import:        <neo4j-home>\import
  Bin:           <neo4j-home>\bin
  Lib:           <neo4j-home>\lib
  Plugins:       <neo4j-home>\plugins

Debian/Ubuntu .deb
  Configuration: /etc/neo4j/neo4j.conf
  Data:          /var/lib/neo4j/data
  Logs:          /var/log/neo4j
  Metrics:       /var/lib/neo4j/metrics
  Import:        /var/lib/neo4j/import
  Bin:           /usr/share/neo4j/bin
  Lib:           /usr/share/neo4j/lib
  Plugins:       /var/lib/neo4j/plugins

Windows desktop
  Configuration: %APPDATA%\Neo4j Community Edition\neo4j.conf
  Data:          %APPDATA%\Neo4j Community Edition
  Logs:          %APPDATA%\Neo4j Community Edition\logs
  Metrics:       %APPDATA%\Neo4j Community Edition\metrics
  Import:        %APPDATA%\Neo4j Community Edition\import
  Bin:           %ProgramFiles%\Neo4j CE 3.1\bin
  Lib:           (in package)
  Plugins:       %ProgramFiles%\Neo4j CE 3.1\plugins

OS X desktop
  Configuration: ${HOME}/Documents/Neo4j/neo4j.conf
  Data:          ${HOME}/Documents/Neo4j
  Logs:          ${HOME}/Documents/Neo4j/logs
  Metrics:       ${HOME}/Documents/Neo4j/metrics
  Import:        ${HOME}/Documents/Neo4j/import
  Bin:           (in package)
  Lib:           (in package)
  Plugins:       (in package)
Please note that the data directory is internal to Neo4j and its structure is subject to change between
versions without notice.
3.1.1. Log files
Filename            Description
neo4j.log           The standard log, where general information about Neo4j is written.
debug.log           Information useful when debugging problems with Neo4j.
http.log            Request log for the HTTP API.
gc.log              Garbage Collection logging provided by the JVM.
query.log           Log of executed queries that take longer than a specified threshold. (Enterprise Edition only.)
security.log        Log of security events. (Enterprise Edition only.)
service-error.log   Log of errors encountered when installing or running the Windows service. (Windows only.)
3.1.2. Configuration
Some of these paths are configurable with dbms.directories.* settings; see Configuration settings for
details.
The locations of <neo4j-home>, bin and conf can be configured using environment variables.
Location       Default                                    Environment variable   Notes
<neo4j-home>   parent of bin                              NEO4J_HOME             Must be set explicitly if bin is not a subdirectory.
bin            directory where neo4j script is located    NEO4J_BIN              Must be set explicitly if neo4j script is invoked as a symlink.
conf           <neo4j-home>/conf                          NEO4J_CONF             Must be set explicitly if it is not a subdirectory of <neo4j-home>.
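For example, to run a tarball installation with a configuration directory outside <neo4j-home>, the variables can be set before invoking the script (the paths below are illustrative):
export NEO4J_HOME=/opt/neo4j
export NEO4J_CONF=/etc/neo4j
$NEO4J_HOME/bin/neo4j start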
3.1.3. Permissions
The user that Neo4j runs as must have the following permissions:
Read only
• conf
• import
• bin
• lib
• plugins
Read and write
• data
• logs
• metrics
Execute
• all files in bin
3.2. Ports
This section lists ports relevant to a Neo4j installation.
This section provides an overview for determining which ports should be opened up in your firewalls.
Specific recommendations on port openings cannot be made, as the firewall configuration must be
performed taking your particular conditions into consideration.
Name: Backups
  Default port number: 6362
  Related settings: dbms.backup.enabled=true, dbms.backup.address=0.0.0.0:6362
  Comments: Backups are disabled by default. In production environments, external access to this port should be blocked by a firewall. See also Backup.

Name: HTTP
  Default port number: 7474
  Related settings: See Configure Neo4j connectors.
  Comments: It is recommended to not open up this port for external access in production environments, since traffic is unencrypted. Used by the Neo4j Browser. Also used by the REST API.

Name: HTTPS
  Default port number: 7473
  Related settings: See Configure Neo4j connectors.
  Comments: Also used by the REST API.

Name: Bolt
  Default port number: 7687
  Related settings: See Configure Neo4j connectors.
  Comments: Used by Cypher Shell and by the Neo4j Browser.

Name: Causal Cluster
  Default port numbers: 5000, 6000, 7000
  Related settings: causal_clustering.discovery_listen_address=:5000, causal_clustering.transaction_listen_address=:6000, causal_clustering.raft_listen_address=:7000
  Comments: The listed ports are the default ports in neo4j.conf. The ports are likely to be different in a production installation; therefore the potential opening of ports must be modified accordingly. See also Causal Cluster settings reference.

Name: HA Cluster
  Default port numbers: 5001, 6001
  Related settings: ha.host.coordination=127.0.0.1:5001, ha.host.data=127.0.0.1:6001
  Comments: The listed ports are the default ports in neo4j.conf. The ports will most likely be different in a production installation; therefore the potential opening of ports must be modified accordingly. See also Highly Available cluster.

Name: Graphite monitoring
  Default port number: 2003
  Related settings: metrics.graphite.server=localhost:2003
  Comments: This is an outbound connection in order for the Neo4j database to communicate with the Graphite server. See also Metrics.

Name: JMX monitoring
  Default port number: 3637
  Related settings: dbms.jvm.additional=-Dcom.sun.management.jmxremote.port=3637
  Comments: This setting is for exposing the JMX. We are not promoting this way of inspecting the database. It is not enabled by default.

Name: Neo4j-shell
  Default port number: 1337
  Related settings: dbms.shell.port=1337
  Comments: The neo4j-shell tool is being deprecated and it is recommended to discontinue its use. Supported tools that replace the functionality of neo4j-shell are described under Tools.
3.3. Set an initial password
Use the set-initial-password command of neo4j-admin to define the password for the native user
neo4j. This must be performed before starting up the database for the first time.
Syntax:
neo4j-admin set-initial-password <password>
Example 1. Use the set-initial-password command of neo4j-admin
Set the password for the native neo4j user to 'h6u4%kr' before starting the database for the first
time.
$neo4j-home> bin/neo4j-admin set-initial-password h6u4%kr
If the password is not set explicitly using this method, it will be set to the default password neo4j. In
that case, you will be prompted to change the default password at first login.
3.4. Wait for Neo4j to start
After starting Neo4j it may take some time before the database is ready to serve requests. Systems
that depend on the database should be able to retry if it is unavailable in order to cope with network
glitches and other brief outages. To specifically wait for Neo4j to be available after starting, poll the
Bolt or HTTP endpoint until it gives a successful response.
The details of how to poll depend on:
• Whether the client uses HTTP or Bolt.
• Whether encryption or authentication are enabled.
It is important to include a timeout in case Neo4j fails to start. Normally ten seconds should be
sufficient, but database recovery or upgrade may take much longer depending on the size of the
store. If the instance is part of a cluster then the endpoint will not be available until other instances
have started up and the cluster has formed.
Here is an example of polling written in Bash using the HTTP endpoint, with encryption and
authentication disabled.
end="$((SECONDS+10))"
while true; do
[[ "200" = "$(curl --silent --write-out %{http_code} --output /dev/null http://localhost:7474)" ]] &&
break
[[ "${SECONDS}" -ge "${end}" ]] && exit 1
sleep 1
done
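To poll the Bolt endpoint instead, a similar loop can be written around Cypher Shell. This is only a sketch; it assumes authentication is enabled and that the password is available in the NEO4J_PASSWORD shell variable:
# Assumes the neo4j user's password has been exported as NEO4J_PASSWORD
end="$((SECONDS+10))"
while true; do
echo "RETURN 1;" | bin/cypher-shell -u neo4j -p "$NEO4J_PASSWORD" > /dev/null 2>&1 && break
[[ "${SECONDS}" -ge "${end}" ]] && exit 1
sleep 1
done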
3.5. Usage Data Collector
The Neo4j Usage Data Collector is a sub-system that gathers usage data, reporting it to the UDC
server at udc.neo4j.org. It is easy to disable, and does not collect any data that is confidential. For
more information about what is being sent, see below.
The Neo4j team uses this information as a form of automatic, effortless feedback from the Neo4j
community. We want to verify that we are doing the right thing by matching download statistics with
usage statistics. After each release, we can see if there is a larger retention span of the server
software.
The data collected is clearly stated here. If any future versions of this system collect additional data,
we will clearly announce those changes.
The Neo4j team is very concerned about your privacy. We do not disclose any personally identifiable
information.
3.5.1. Technical Information
To gather good statistics about Neo4j usage, UDC collects this information:
• Kernel version: The build number, and if there are any modifications to the kernel.
• Store id: A randomized globally unique id created at the same time a database is created.
• Ping count: UDC holds an internal counter which is incremented for every ping, and reset for every
restart of the kernel.
• Source: This is either "neo4j" or "maven". If you downloaded Neo4j from the Neo4j website, it is
"neo4j"; if you are using Maven to get Neo4j, it will be "maven".
• Java version: The referrer string shows which version of Java is being used.
• Registration id: For registered server instances.
• Tags about the execution context (e.g. test, language, web-container, app-container, spring, ejb).
• Neo4j Edition (community, enterprise).
• A hash of the current cluster name (if any).
• Distribution information for Linux (rpm, dpkg, unknown).
• User-Agent header for tracking usage of REST client drivers
• MAC address to uniquely identify instances behind firewalls.
• The number of processors on the server.
• The amount of memory on the server.
• The JVM heap size.
• The number of nodes, relationships, labels and properties in the database.
After startup, UDC waits for ten minutes before sending the first ping. It does this for two reasons:
first, we don't want the startup to be slower because of UDC, and second, we want to keep pings
from automatic tests to a minimum. The ping to the UDC servers is done with an HTTP GET.
3.5.2. How to disable UDC
UDC is easily turned off by disabling it in the database configuration, in neo4j.conf for Neo4j server or
in the configuration passed to the database in embedded mode. See UDC Configuration in the
configuration section for details.
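For example, for Neo4j server the following line in neo4j.conf turns UDC off (the setting name is the one used in the UDC configuration reference; restart Neo4j for the change to take effect):
dbms.udc.enabled=false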
3.6. Configure Neo4j connectors
Neo4j supports clients using either the Bolt binary protocol or HTTP/HTTPS. Three different Neo4j
connectors are configured by default: a bolt connector, a http connector, and a https connector.
Table 3. Default connectors and their ports

Connector name         Protocol   Default port number
dbms.connector.bolt    Bolt       7687
dbms.connector.http    HTTP       7474
dbms.connector.https   HTTPS      7473
When configuring the HTTPS connector see also Install certificates for details on how to work with SSL
certificates.
3.6.1. Additional options for Neo4j connectors
Three options are available for Neo4j connectors:
• enabled
• listen_address
• advertised_address
The options are summarized in the table below and subsequently explained in more detail.
Table 4. Configuration options for connectors

Option name          Default                               Description
enabled              true                                  Allows the client connector to be enabled or disabled.
listen_address       localhost:<connector-default-port>    The address for incoming connections.
advertised_address   localhost:<connector-default-port>    The address that clients should use for this connector.
enabled
The enabled setting allows the client connector to be enabled or disabled. When disabled, Neo4j does
not listen for incoming connections on the relevant port. For example, to disable the HTTPS
connector:
dbms.connector.https.enabled=false
Note: It is not possible to disable the HTTP connector. To prevent clients from connecting to HTTP,
block the HTTP port with the firewall, or configure listen_address for the http connector to only listen
on the loopback interface (localhost) thereby preventing connections from remote clients.
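For example, to restrict the http connector to the loopback interface:
dbms.connector.http.listen_address=localhost:7474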
listen_address
The listen_address setting specifies how Neo4j listens for incoming connections. It consists of two
parts: a network interface specification (e.g. localhost or 0.0.0.0) and a port number (e.g. 7687), and is
expressed in the format <network-interface>:<port-number>.
Example 2. Specify listen_address for the Bolt connector
To listen for Bolt connections on all network interfaces (0.0.0.0) and on port 7000, set the
listen_address for the bolt connector:
dbms.connector.bolt.listen_address=0.0.0.0:7000
advertised_address
The advertised_address setting specifies the address clients should use for this connector. This is
useful in a causal cluster as it allows each server to correctly advertise addresses of the other servers
in the cluster. The advertised address consists of two parts: a network interface specification
(hostname or IP address) and a port number (e.g. 7687), and is expressed in the format <network-interface>:<port-number>.
If routing traffic via a proxy, or if port mappings are in use, it is possible to specify advertised_address
for each connector individually. For example, if port 7687 on the Neo4j Server is mapped from port
9000 on the external network, specify the advertised_address for the bolt connector:
dbms.connector.bolt.advertised_address=<server-name>:9000
3.6.2. Defaults for network interfaces
The two configuration settings dbms.connectors.default_listen_address and
dbms.connectors.default_advertised_address can be used to specify the network interface part of
listen_address and advertised_address, respectively. Setting a default value will apply to all the
connectors, unless specifically configured for a certain connector.
Table 5. Defaults for network interfaces

Option name                                   Default      Description
dbms.connectors.default_listen_address        localhost    The default network interface specification for listen_address for all connectors.
dbms.connectors.default_advertised_address    localhost    The default network interface specification for advertised_address for all connectors.
default_listen_address
The listen address consists of two parts: a network interface specification (e.g. localhost or 0.0.0.0)
and a port number (e.g. 7687). If the network interface part of the listen_address is not specified, the
interface is inherited from the shared setting default_listen_address.
Example 3. Specify listen_address for the Bolt connector
To listen for Bolt connections on all network interfaces (0.0.0.0) and on port 7000, set the
listen_address for the bolt connector:
dbms.connector.bolt.listen_address=0.0.0.0:7000
This is equivalent to specifying the network interface using the default_listen_address setting
and then just specifying the port number for the bolt connector.
dbms.connectors.default_listen_address=0.0.0.0
dbms.connector.bolt.listen_address=:7000
default_advertised_address
The advertised address consists of two parts: a network interface specification (hostname or IP
address) and a port number (e.g. 7687). If the network interface part of the advertised_address is not
specified, the interface is inherited from the shared setting default_advertised_address.
Example 4. Specify advertised_address for the Bolt connector
Specify the address clients should use for the Bolt connector:
dbms.connector.bolt.advertised_address=server1:9000
This is equivalent to specifying the network interface using the default_advertised_address
setting and then just specifying the port number for the bolt connector.
dbms.connectors.default_advertised_address=server1
dbms.connector.bolt.advertised_address=:9000
3.7. Install certificates
Neo4j, when used with the official drivers, encrypts all client-server communication with TLS by
default. This applies to both Bolt and HTTP communications.
3.7.1. Certificates issued by a Certificate Authority
SSL certificates must be issued by a trusted Certificate Authority. A certificate consists of a <file-name>.key file and a <file-name>.cert file. In order to use your certificate with Neo4j, the files must be
named neo4j.key and neo4j.cert, respectively. Note that the key should be unencrypted. Ensure correct
permissions are set on the private key, such that only the Neo4j user can read it.
Place the files into the assigned directory. The location of this can be configured by setting
dbms.directories.certificates in neo4j.conf. If not explicitly configured, the default is a directory named
certificates, which is located in the neo4j-home directory.
Neo4j supports chained SSL certificates. All certificates need to be in the PEM format, and they must
be combined into one file. The private key is also required to be in the PEM format. Multi-host and
wildcard certificates are supported. Such certificates are required if Neo4j has been configured with
multiple connectors that bind to different interfaces.
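As an illustration only, assume the CA has issued a private key server.key, a server certificate server.crt, and an intermediate certificate intermediate.crt (all file names here are hypothetical). The PEM files could then be combined and installed into the default certificates directory like this:
neo4j-home$ cat server.crt intermediate.crt > certificates/neo4j.cert
neo4j-home$ cp server.key certificates/neo4j.key
neo4j-home$ chmod 400 certificates/neo4j.key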
3.7.2. Auto-generated certificates
If started without any certificates installed, the Neo4j process will automatically generate a self-signed
SSL certificate and a private key. Because this certificate is self-signed, it is not safe to rely on it for
production use. Using auto-generation of self-signed SSL certificates will not work if Neo4j has been
configured with multiple connectors that bind to different IP addresses. If you need to use multiple IP
addresses, please configure certificates manually and use multi-host or wildcard certificates instead.
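On a default installation you can check which certificate files are in place by listing the certificates directory (the path assumes the default dbms.directories.certificates location):
neo4j-home$ ls certificates/
neo4j.cert  neo4j.key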
Chapter 4. Clustering
This chapter describes the Neo4j clustering solutions Causal Clustering and Highly Available
Cluster.

The clustering features are available in Neo4j Enterprise Edition.
This chapter describes Neo4j’s architecture with regard to the clustering features.
Neo4j offers two separate solutions for ensuring redundancy and performance in a high-demand
production environment:
• Causal Clustering
• Highly Available Cluster
Clustering for the enterprise
Enterprise IT requirements are demanding. Our solutions are expected to provide high throughput,
continuous availability, and reliability. Further, in most IT ecosystems we often want to run long-lived
queries on operational data for analytics and reporting purposes. When designing our solutions, we
must ensure that any technology choices we make can underpin those critical enterprise
requirements.
High throughput
To meet demanding graph workloads, Neo4j clusters allow work to be federated across a number of
cooperating machines.
Figure 1. Throughput
In a clustered environment, throughput goals (graph queries) can be met by allowing each machine to
process a subset of the overall queries. This scheme also reduces latency as we can see by the (logical)
queue length in the diagram above.
Continuous availability
A fundamental requirement for most enterprise-grade systems is high availability. That is, even in the
presence of failures, the system continues to deliver its functionality to end users (humans or other
computer systems).
Figure 2. Availability
Neo4j’s clustering architecture is an automated solution for ensuring that Neo4j is continuously
available. The premise is that we deploy redundancy into the cluster such that if failures occur they
can be masked by the remaining live instances. In the case above a single failed instance does not
cause the cluster to stop (though the throughput of the cluster may be lower).
Disaster recovery
Disaster recovery is the ability to recover from major service outages, greater than can be
accommodated by the redundant capacity in a continuously available cluster. Typically these are
manifested as data center outages, physical network severance, or even denial of service attacks that
render large amounts of infrastructure inoperable.
Figure 3. Safety
In these cases a disaster recovery strategy can define a failover datacenter along with a strategy for
bringing services back online. Neo4j clustering can accommodate disaster recovery strategies that
require very short windows of downtime or low tolerances for data loss in disaster scenarios. By
deploying a cluster instance to an alternate location, you have an active copy of your database up and
available in your designated disaster recovery location that is up to date with the transactions
executed against your operational database cluster.
Provisioning a disaster recovery instance or instances helps to minimize downtime and ensure the safety of data. Given that disaster recovery happens, by definition, at stressful and inconvenient times, having a well-designed recovery scenario as part of the database cluster is a sensible plan, albeit one we hope never to action.
Analytics and reporting
Operational data is the lifeblood of our online processing systems. However other stakeholders in the
enterprise require access to that data for their own business intelligence purposes. Analytics and
reporting queries are often ad-hoc from the database’s perspective. Queries may be speculative or
wide-ranging as new analyses are performed. This means the workload can be unpredictable and
onerous. Such workloads risk upsetting the balance of work in the system, leaving fewer resources available for the online workloads (e.g. customers). Yet the needs of analytics requests must be serviced too. Fortunately, Neo4j clustering can be used to provide separate instances
entirely in support of query analytics, either from end users or from BI tools. As a consequence of
being part of the cluster, the analytics instances are up to date and do not require any external ETL
jobs or other complexity.
4.1. Causal Cluster
This chapter gives a comprehensive description of Causal Clustering, including the theoretical
background and architecture as well as configuration details and instructions.
This chapter gives a comprehensive description of Causal Clustering. It starts with the theoretical
background and a discussion about architecture. It then proceeds to explicit configuration details and
gives instructions on how to configure and operate the Causal Cluster.
The topics described are:
• Architecture — An overview of the Causal Clustering architecture.
• Lifecycle — A walk-through of the life cycle of a cluster.
• Create a new cluster — How to configure the Core instances and test that the cluster is
operational. How to add Read replicas to the cluster.
• Seed a cluster — How to seed a cluster with an existing data store.
• Important configuration settings — A summary of the most important Causal Cluster settings.
For setting up a test cluster locally on a single machine see Set up a local Causal Cluster.
4.1.1. Introduction
Neo4j’s Causal Clustering provides two main features:
1. Safety: Core servers provide a fault tolerant platform for transaction processing which will remain
available while a simple majority of those Core servers are functioning.
2. Scale: Read replicas provide a massively scalable platform for graph queries that enables very
large graph workloads to be executed in a widely distributed topology.
Together, these features allow the end-user system to remain fully functional and to both read from and write to the database in the event of multiple hardware and network failures.
In the remainder of this section we will provide an overview of how causal clustering works in
production, including both operational and application aspects.
Operational view
From an operational point of view, it is useful to view the cluster as being composed from its two
different roles: Core and Read replica.
Figure 4. Causal Cluster Architecture
The two roles are foundational in any production deployment but are managed at different scales
from one another and undertake different roles in managing the fault tolerance and scalability of the
overall cluster.
Core servers
Core servers' main responsibility is to safeguard data. The Core servers do so by replicating all
transactions using the Raft protocol. Raft ensures that the data is safely durable before confirming
transaction commit to the end user application. In practice this means once a majority of Core servers
in a cluster (N/2+1) have accepted the transaction, it is safe to acknowledge the commit to the end user
application.
The safety requirement has an impact on write latency. Implicitly, writes will be acknowledged by the fastest majority, but as the number of Core servers in the cluster grows, so does the size of the majority needed to acknowledge a write.
In practice this means that there are relatively few machines in a typical Core server cluster, enough to
provide sufficient fault tolerance for the specific deployment. This is simply calculated with the
formula M = 2F + 1, where M is the number of Core servers required to tolerate F faults. For example, in order to tolerate 2 failed Core servers we would need to deploy a cluster of 5.
Note that should the Core server cluster suffer enough failures that it can no longer process writes, it
will become read-only to preserve safety.
Read replicas
Read replicas' main responsibility is to scale out graph workloads (Cypher queries, procedures, and so
on). Read replicas act like caches for the data that the Core servers safeguard, but they are not simple
key-value caches. In fact Read replicas are fully-fledged Neo4j databases capable of fulfilling arbitrary
(read-only) graph queries and procedures.
Read replicas are asynchronously replicated from Core servers via transaction log shipping.
Periodically (usually in the ms range) a Read replica will poll a Core server for any new transactions
that it has processed since the last poll, and the Core server will ship those transactions to the Read
replica. Many Read replicas can be fed data from a relatively small number of Core servers, allowing
for a large fan out of the query workload for scale.
Unlike Core servers, however, Read replicas do not participate in decision making about cluster topology. Read replicas should typically be run in relatively large numbers and treated as disposable.
Losing a Read replica does not impact the cluster’s availability, aside from the loss of its fraction of
graph query throughput. It does not affect the fault tolerance capabilities of the cluster.
Application view
While the operational mechanics of the cluster are interesting from an application point of view, it is
more constructive to think about how to actually get things done. In an application we typically want
to read from the graph and write to the graph. Depending on the nature of the workload we usually
want reads from the graph to take into account previous writes to ensure causal consistency.

Causal consistency is one of the consistency models used in distributed computing. It ensures that causally related operations are seen by every node of the system in the same order. Consequently, client applications of the cluster attain read-your-own-writes semantics. Read-your-own-writes semantics ensures that clients never see stale data and enjoy an interaction mode that is as simple as a single database server, with the fault tolerance and scale of a (large) cluster.
Causal consistency makes it easy to write to Core (where data is safe) and read those writes from a
Read replica (where graph operations are scaled out).
Figure 5. Causal Cluster setup with causal consistency via Neo4j drivers
On executing a transaction, the client can ask for a bookmark which it then presents as a parameter to
the next transaction. Using that bookmark the cluster can ensure that only servers which have
processed the client’s bookmarked transaction will be able to run its next transaction. This provides a
causal chain which ensures correct behaviour from the client’s point of view.
Apart from the bookmark everything else is handled by the cluster. In particular, the database drivers
work with the cluster topology manager to choose the most appropriate Core servers and Read
replicas to provide high quality of service.
Summary
In this section we have taken a high-level look at Causal Clustering from both an operational and an
application development point of view. We now understand that the Core servers in the cluster are
responsible for the long-term safekeeping of data while the more numerous Read replicas are
responsible for scaling out graph query workloads. Reasoning about this powerful architecture is
greatly simplified by the Neo4j drivers which abstract the cluster topology to easily provide read levels
like causal consistency.
4.1.2. Causal Cluster lifecycle
This section describes the lifecycle of a Causal Cluster, from discovery, joining the cluster,
Core and Read replica membership and protocols for polling, catchup and backup, to
leaving the cluster on shutdown.
Introduction provided an overview of a Causal Cluster. In this section we will develop some deeper
knowledge of how the cluster operates. By developing our understanding of how the cluster works we
will be better equipped to design, deploy, and troubleshoot our production systems.
Our in-depth tour will follow the lifecycle of a cluster. We will boot a Core cluster and pick up key
architectural foundations as the cluster forms and transacts. We will then add in Read replicas and
show how they bootstrap into the cluster and then catch up and remain caught up with the Core
servers. We will then see how backup is used in live cluster environments before shutting down Read
replicas and Core servers.
Discovery protocol
The discovery protocol is the first step in forming a Causal Cluster. It takes in some hints about existing Core cluster servers and uses these hints to initiate a network join protocol.
Figure 6. Causal Cluster discovery protocol: Core-to-Core or Read replica-to-Core only.
From these hints the server will either join an existing cluster or form one of its own (don’t worry about forming split-brain clusters: Core cluster formation is safe since it is underpinned by the Raft protocol).

The discovery protocol targets Core servers only, regardless of whether it is a Core server or a Read replica performing discovery. This is because we expect Read replicas to be both numerous and, relatively speaking, transient, whereas Core servers will likely be fewer in number and relatively stable over time.
The hints are delivered as initial_discovery_members in the neo4j.conf configuration file, typically as
dotted-decimal IP addresses and advertised ports. On consuming the hints the server will try to
handshake with the other listed servers. On successful handshake with another server or servers the
current server will discover the whole current topology.
The discovery service continues to run throughout the lifetime of the Causal Cluster and is used to
maintain the current state of available servers and to help clients route queries to an appropriate
server via the client-side drivers (http://neo4j.com/docs/developer-manual/3.1/drivers/).
Core membership
If it is a Core server that is performing discovery, once it has made a connection to one of the existing Core servers it then joins the Raft protocol.

Raft is a distributed algorithm for maintaining a consistent log across multiple
shared-nothing servers designed by Diego Ongaro for his 2014 Ph.D. thesis. See the
Raft thesis (https://ramcloud.stanford.edu/~ongaro/thesis.pdf) for details.
Raft handles cluster membership by making it a normal part of keeping a distributed log in sync.
Joining a cluster involves the insertion of a cluster membership entry into the Raft log which is then
reliably replicated around the existing cluster. Once that entry is applied to enough members of the
Raft consensus group (those machines running the specific instance of the algorithm), they update
their view of the cluster to include the new server. Thus membership changes benefit from the same
safety properties as other data transacted via Raft (see Transacting via the Raft protocol for more
information).
The new Core server must also catch up its own Raft log with respect to the other Core servers as it
initializes its internal Raft instance. This is the normal case when a cluster is first booted and has
performed few operations. There will be a delay before the new Core server becomes available if it
also needs to catch up graph data from other servers (as per Catchup protocol). This is the normal case for a long-lived cluster where the servers hold a great deal of graph data.

When an instance establishes a connection to any other instance, it determines the
current state of the cluster and ensures that it is eligible to join. To be eligible the
Neo4j instance must host the same database store as other members of the cluster
(although it is allowed to be in an older, outdated, state), or be a new deployment
without a database store.
Read replica membership
When a Read replica performs discovery, once it has made a connection to any of the available Core servers it proceeds to add itself into a shared whiteboard.
Figure 7. All Read replicas registered with shared whiteboard.
This whiteboard provides a view of all live Read replicas and is used both for routing requests from
database drivers that support end-user applications and for monitoring the state of the cluster.

The Read replicas are not involved in the Raft protocol, nor are they able to
influence cluster topology. Hence a shared whiteboard outside of Raft comfortably
scales to very large numbers of Read replicas.
The whiteboard is kept up to date as Read replicas join and leave the cluster, even if they fail abruptly
rather than leaving gracefully.
Transacting via the Raft protocol
Once bootstrapped, each Core server spends its time processing database transactions. Updates are
reliably replicated around Core servers via the Raft protocol. Updates appear in the form of a
(committed) Raft log entry containing transaction commands which is subsequently applied to the
graph model.

One of Raft’s primary design goals is to be easily understandable so that there are
fewer places for tricky bugs to hide in implementations. As a side-effect, it is also
easy for database operators to reason about their Core servers in their Causal
Clusters.
The Raft Leader for the current term (a logical clock) appends the transaction (an 'entry' in Raft
terminology) to the head of its local log and asks the other instances to do the same. When the Leader
can see that a majority of instances have appended the entry, it can be considered committed into the
Raft log. The client application can now be informed that the transaction has safely committed since
there is sufficient redundancy in the system to tolerate any (non-pathological) faults.

The Raft protocol describes three roles that an instance can be playing: Leader,
Follower, and Candidate. These are transient roles and any Core server can expect to
play them throughout the lifetime of a cluster. While it is interesting from a
computing science point of view to understand those states, operators should not
be overly concerned: they are an implementation detail.
For safety, within any Raft protocol instance there is only one Leader able to make forward progress in
any given term. The Leader bears the responsibility for imposing order on Raft log entries and driving
the log forward with respect to the Followers.
Followers maintain their logs with respect to the current Leader’s log. Should any participant in the
cluster suspect that the Leader has failed, then they can instigate a leadership election by entering the
Candidate state. In Neo4j Core servers this happens at millisecond timescale, around 500ms by default.
Whichever instance is in the best state (including the existing Leader, if it remains available) can
emerge from the election as Leader. The "best state" for a Leader is decided by highest term, then by
longest log, then by highest committed entry.
The ability to fail over roles without losing data allows forward progress even in the event of faults.
Even where Raft instances fail, the protocol can rapidly piece together which of the remaining
instances is best placed to take over from the failed instance (or instances) without data loss. This is
the essence of a non-blocking consensus protocol which allows Neo4j Causal Clustering to provide
continuous availability to applications.
Catchup protocol
Read replicas spend their time concurrently processing graph queries and applying a stream of
transactions from the Core servers to update their local graph store.
Figure 8. Transactions shipped from Core to Read replica.
Updates from Core servers to Read replicas are propagated by transaction shipping. Transaction
shipping is instigated by Read replicas frequently polling any of the Core servers specifying the ID of
the last transaction they received and processed. The frequency of polling is an operational choice.

Neo4j transaction IDs are strictly monotonic integer values (they always increase).
This makes it simple to determine whether or not a transaction has been applied to
a Read Replica by comparing its last processed transaction ID with that of a Core
server.
If there is a large difference between a Read replica’s transaction history and that of a Core server, polling may not result in any transactions being shipped. This is to be expected, for example when a
new Read replica is introduced to a long-running cluster or where a Read replica has been down for
some significant period of time. In such cases the catchup protocol will realise the gap between the
Core servers and Read replica is too large to fill via transaction shipping and will fall back to copying
the database store directly from Core server to Read replica. Since we are working with a live system,
at the end of the database store copy the Core server’s database is likely to have changed. The Read
replica completes the catchup by asking for any transactions missed during the copy operation before
becoming available.

A very slow database store copy could conceivably leave the Read replica too far
behind to catch up via transaction log shipping as the Core server has substantially
moved on. In such cases the Read replica server repeats the catchup protocol. In
pathological cases the operator can intervene to snapshot, restore, or file copy
recent store files from a fast backup.
Backup protocol
During the lifetime of the Causal Cluster, operators will want to back up the cluster state for disaster
recovery purposes. Backup is a strategy that places a deliberate gap between the online system and
its recent state such that the two do not share common failure points (such as the same cloud
storage). Backup is in addition to and orthogonal to any strategies for spreading Core servers and
Read replicas across data centers.

For operational details on how to back up a Neo4j cluster, see Backup a Causal
Cluster.
The Backup protocol is actually implemented as an instance of the Catchup protocol. Instead of the
client being a Read replica, it is in fact the neo4j-backup tool that spools the data out to disk rather
than to a live database.
Both full and incremental backups can be taken via neo4j-backup, and both Core servers and Read replicas can support backups. However, given the relative abundance of Read replicas it is typical
for backups to target one of them rather than the less plentiful Core servers (see Backup a Causal
Cluster for more on Core versus Read replica backups).
Read replica shutdown
On clean shutdown, a Read replica will invoke the discovery protocol to remove itself from the shared
whiteboard overview of the cluster. It will also ensure that the database is cleanly shut down and consistent, immediately ready for future use.
On an unclean shutdown such as a power outage, the Core servers maintaining the overview of the
cluster will notice that the Read replica’s connection has been abruptly cut. The discovery
machinery will initially hide the Read replica’s whiteboard entry, and if the Read replica does not
reappear quickly its modest memory use in the shared whiteboard will be reclaimed.
On unclean shutdown it is possible the Read replica will not have entirely consistent store files or
transaction logs. On subsequent reboot the Read replica will roll back any partially applied
transactions such that the database is in a consistent state.
Core shutdown
A clean Core server shutdown, like Core server booting, is handled via the Raft protocol. When a Core
server is shut down, it appends a membership entry to the Raft log which is then replicated around
the Core servers. Once a majority of Core servers have committed that membership entry the leaver
has logically left the cluster and can safely shut down. All remaining instances accept that the cluster
has grown smaller, and is therefore less fault tolerant. If the leaver happened to be playing the Leader role at the point of leaving, that role will be transitioned to another Core server after a brief election.
An unclean shutdown does not directly inform the cluster that a Core server has left. Instead the Core
cluster size remains the same for the purposes of computing majorities for commits. Thus, after an unclean shutdown in a cluster of 5 Core servers, 3 of the remaining 4 members must agree to commit, which is a tighter margin than 3 of 5 before the unclean shutdown.

Of course when Core servers fail, operators or monitoring scripts can be alerted so
that they can intervene in the cluster if necessary.
If the leaver was playing the Leader role, there will be a brief election to produce a new Leader. Once
the new Leader is established, the Core cluster continues albeit with less redundancy. However even
with this failure, a Core cluster of 5 servers reduced to 4 can still tolerate one more fault before
becoming read-only.
4.1.3. Create a new Causal Cluster
This section describes how to deploy a new Neo4j Causal Cluster.
In this section we will learn how to deploy a brand new Neo4j Causal Cluster. For a description of the
clustering architecture and cluster concepts we will encounter here please refer to Introduction. In
creating the new Causal Cluster we will learn how to adapt the settings in each of the Neo4j servers'
configuration files. Ultimately we will learn how to set up a cluster of three Core instances, the
minimum number of servers needed for the Core cluster to safely form, and three Read replicas to
provide a modest level of scale-out. From this basic pattern, we can extend what we have learned here
to create any sized cluster.
This section does not cover how to import data from an existing Neo4j instance. For help on using an
existing Neo4j database to seed a new Causal Cluster, please see Seed a Causal Cluster.

The minimum number of Core servers is 3. The minimum number of Read replica
servers is 0. Refer to Introduction to learn more.
Download and configure
• Download a copy of Neo4j Enterprise Edition from the Neo4j download site
(http://neo4j.com/download/), and unpack on your target machine or machines.
While most operational tasks are handled by the neo4j-admin tool, an initial cluster requires a
configuration file to initiate service. In this section, we will edit the neo4j.conf file which is where basic
and advanced configuration options for the database instance are held. This file is located in the conf/
directory of the database installation.
When running Neo4j on three separate machines, the basic configuration requires changing three
settings for clustering, and two settings for networking. If you want to try out setting up a Causal
Cluster on your local machine refer to Set up a local Causal Cluster.
These settings are located in neo4j.conf under the header "Network connector configuration".
dbms.connectors.default_listen_address
The address or network interface this machine uses to listen for incoming messages.
Uncommenting this line sets this value to 0.0.0.0 which allows Neo4j to bind to any and all
network interfaces. Uncomment the line dbms.connectors.default_listen_address=0.0.0.0
dbms.connectors.default_advertised_address
The address that other machines are told to connect to. In the typical case, this should be set to the
public IP address of this server. For example, if the IP address is 33.44.55.66, this setting should be:
dbms.connectors.default_advertised_address=33.44.55.66
These settings are located in neo4j.conf under the header "Causal Clustering Configuration".
dbms.mode
The operating mode of a single database instance. For Causal Clustering, there are two possible
modes: CORE or READ_REPLICA. On three of our instances, we will use the CORE mode. Uncomment
the line: #dbms.mode=CORE. On the other three instances, we will use the READ_REPLICA mode.
Uncomment the line: #dbms.mode=CORE and change it to dbms.mode=READ_REPLICA.
causal_clustering.expected_core_cluster_size
The initial cluster size at startup. It is necessary for achieving an early stable membership state and
subsequently for safe writes to the cluster. This value is the number of Core instances you intend
to have as part of your cluster. The minimum number of instances to form a safe cluster is three.
For example, causal_clustering.expected_core_cluster_size=3 will specify that the cluster has
three Core members.
causal_clustering.initial_discovery_members
The network addresses of an initial set of Core cluster members available to bootstrap this Core or
Read replica instance. The value is given as a comma-separated list of address/port pairs. In the
simplest case each of the specified network addresses resolves to a working Neo4j Core server
instance. In the degenerate case just one active instance listed here can be used to bootstrap the
current instance. The current instance will then discover other available servers via the discovery
protocol (see: Discovery protocol). This value requires you to know the address of the Core
instances in the cluster. The default port is :5000. You should include the address of the local
machine in this setting as well.
Apply these settings to each configuration file on each instance. The values will be the same for each.
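As a sketch, the clustering and networking parts of neo4j.conf on the first Core instance could then look like the following; the host names server-1 to server-3 and the discovery port 5000 are placeholders for your own environment, and the advertised address is each machine’s own public address:
dbms.mode=CORE
causal_clustering.expected_core_cluster_size=3
causal_clustering.initial_discovery_members=server-1:5000,server-2:5000,server-3:5000
dbms.connectors.default_listen_address=0.0.0.0
dbms.connectors.default_advertised_address=server-1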
Start the Neo4j servers
Start the Neo4j servers as usual. Note that the startup order does not matter.
server-1$ ./bin/neo4j start
server-2$ ./bin/neo4j start
server-3$ ./bin/neo4j start
Startup Time

If you want to follow along with the startup of a server you can follow the messages
in logs/neo4j.log. On a Unix system issue the command tail -f logs/neo4j.log. On
Windows Server run Get-Content .\logs\neo4j.log -Tail 10 -Wait. While an
instance is joining the cluster, the server may appear unavailable. In the case where
an instance is joining a cluster with lots of data, it may take a number of minutes for
the new instance to download the data from the cluster and become available.
Now you can access the three servers and check their status. Open the locations below in a web
browser and issue the following query: CALL dbms.cluster.overview(). This will show you the status of
the cluster and information about each member of the cluster.
• http://server-1:7474/
• http://server-2:7474/
• http://server-3:7474/
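If you prefer the command line, the same procedure can be called through the HTTP transactional endpoint; the credentials below are placeholders:
#> curl -u neo4j:<password> -H "Content-Type: application/json" \
   -d '{"statements":[{"statement":"CALL dbms.cluster.overview()"}]}' \
   http://server-1:7474/db/data/transaction/commit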
You now have a Neo4j Causal Cluster of three instances running.
Adding Core servers
Adding instances to the Core cluster is simply a matter of starting a new database server with the
appropriate configuration as described in Download and configure. Following those instructions, we
need to change neo4j.conf to reflect the new Core server’s desired configuration like so:
• Set dbms.mode=CORE
• Set causal_clustering.initial_discovery_members so that it contains the hostname/IP address and port of at least one active Core server, for example causal_clustering.initial_discovery_members=server-1:5000,server-2:5000
Once we’ve done that we can simply start the server (e.g. with bin/neo4j start or via service startup
as your deployment demands) and the new server will integrate itself with the existing cluster. Once the server has copied over the graph data from its peers it will become available.

Unscripted installations are tricky things! If you have to install databases by hand, then the logs/neo4j.log file will contain helpful information for debugging your installation. But if you can script your installations, doing so makes a lot of sense.
Adding Read replicas
Initial Read replica configuration is provided similarly to Core servers via neo4j.conf. Since Read
replicas do not participate in cluster quorum decisions, their configuration is shorter. They simply
need to know the addresses of some of the Core servers which they can bind to in order to run the
discovery protocol (see: Discovery protocol for details). Once it has completed the initial discovery the
Read replica becomes aware of the available Core servers and can choose an appropriate one from
which to catch up (see: Catchup protocol for how that happens).
In the neo4j.conf file in the section "Causal Clustering Configuration", the following settings need to be
changed:
• The operating mode of the database, dbms.mode=CORE should be uncommented and set to
READ_REPLICA.
• The address of the Core instances in the cluster,
causal_clustering.initial_discovery_members=localhost:5000,localhost:5001,localhost:5002
should be uncommented and addresses and ports of the Core members should be put here.
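As a sketch, the clustering part of a Read replica’s neo4j.conf could then contain the following (the server-1 to server-3 host names and port 5000 are placeholders):
dbms.mode=READ_REPLICA
causal_clustering.initial_discovery_members=server-1:5000,server-2:5000,server-3:5000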
4.1.4. Seed a Causal Cluster
This section describes how to seed a Neo4j Causal Cluster from a backup.
In Create a new Causal Cluster we learned how to create a cluster with an empty store. Unless you’re
brand new to Neo4j though, it is likely that you’ll already have an existing Neo4j database whose
contents you’d like to transfer to a Causal Cluster. In this section we’ll learn how to import an existing
Neo4j database to a Causal Cluster, assuming we have already created a backup from either a
standalone Neo4j instance or a Neo4j Highly Available cluster.

The same process that we follow here can be used also to seed a new Causal
Cluster from an existing Read replica. This can be useful, for example, in disaster
recovery where some servers have retained operability during a catastrophic event.
Overview of the process
In this section we’ll assume we already have a Neo4j store that comes from a standalone or HA Neo4j
instance and that this store resides under the seed-dir/ directory. This can be the result of an online
backup from a running instance or from the database directory of a gracefully shut-down Neo4j instance.
• Create a new Neo4j Core-only cluster, as described in Download and configure. Do not start the instances that make up the cluster; we first need to restore the database contents.
• Use the restore command of the neo4j-admin tool to restore the seeding store from the seed-dir/
directory into the database directory on each of the Core instances in your cluster.
neo4j-01$ ./bin/neo4j-admin restore --from=seed-dir/ --database=graph.db
neo4j-02$ ./bin/neo4j-admin restore --from=seed-dir/ --database=graph.db
neo4j-03$ ./bin/neo4j-admin restore --from=seed-dir/ --database=graph.db

The command lines above assume that the database name is the default graph.db.
If you have configured a different database name, change the command line
argument accordingly.
At this point, each instance of the Core cluster has the database files that contain our graph data. Internally it has everything necessary to form a cluster. You can proceed to start all instances and the
cluster will form, and data will be replicated between the instances.
neo4j-01$ ./bin/neo4j start
neo4j-02$ ./bin/neo4j start
neo4j-03$ ./bin/neo4j start
After restoring, the whole cluster will become available (almost) instantaneously, once it has performed its boot-up housekeeping and come online.
It is also possible at this point to bring online Read replicas to provide query scale for your Causal
Cluster.

If the cluster does not form as expected, the logs will contain sufficient information
for the operator to determine the problem.
If we want to export the contents of a Causal Cluster to seed a standalone or Neo4j HA deployment,
we first need to perform a backup from the cluster and use the resulting store files as the source of
the new system. See Neo4j online backup and restore for more information.
Upgrading from previous version
If we want to upgrade from an earlier version of Neo4j we can set dbms.allow_format_migration=true
in neo4j.conf for each of the servers.
Leaving a cluster
Should we wish to downgrade a Core server to a standalone instance, it can be done with the neo4j-admin unbind [--database=<name>] command on a stopped database. Unbinding is only supported on stopped instances; it will fail to execute on any live database.
Once a server has been unbound from a cluster, the store files are equivalent to a Neo4j standalone
instance. From this point those files could be used to run a standalone instance or be used to seed a
new cluster.
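As a sketch, assuming the default database name graph.db, the sequence on the instance being downgraded would be:
neo4j-01$ ./bin/neo4j stop
neo4j-01$ ./bin/neo4j-admin unbind --database=graph.db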
4.1.5. Causal Cluster settings reference
This section enumerates the important settings related to running a Neo4j Causal Cluster.
dbms.mode
This setting configures the operating mode of the database. For Causal Clustering, there are two
possible modes: CORE or READ_REPLICA.
causal_clustering.expected_core_cluster_size
A hint to the cluster at startup for the intended cluster size. This is necessary for achieving quorum
writes to the cluster. This value is the number of instances you intend to have as part of your
cluster. The minimum number of instances to form a cluster is three. For example,
causal_clustering.expected_core_cluster_size=3 will specify that the cluster has three core
members.
causal_clustering.initial_discovery_members
Machines running Neo4j communicate over the network to ensure consistency of the database
between themselves. A common scenario is to deploy Neo4j on many machines. In this case each
member of the cluster must be given bootstrap information about the other members that are
likely in the cluster. Specifying an instance’s own address is permitted. Do not use any whitespace
in this configuration option. For example,
causal_clustering.initial_discovery_members=neo22:5001,neo21:5001,neo20:5001 will attempt to
reach Neo4j instances listening on neo22 on port 5001 and neo21 on port 5001 and neo20 also on
port 5001.
causal_clustering.raft_advertised_address
The address/port setting that specifies where the Neo4j instance advertises to other members of
the cluster that it will listen for Raft messages within the Core cluster. For example,
causal_clustering.raft_advertised_address=192.168.33.20:7000 will listen for cluster communication on the network interface bound to 192.168.33.20 on port 7000.
causal_clustering.transaction_advertised_address
The address/port setting that specifies where the instance advertises where it will listen for
requests for transactions in the transaction-shipping catchup protocol. For example,
causal_clustering.transaction_advertised_address=192.168.33.20:6001 will listen for transactions
from cluster members on the network interface bound to 192.168.33.20 on port 6001.
causal_clustering.discovery_listen_address
The address/port setting for use by the discovery protocol. This is the value that will be included in the causal_clustering.initial_discovery_members setting in the configuration of the other members of the cluster. For example,
causal_clustering.discovery_listen_address=0.0.0.0:5001 will listen for cluster membership
communication on any network interface at port 5001.
causal_clustering.raft_listen_address
The address/port setting that specifies which network interface and port the Neo4j instance will
bind to for cluster communication. This setting must be set in coordination with the address this
instance advertises it will listen at in the setting causal_clustering.raft_advertised_address. For
example, causal_clustering.raft_listen_address=0.0.0.0:7000 will listen for cluster
communication on any network interface at port 7000.
causal_clustering.transaction_listen_address
The address/port setting that specifies which network interface and port the Neo4j instance will
bind to for cluster communication. This setting must be set in coordination with the address this
instance advertises it will listen at in the setting
causal_clustering.transaction_advertised_address. For example,
causal_clustering.transaction_listen_address=0.0.0.0:6001 will listen for transactions from cluster members on any network interface at port 6001.
4.2. Highly Available cluster
This chapter gives a comprehensive description of the Highly Available cluster, including the
architecture, configuration details and instructions.
This chapter gives a comprehensive description of Highly Available clusters. It starts with a discussion
about architecture and the different components making up the Highly Available cluster. It then
proceeds to explicit configuration details and gives instructions on how to configure and operate the
Highly Available cluster. For a hands-on tutorial for setting up a Neo4j Highly Available cluster, see Set
up a Highly Available cluster.
The chapter describes the following:
• Architecture of a Highly Available cluster
• Configure a Highly Available cluster
• Install an arbiter instance
• Endpoints for status information
• HAProxy for load balancing
4.2.1. High Availability
This section describes the architecture of a Neo4j Highly Available cluster.
A Neo4j cluster comprises a single master instance and zero or more slave instances. All
instances in the cluster have full copies of the data in their local database files. The basic cluster
configuration consists of three instances:
Figure 9. Neo4j cluster
Each instance contains the logic needed in order to coordinate with the other members of the cluster
for data replication and election management, represented by the green arrows in the picture above.
Each slave instance that is not an arbiter instance (see below) communicates with the master to keep
databases up to date, as represented by the blue arrows in the picture above.
Arbiter instance
A special case of a slave instance is the arbiter instance. The arbiter instance comprises the full Neo4j
software running in an arbiter mode, such that it participates in cluster communication, but it does
not replicate a copy of the datastore.
Transaction propagation
Write transactions performed directly on the master will execute as though the instance were running
in non-cluster mode. On success the transaction will be pushed out to a configurable number of
slaves. This is done optimistically, meaning that if the push fails, the transaction will still be successful.
When performing a write transaction on a slave each write operation will be synchronized with the
master. Locks will be acquired on both master and slave. When the transaction commits it will first be
committed on the master and then, if successful, on the slave. To ensure consistency, a slave must be
up to date with the master before performing a write operation. The automatic updating of slaves is
built into the communication protocol between slave and master.
Failover
Whenever a Neo4j database becomes unavailable, for example caused by hardware failure or network
outage, the other instances in the cluster will detect that and mark it as temporarily failed. A database
instance that becomes available after an outage will automatically catch up with the cluster.
If the master goes down another member will be elected and have its role switched from slave to
master after quorum (see below) has been reached within the cluster. When the new master has
performed its role switch, it will broadcast its availability to all the other members of the cluster.
Normally a new master is elected and started within seconds. During this time no writes can take
place.
Quorum
A cluster must have quorum to elect a new master. Quorum is defined as: more than 50% of active
cluster members. A simple rule of thumb when designing a cluster is: A cluster that must be able to
tolerate n master instance failures requires 2n+1 instances to satisfy quorum and allow elections to
take place. Therefore, the simplest valid cluster size is three instances, which allows for a single
master failure.
Election Rules
1. If a master fails, or on a cold-start of the cluster, the slave with the highest committed transaction
ID will be elected as the new master. This rule ensures that the slave with the most up-to-date
datastore becomes the new master.
2. If a master fails and two or more slaves are tied, i.e. have the same highest committed transaction
ID, the slave with the lowest ha.server_id value will be elected the new master. This is a good tiebreaker because the ha.server_id is unique within the cluster, and allows for configuring which
instances can become master before others.
Branching
Data branching can be caused in two different ways:
• A slave falls too far behind the master and then leaves or re-joins the cluster. This type of
branching is harmless.
• The master re-election happens and the old master has one or more committed transactions that
the slaves did not receive before it died. This type of branching is harmful and requires action.
The database makes the best of the situation by creating a directory with the contents of the database
files from before branching took place so that it can be reviewed and the situation be resolved. Data
branching does not occur under normal operations.
Summary
All this can be summarized as:
• Write transactions can be performed on any database instance in a cluster.
• A Neo4j cluster is fault tolerant and can continue to operate from any number of machines down to a single machine.
• Slaves will be automatically synchronized with the master on write operations.
• If the master fails, a new master will be elected automatically.
• The cluster automatically handles instances becoming unavailable (for example due to network
issues), and also makes sure to accept them as members in the cluster when they are available
again.
• Transactions are atomic, consistent and durable but eventually propagated out to other slaves.
• Updates to slaves are eventually consistent by nature but can be configured to be pushed
optimistically from master during commit.
• If the master goes down, any running write transaction will be rolled back and new transactions
will block or fail until a new master has become available.
• Reads are highly available and the ability to handle read load scales with more database instances
in the cluster.
4.2.2. Setup and configuration
Neo4j can be configured in cluster mode to accommodate differing requirements for load, fault
tolerance and available hardware.
Follow these steps in order to configure a Neo4j cluster:
1. Download and install the Neo4j Enterprise Edition on each of the servers to be included in the
cluster.
2. If applicable, decide which server(s) are to be configured as arbiter instance(s).
3. Edit the Neo4j configuration file on each of the servers to accommodate the design decisions.
4. Follow installation instructions for a single instance installation.
5. Modify the configuration files on each server as outlined in the section below. There are many
parameters that can be modified to achieve a certain behavior. However, the only ones mandatory
for an initial cluster are: dbms.mode, ha.server_id and ha.initial_hosts.
Important configuration settings
At startup of a Neo4j cluster, each Neo4j instance contacts the other instances as configured. When
an instance establishes a connection to any other, it determines the current state of the cluster and
ensures that it is eligible to join. To be eligible the Neo4j instance must host the same database store
as other members of the cluster (although it is allowed to be in an older state), or be a new
deployment without a database store.
Please note that IP addresses or hostnames should be explicitly configured for the machines
participating in the cluster. In the absence of a specified IP address, Neo4j will attempt to find a valid
interface for binding. This is not recommended practice.
dbms.mode
dbms.mode configures the operating mode of the database.
For cluster mode it is set to: dbms.mode=HA
ha.server_id
ha.server_id is the cluster identifier for each instance. It must be a positive integer and must be
unique among all Neo4j instances in the cluster.
For example, ha.server_id=1.
ha.host.coordination
ha.host.coordination is an address/port setting that specifies where the Neo4j instance will listen for
cluster communication. The default port is 5001.
For example, ha.host.coordination=192.168.33.22:5001 will listen for cluster communications on port
5001.
ha.initial_hosts
ha.initial_hosts is a comma separated list of address/port pairs, which specifies how to reach other
Neo4j instances in the cluster (as configured via their ha.host.coordination option). These
hostname/ports will be used when the Neo4j instances start, to allow them to find and join the
cluster. When cold starting the cluster, i.e. when no cluster is available yet, the database will be
unavailable until all members listed in ha.initial_hosts are online and communicating with each other.
It is good practice to configure all the instances in the cluster to have the exact same entries in
ha.initial_hosts, for the cluster to come up quickly and cleanly.
Do not use any whitespace in this configuration option.
For example, ha.initial_hosts=192.168.33.21:5001,192.168.33.22:5001,192.168.33.23:5001 will
initiate a cluster containing the hosts 192.168.33.21-23, all listening on the same port, 5001.
ha.host.data
ha.host.data is an address/port setting that specifies where the Neo4j instance will listen for
transactions from the cluster master. The default port is 6001.
ha.host.data must use a different port than ha.host.coordination.
For example, ha.host.data=192.168.33.22:6001 will listen for transactions from the cluster master on
port 6001.
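Bringing the mandatory and networking settings together, the HA section of neo4j.conf for the second of three instances might look like the sketch below (the 192.168.33.x addresses are the illustrative values used in this section):
dbms.mode=HA
ha.server_id=2
ha.initial_hosts=192.168.33.21:5001,192.168.33.22:5001,192.168.33.23:5001
ha.host.coordination=192.168.33.22:5001
ha.host.data=192.168.33.22:6001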
Address and port formats
The ha.host.coordination and ha.host.data configuration options are specified as
<hostname or IP address>:<port>.
For ha.host.data the address must be an address assigned to one of the host’s
network interfaces.

For ha.host.coordination the address must be an address assigned to one of the
host’s network interfaces, or the value 0.0.0.0, which will cause Neo4j to listen on
every network interface.
Either the address or the port can be omitted, in which case the default for that part
will be used. If the hostname or IP address is omitted, then the port must be
preceded with a colon (eg. :5001).
The syntax for setting a port range is: <hostname or IP address>:<first port>[-<second port>]. In this case, Neo4j will test each port in sequence, and select the
first that is unused. Note that this usage is not permitted when the hostname is
specified as 0.0.0.0 (the "all interfaces" address).
For a hands-on tutorial for setting up a Neo4j cluster, see Set up a Highly Available cluster.
Review Reference for a list of all available configuration settings.
4.2.3. Arbiter instances
A typical deployment of Neo4j will use a cluster of three machines to provide fault tolerance and read
scalability.
While having at least three instances is necessary for failover to happen in case the master becomes
unavailable, it is not required for all instances to run the full Neo4j stack. Instead, something called
arbiter instances can be deployed. They are regarded as cluster participants in that their role is to take
part in master elections with the single purpose of breaking ties in the election process. That makes
possible a scenario where you have a cluster of two Neo4j database instances and an additional arbiter instance, and still enjoy tolerance of a single failure of any of the three instances.
Arbiter instances are configured in neo4j.conf using the same settings as standard Neo4j cluster
members. The instance is configured to be an arbiter by setting the dbms.mode option to ARBITER.
Settings that are not cluster specific are of course ignored, so you can easily start up an arbiter
instance in place of a properly configured Neo4j instance.
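For example, an arbiter joining the cluster from the previous section might use a neo4j.conf sketch such as the following (the server ID and addresses are illustrative):
dbms.mode=ARBITER
ha.server_id=4
ha.initial_hosts=192.168.33.21:5001,192.168.33.22:5001,192.168.33.23:5001
ha.host.coordination=192.168.33.24:5001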
To start the arbiter instance, run neo4j as normal:
neo4j_home$ ./bin/neo4j start
You can stop, install and remove it as a service and ask for its status in exactly the same way as for
other Neo4j instances.
4.2.4. Endpoints for status information
Introduction
A common use case for Neo4j HA clusters is to direct all write requests to the master while using slaves for read operations, distributing the read load across the cluster and gaining failover capabilities for your deployment. The most common way to achieve this is to place a load balancer in front of the HA cluster, an example being shown with HAProxy. As you can see in that guide, it makes use of an HTTP endpoint to discover which instance is the master and direct write load to it. In this section, we will deal with this HTTP endpoint and explain its semantics.
The endpoints
Each HA instance comes with 3 endpoints regarding its HA status. They are complementary, but each may be used depending on your load balancing needs and your production setup. Those are:
• /db/manage/server/ha/master
• /db/manage/server/ha/slave
• /db/manage/server/ha/available
The /master and /slave endpoints can be used to direct write and non-write traffic respectively to
specific instances. This is the optimal way to take advantage of Neo4j’s scaling characteristics. The
/available endpoint exists for the general case of directing arbitrary request types to instances that are
available for transaction processing.
To use the endpoints, perform an HTTP GET operation on either and the following will be returned:
Table 6. HA HTTP endpoint responses

Endpoint                          Instance State    Returned Code     Body text
/db/manage/server/ha/master       Master            200 OK            true
                                  Slave             404 Not Found     false
                                  Unknown           404 Not Found     UNKNOWN
/db/manage/server/ha/slave        Master            404 Not Found     false
                                  Slave             200 OK            true
                                  Unknown           404 Not Found     UNKNOWN
/db/manage/server/ha/available    Master            200 OK            master
                                  Slave             200 OK            slave
                                  Unknown           404 Not Found     UNKNOWN
Examples
From the command line, a common way to query these endpoints is to use curl. With no arguments, curl will do an HTTP GET on the URI provided and will output the body text, if any. If you also want to get the response code, just add the -v flag for verbose output. Here are some examples:
• Requesting master endpoint on a running master with verbose output
#> curl -v localhost:7474/db/manage/server/ha/master
* About to connect() to localhost port 7474 (#0)
*   Trying ::1...
* connected
* Connected to localhost (::1) port 7474 (#0)
> GET /db/manage/server/ha/master HTTP/1.1
> User-Agent: curl/7.24.0 (x86_64-apple-darwin12.0) libcurl/7.24.0 OpenSSL/0.9.8r zlib/1.2.5
> Host: localhost:7474
> Accept: */*
>
< HTTP/1.1 200 OK
< Content-Type: text/plain
< Access-Control-Allow-Origin: *
< Transfer-Encoding: chunked
< Server: Jetty(6.1.25)
<
* Connection #0 to host localhost left intact
true* Closing connection #0
• Requesting slave endpoint on a running master without verbose output:
#> curl localhost:7474/db/manage/server/ha/slave
false
• Finally, requesting the master endpoint on a slave with verbose output
#> curl -v localhost:7475/db/manage/server/ha/master
* About to connect() to localhost port 7475 (#0)
*   Trying ::1...
* connected
* Connected to localhost (::1) port 7475 (#0)
> GET /db/manage/server/ha/master HTTP/1.1
> User-Agent: curl/7.24.0 (x86_64-apple-darwin12.0) libcurl/7.24.0 OpenSSL/0.9.8r zlib/1.2.5
> Host: localhost:7475
> Accept: */*
>
< HTTP/1.1 404 Not Found
< Content-Type: text/plain
< Access-Control-Allow-Origin: *
< Transfer-Encoding: chunked
< Server: Jetty(6.1.25)
<
* Connection #0 to host localhost left intact
false* Closing connection #0
Unknown status

The UNKNOWN status exists to describe when a Neo4j instance is neither master nor
slave. For example, the instance could be transitioning between states (master to
slave in a recovery scenario or slave being promoted to master in the event of
failure), or the instance could be an arbiter instance. If the UNKNOWN status is
returned, the client should not treat the instance as a master or a slave and should
instead pick another instance in the cluster to use, wait for the instance to transition out of the
UNKNOWN state, or undertake restorative action via systems administration.
If the Neo4j server has Basic Security enabled, the HA status endpoints will also require authentication
credentials. For some load balancers and proxy servers, providing this with the request is not an
option. For those situations, consider disabling authentication of the HA status endpoints by setting
dbms.security.ha_status_auth_enabled=false in the neo4j.conf configuration file.
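If you instead keep authentication enabled for the status endpoints, clients that support it can simply pass credentials with each request. For example, with curl (the username and password below are placeholders):
#> curl -u neo4j:secret localhost:7474/db/manage/server/ha/master
true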
4.2.5. HAProxy for load balancing
Configure HAProxy for load balancing in a Neo4j Highly Available cluster.
In the Neo4j HA architecture, the cluster is typically fronted by a load balancer. In this section we will
explore how to set up HAProxy to perform load balancing across the HA cluster.
For this tutorial we will assume a Linux environment with HAProxy already installed. See
http://www.haproxy.org/ for downloads and installation instructions.
Configuring HAProxy for the Bolt Protocol
In a typical HA deployment, HAProxy will be configured with two open ports, one for routing write
operations to the master and one for load balancing read operations over slaves. Each application will
have two driver instances, one connected to the master port for performing writes and one connected
to the slave port for performing reads.
First we set up the mode and timeouts. The settings below will kill the connection if a server or a client
is idle for longer than two hours. Long-running queries may take longer than that, but this can be
taken care of by enabling HAProxy’s TCP heartbeat feature.
defaults
mode tcp
timeout connect 30s
timeout client 2h
timeout server 2h
Set up where drivers wanting to perform writes will connect:
frontend neo4j-write
bind *:7680
default_backend current-master
Now we set up the backend that points to the current master instance.
backend current-master
option httpchk HEAD /db/manage/server/ha/master HTTP/1.0
server db01 10.0.1.10:7687 check port 7474
server db02 10.0.1.11:7687 check port 7474
server db03 10.0.1.12:7687 check port 7474
In the example above, httpchk is configured in the way you would do it if authentication has been
disabled for Neo4j. By default, however, authentication is enabled and you will need to pass in an
authentication header. This would be along the lines of option httpchk HEAD
/db/manage/server/ha/master HTTP/1.0\r\nAuthorization:\ Basic\ bmVvNGo6bmVvNGo= where the last
part has to be replaced with a base64 encoded value for your username and password.
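The base64 encoded value can be produced with standard command line tools; for example, for the default neo4j:neo4j credentials used above:
#> echo -n "neo4j:neo4j" | base64
bmVvNGo6bmVvNGo=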
Configure where drivers wanting to perform reads will connect:
frontend neo4j-read
bind *:7681
default_backend slaves
Finally, configure a backend that points to slaves in a round-robin fashion:
backend slaves
balance roundrobin
option httpchk HEAD /db/manage/server/ha/slave HTTP/1.0
server db01 10.0.1.10:7687 check port 7474
server db02 10.0.1.11:7687 check port 7474
server db03 10.0.1.12:7687 check port 7474
Note that the servers in the slave backend are configured the same way as in the current-master
backend.
Then by putting all the above configurations into one file, we get a basic workable HAProxy
configuration to perform load balancing for applications using the Bolt Protocol.
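For example, assuming the combined configuration has been written to /etc/haproxy/haproxy.cfg, it can be checked for syntax errors and then started:
# Validate the configuration file
/usr/sbin/haproxy -c -f /etc/haproxy/haproxy.cfg
# Start HAProxy with the validated configuration
/usr/sbin/haproxy -f /etc/haproxy/haproxy.cfg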
By default, encryption is enabled between servers and drivers. With encryption turned on, the
HAProxy configuration constructed above needs no change to work directly in TLS/SSL passthrough
layout for HAProxy. However, depending on the driver authentication strategy adopted, some special
requirements might apply to the server certificates.
For drivers using the trust-on-first-use authentication strategy, each driver registers the HAProxy
port it connects to with the first certificate received from the cluster. For all subsequent
connections, the driver will only establish connections with a server whose certificate is the same
as the one registered. Therefore, in order for a driver to be able to establish connections with all
instances in the cluster, this mode requires that all instances in the cluster share the same
certificate.
If drivers are configured to run in trusted-certificate mode, then the certificate known to the drivers
should be a root certificate for all the certificates installed on the servers in the cluster. Alternatively,
for drivers such as the Java driver that support registering multiple certificates as trusted certificates,
the drivers also work well with a cluster if all server certificates used in the cluster are registered as
trusted certificates.
To use HAProxy with other encryption layouts, refer to the full HAProxy documentation on their website.
Configuring HAProxy for the HTTP API
HAProxy can be configured in many ways. The full documentation is available at their website.
For this example, we will configure HAProxy to load balance requests to three HA servers. Simply write
the following configuration to /etc/haproxy/haproxy.cfg:
global
daemon
maxconn 256
defaults
mode http
timeout connect 5000ms
timeout client 50000ms
timeout server 50000ms
frontend http-in
bind *:80
default_backend neo4j
backend neo4j
option httpchk GET /db/manage/server/ha/available
server s1 10.0.1.10:7474 maxconn 32
server s2 10.0.1.11:7474 maxconn 32
server s3 10.0.1.12:7474 maxconn 32
listen admin
bind *:8080
stats enable
HAProxy can now be started by running:
/usr/sbin/haproxy -f /etc/haproxy/haproxy.cfg
You can connect to http://<ha-proxy-ip>:8080/haproxy?stats to view the status dashboard. This
dashboard can be moved to run on port 80, and authentication can also be added. See the HAProxy
documentation for details on this.
Optimizing for reads and writes
Neo4j provides a catalogue of health check URLs (see Endpoints for status information) that HAProxy
(or any load balancer for that matter) can use to distinguish machines using HTTP response codes. In
the example above we used the /available endpoint, which directs requests to machines that are
generally available for transaction processing (they are alive!).
However, it is possible to have requests directed to slaves only, or to the master only. If you are able to
distinguish in your application between requests that write, and requests that only read, then you can
take advantage of two (logical) load balancers: one that sends all your writes to the master, and one
that sends all your read-only requests to a slave. In HAProxy you build logical load balancers by adding
multiple backends.
The trade-off here is that while Neo4j allows slaves to proxy writes for you, this indirection
unnecessarily ties up resources on the slave and adds latency to your write requests. Conversely, you
don’t particularly want read traffic to tie up resources on the master; Neo4j allows you to scale out for
reads, but writes are still constrained to a single instance. If possible, that instance should exclusively
do writes to ensure maximum write performance.
The following example excludes the master from the set of machines using the /slave endpoint.
global
daemon
maxconn 256
defaults
mode http
timeout connect 5000ms
timeout client 50000ms
timeout server 50000ms
frontend http-in
bind *:80
default_backend neo4j-slaves
backend neo4j-slaves
option httpchk GET /db/manage/server/ha/slave
server s1 10.0.1.10:7474 maxconn 32 check
server s2 10.0.1.11:7474 maxconn 32 check
server s3 10.0.1.12:7474 maxconn 32 check
listen admin
bind *:8080
stats enable

In practice, writing to a slave is uncommon. While writing to slaves has the benefit
of ensuring that data is persisted in two places (the slave and the master), it comes
at a cost. The cost is that the slave must immediately become consistent with the
master by applying any missing transactions and then synchronously apply the new
transaction with the master. This is a more expensive operation than writing to the
master and having the master push changes to one or more slaves.
Cache-based sharding with HAProxy
Neo4j HA enables what is called cache-based sharding. If the dataset is too big to fit into the cache of
any single machine, then by applying a consistent routing algorithm to requests, the caches on each
machine will actually cache different parts of the graph. A typical routing key could be user ID.
In this example, the user ID is a query parameter in the URL being requested. This will route the same
user to the same machine for each request.
global
daemon
maxconn 256
defaults
mode http
timeout connect 5000ms
timeout client 50000ms
timeout server 50000ms
frontend http-in
bind *:80
default_backend neo4j-slaves
backend neo4j-slaves
balance url_param user_id
server s1 10.0.1.10:7474 maxconn 32
server s2 10.0.1.11:7474 maxconn 32
server s3 10.0.1.12:7474 maxconn 32
listen admin
bind *:8080
stats enable
Naturally the health check and query parameter-based routing can be combined to only route
requests to slaves by user ID. Other load balancing algorithms are also available, such as routing by
source IP (source), the URI (uri) or HTTP headers (hdr()).
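As a sketch, routing on an HTTP header instead of a URL parameter only requires changing the balance line in the backend; the header name X-User-Id below is purely illustrative:
backend neo4j-slaves
balance hdr(X-User-Id)
server s1 10.0.1.10:7474 maxconn 32
server s2 10.0.1.11:7474 maxconn 32
server s3 10.0.1.12:7474 maxconn 32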
Chapter 5. Upgrade
This section describes how to upgrade Neo4j from an earlier version.
5.1. Upgrade planning
Plan your upgrade by following the steps in this chapter.
Throughout these instructions, the directory used to store the Neo4j data is referred to as the database
directory. The path of the database directory is Data/databases/<database name>. There are two
settings that may affect the location of the database directory.
The Data directory:
• For default location of the Data directory, see File locations.
• If your database is stored in a custom location, this is configured using the setting
dbms.directories.data.
The database name:
• The default database name is graph.db
• A custom database name is configured by the parameter dbms.active_database.
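For example, a deployment that keeps its data in a custom location under a custom database name might have the following in neo4j.conf (both values are placeholders):
# Custom location of the Data directory
dbms.directories.data=/var/lib/neo4j/data
# Custom database name; the database directory is then /var/lib/neo4j/data/databases/mygraph.db
dbms.active_database=mygraph.db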
5.1.1. Review supported upgrade paths
Before upgrading to a new major or minor release, the database must first be upgraded to the latest
version within the relevant release. The latest version is available at this page:
http://neo4j.com/download/other-releases. The following Neo4j upgrade paths are supported:
• 2.0.latest → 3.1.2
• 2.1.latest → 3.1.2
• 2.2.latest → 3.1.2
• 2.3.latest → 3.1.2
• 3.0.any → 3.1.2
5.1.2. Review the Upgrade guide at neo4j.com
Read through the Upgrade guide at neo4j.com (https://neo4j.com/guides/upgrade/). The Upgrade guide is
being maintained by Neo4j Customer Support and contains valuable information about upgrade
actions particular to this release.
5.1.3. Apply configuration changes
New configuration settings may be introduced, and existing settings may be changed, between
versions. Any such changes are pointed out in the Upgrade guide mentioned above.
Make sure that you have taken such changes into account.
5.1.4. Upgrade application code
As part of the upgrade planning, it is vital to test and potentially update the applications using Neo4j.
How much development time is required to update the code will depend on the particular
application.
5.1.5. Upgrade custom plugins
Check the Plugins directory (see File locations) to verify whether custom plugins are used in your
deployment. Ensure that any plugins are compatible with Neo4j 3.1.2.
5.1.6. Plan disk space requirements
An upgrade requires substantial free disk space, as it makes an entire copy of the database. For the
upgrade, make sure that an additional 50% of the size of the database directory is available. In addition
to this, do not forget to reserve the disk space needed for the pre-upgrade backup.
The upgraded database may require slightly larger data files overall.
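A quick way to estimate the space needed is to compare the size of the database directory with the free space on the target file system, for example (default location and database name assumed):
$neo4j-home> du -sh data/databases/graph.db
$neo4j-home> df -h .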
5.1.7. Perform a test upgrade
Based on the findings in this chapter, allocate a test environment for the upgrade and do a test
upgrade. The test upgrade will give you valuable information about the time required for the
production upgrade, as well as potential additional action points such as upgrade of plugins and
application code.
5.2. Single-instance upgrade
This section describes upgrading a single Neo4j instance. To upgrade a Neo4j HA cluster (Neo4j
Enterprise Edition), a very specific procedure must be followed. Please see Neo4j HA cluster upgrade.
5.2.1. Upgrade from 2.x
1. Cleanly shut down the database if it is running.
2. Make a backup copy of the database directory. If using the online backup tool available with
Neo4j Enterprise Edition, ensure that backups have completed successfully.
3. Install Neo4j 3.1.2.
4. Review the settings in the configuration files of the previous installation and transfer any custom
settings to the 3.1.2 installation. Since many settings have been changed between Neo4j 2.x and
3.1.2, it is advisable to use the 2.x-config-migrator to migrate the configuration files for you. The
2.x-config-migrator can be found in the tools directory, and can be invoked with a command like:
java -jar 2.x-config-migrator.jar path/to/neo4j2.3 path/to/neo4j3.1.2. Take note of any
warnings printed, and manually review the edited configuration files produced.
5. Import your data from the old installation using neo4j-admin import --mode=database
--database=<database-name> --from=<source-directory>.
6. If the database is not called graph.db, set dbms.active_database in neo4j.conf to the name of the
database.
7. Set dbms.allow_format_migration=true in neo4j.conf of the 3.1.2 installation. Neo4j will fail to start
without this configuration.
8. Start up Neo4j 3.1.2.
9. The database upgrade will take place during startup.
10. Information about the upgrade and a progress indicator are logged into debug.log.
11. When the upgrade has finished, dbms.allow_format_migration should be set to false or be
removed.
12. It is good practice to make a full backup immediately after the upgrade.
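As an illustration only, the command line part of the steps above might look as follows for a default installation; the paths are placeholders, the default 2.x store location is assumed, and the configuration review in steps 4 and 6 still has to be done by hand:
# Steps 1-2: stop the old installation and keep a copy of its store
path/to/neo4j2.3/bin/neo4j stop
cp -a path/to/neo4j2.3/data/graph.db /backups/graph.db-pre-upgrade
# Step 4: migrate the old configuration files (the migrator ships in the tools directory)
java -jar path/to/neo4j3.1.2/tools/2.x-config-migrator.jar path/to/neo4j2.3 path/to/neo4j3.1.2
# Step 5: import the data into the new installation
path/to/neo4j3.1.2/bin/neo4j-admin import --mode=database --database=graph.db --from=path/to/neo4j2.3/data/graph.db
# Steps 7-8: allow the store format migration, then start the new version
echo "dbms.allow_format_migration=true" >> path/to/neo4j3.1.2/conf/neo4j.conf
path/to/neo4j3.1.2/bin/neo4j start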
Cypher compatibility

The Cypher language may evolve between Neo4j versions. For backward
compatibility, Neo4j provides directives which allow explicitly selecting a previous
Cypher language version. This is possible to do globally or for individual statements,
as described in the Neo4j Developer Manual (http://neo4j.com/docs/developermanual/3.1).
5.2.2. Upgrade from 3.x
1. Cleanly shut down the database if it is running.
2. Make a backup copy of the database directory. If using the online backup tool available with
Neo4j Enterprise Edition, ensure that backups have completed successfully.
3. Install Neo4j 3.1.2.
4. Review the settings in the configuration files of the previous installation and transfer any custom
settings to the 3.1.2 installation.
5. If using the default data directory, copy it from the old installation to the new. If databases are
stored in a custom location, configure dbms.directories.data for the new installation to point to
this custom location.
6. If the database is not called graph.db, set dbms.active_database in neo4j.conf to the name of the
database.
7. Set dbms.allow_format_migration=true in neo4j.conf of the 3.1.2 installation. Neo4j will fail to start
without this configuration.
8. Start up Neo4j 3.1.2.
9. The database upgrade will take place during startup.
10. Information about the upgrade and a progress indicator are logged into debug.log.
11. When the upgrade has finished, dbms.allow_format_migration should be set to false or be
removed.
12. It is good practice to make a full backup immediately after the upgrade.
5.3. Neo4j HA cluster upgrade
This section describes how to upgrade a Neo4j Highly Available cluster.
Upgrading a Neo4j HA cluster to Neo4j 3.1.2 requires following a specific process in order to ensure
that the cluster remains consistent, and that all cluster instances are able to join and participate in the
cluster following their upgrade. Neo4j 3.1.2 does not support rolling upgrades.
5.3.1. Back up the Neo4j database
• Before starting any upgrade procedure, it is very important to make a full backup of your database.
• For detailed instructions on backing up your Neo4j database, refer to the backup chapter.
5.3.2. Shut down the cluster
• Shut down the slave instances one by one.
• Shut down the master last.
5.3.3. Upgrade the master
1. Install Neo4j 3.1.2 on the master, keeping the database directory untouched.
2. Disable HA in the configuration, by setting dbms.mode=SINGLE in neo4j.conf.
3. Upgrade as described for a single instance of Neo4j.
4. When upgrade has finished, shut down Neo4j again.
5. Re-enable HA in the configuration by setting dbms.mode=HA in neo4j.conf.
6. Make a full backup of the Neo4j database. Please note that backups from before the upgrade are
no longer valid for update via the incremental online backup. Therefore it is important to perform
a full backup, using an empty target directory, at this point.
5.3.4. Upgrade the slaves
On each slave:
1. Remove the database directory.
2. Install Neo4j 3.1.2.
3. Review the settings in the configuration files in the previous installation, and transfer any custom
settings to the 3.1.2 installation. Be aware of settings that have changed name between versions.
4. If the database is not called graph.db, set dbms.active_database in neo4j.conf to the name of the
database.
5. If applicable, copy the security configuration from the master, since this is not propagated
automatically.

As an alternative at this point, you can manually copy the database directory from the
master to the slaves. Doing so avoids the need to sync from the master when
starting, which can save considerable time when upgrading large databases.
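A sketch of such a copy, assuming ssh access between the machines, default file locations, and that both instances are shut down, could be:
# Run on the master: push the upgraded database directory to a slave
rsync -a --delete data/databases/graph.db/ slave-host:/path/to/neo4j-3.1.2/data/databases/graph.db/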
5.3.5. Restart the cluster
1. Start the master instance.
2. Start the slaves, one by one. Once a slave has joined the cluster, it will sync the database from the
master instance.
Chapter 6. Backup
This chapter covers performing and restoring backups of a Neo4j database deployed as a
Causal Cluster, Highly Available cluster or single instance.

The backup features are available in Neo4j Enterprise Edition.
6.1. Introducing backups
Backing up your Neo4j database to remote or offline storage is a fundamental part of operational
hygiene. Neo4j supports both full and incremental backups. The backup procedure is the same for a
stand-alone database, for a Highly Available cluster, and for a Causal Cluster.
For a Causal Cluster, however, we should pay some attention to the Core and Read replica roles and
which of them is best suited to act as a backup server. See Backup a Causal Cluster for details.
Additionally, restoring a Neo4j Causal Cluster is a little different. See Seed a Causal Cluster for details.
Backups are performed over the network, from a running Neo4j server and into a local copy of the
database store. The backup is run using the neo4j-admin backup command.
6.1.1. Enabling backups
Two parameters must be configured in order to perform backups.
• dbms.backup.enabled=true will enable backups; this is the default value.
• dbms.backup.address=<hostname or IP address>:6362 configures the interface and port that the
backup service listens on. The value of the parameter defaults to the loopback interface and port
6362. It can also be configured to listen on all interfaces by setting
dbms.backup.address=0.0.0.0:6362.
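Putting the two settings together, a neo4j.conf that accepts backup requests on all interfaces would contain:
dbms.backup.enabled=true
dbms.backup.address=0.0.0.0:6362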
6.1.2. Storage considerations
For any backup it is important that the data is stored separately from the production system where
there are no common dependencies. It is advisable to keep the backup on stable storage outside of
the cluster servers, on different (network attached) storage, and preferably off site (for example to the
cloud, a different availability zone within the same cloud, or a separate cloud). Since backups are kept
for a long time, the longevity of archival storage should be considered as part of backup planning.
See Configuration settings for detailed documentation on available configuration options.
6.2. Perform a backup
This section describes how to perform a backup of a Neo4j database.
6.2.1. Backup commands
The neo4j-admin tool is located in the bin directory. Run it with the backup argument in order to
perform an online backup of a running database.
Syntax
neo4j-admin backup --backup-dir=<backup-path> --name=<graph.db-backup>
[--from=<address>] [--fallback-to-full[=<true|false>]]
[--check-consistency[=<true|false>]]
[--cc-report-dir=<directory>]
[--additional-config=<config-file-path>]
[--timeout=<timeout>]
Options
Option                 Default          Description
--backup-dir                            Directory to place backup in.
--name                                  Name of backup. If a backup with this name already exists,
                                        an incremental backup will be attempted.
--from                 localhost:6362   Host and port of Neo4j.
--fallback-to-full     true             If an incremental backup fails, backup will move the old
                                        backup to <name>.err.<N> and fall back to a full backup
                                        instead.
--check-consistency    true             If a consistency check should be made.
--cc-report-dir        .                Directory where the consistency report will be written.
--additional-config                     Configuration file to supply additional configuration in.
--timeout              20m              Timeout in the form <time>[ms|s|m|h], where the default
                                        unit is seconds.
Backup a database
Perform a full backup: Create an empty directory and run the backup command
$neo4j-home> mkdir /mnt/backup
$neo4j-home> bin/neo4j-admin backup --from=192.168.1.34 --backup-dir=/mnt/backup --name=graph.db-backup
Doing full backup...
2017-02-01 14:09:09.510+0000 INFO [o.n.c.s.StoreCopyClient] Copying neostore.nodestore.db.labels
2017-02-01 14:09:09.537+0000 INFO [o.n.c.s.StoreCopyClient] Copied neostore.nodestore.db.labels 8.00 kB
2017-02-01 14:09:09.538+0000 INFO [o.n.c.s.StoreCopyClient] Copying neostore.nodestore.db
2017-02-01 14:09:09.540+0000 INFO [o.n.c.s.StoreCopyClient] Copied neostore.nodestore.db 16.00 kB
...
...
...
Now if you do a directory listing of /mnt/backup you will see that you have a backup of Neo4j
called graph.db-backup.
6.2.2. Incremental backups
An incremental backup is performed whenever an existing backup directory is specified and the
transaction logs are present since the last backup (see note below). The backup command will then
copy any new transactions from Neo4j and apply them to the backup. The result will be an updated
backup that is consistent with the current server state.
Perform an incremental backup
Perform an incremental backup: Specify the location of your previous backup
$neo4j-home> bin/neo4j-admin backup --from=192.168.1.34 --backup-dir=/mnt/backup --name=graph.db-backup --fallback-to-full=true --check-consistency=true
Destination is not empty, doing incremental backup...
Backup complete.
The incremental backup will fail if the existing directory does not contain a valid backup, and
fallback-to-full=false. It will also fail if the required transaction logs have been removed and
fallback-to-full=false. Setting fallback-to-full=true is a safeguard which will result in a full backup
in case an incremental backup cannot be performed.

Note that when copying the outstanding transactions, the server needs access to
the transaction logs. These logs are being maintained by Neo4j and automatically
removed after a period of time, based on the parameter
dbms.tx_log.rotation.retention_policy. Therefore, when designing your backup
strategy it is important to configure dbms.tx_log.rotation.retention_policy such that
transaction logs are kept between incremental backups.
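For example, if incremental backups run nightly, a retention policy that keeps several days of transaction logs leaves a comfortable margin; the exact value below is only an illustration:
# Keep transaction logs for 7 days so that nightly incremental backups always find them
dbms.tx_log.rotation.retention_policy=7 days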
6.3. Restore a backup
This section covers how to restore from a backup of a Neo4j database.
6.3.1. Restore a single database
Backups are restored using the restore argument to the neo4j-admin tool. To restore a backup the
database must be shut down.
Restore a database
Restore the database graph.db from the backup located in /mnt/backup/neo4j-backup
neo4j-home> bin/neo4j stop
neo4j-home> bin/neo4j-admin restore --from=/mnt/backup/neo4j-backup --database=graph.db --force
neo4j-home> bin/neo4j start
6.3.2. Restore an HA cluster
To restore from backup in an HA cluster environment, follow these steps:
1. Shut down all database instances in the cluster.
2. Restore the backup on each instance.
3. Start the database instances.
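Assuming the same backup is available at the same path on every instance, the per-instance commands are the same as for a single database, for example:
# 1. Shut down every instance in the cluster
bin/neo4j stop
# 2. Restore the backup on each instance
bin/neo4j-admin restore --from=/mnt/backup/neo4j-backup --database=graph.db --force
# 3. Start the instances again
bin/neo4j start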
6.3.3. Restore a Causal Cluster
Restoring a Neo4j Causal Cluster follows the same steps as seeding the cluster
initially. Follow the procedure described in Seed a Causal Cluster.
6.4. Backup a Causal Cluster
This section discusses considerations when backing up a Causal Cluster.
In a Neo4j Causal Cluster, both Core servers and Read replicas support the backup protocol.
Servers of either role can be used for cluster backups. Below are some considerations that you should
regard before determining which backup strategy to use.
6.4.1. Read replica backups
Generally we prefer to select Read replicas to act as our backup providers since they are far more
numerous than Core servers in typical cluster deployments.
However since Read replicas are asynchronously replicated from Core servers, it is possible for them
to be some way behind in applying transactions with respect to the Core cluster. It may even be
possible for a Read replica to become orphaned from a Core server such that its contents are quite
stale. The pathologically bad case here is that we take a backup right now whose contents end up
being less up to date than a previous backup.
Fortunately we can check the last transaction ID processed on any server and in doing so we can
verify that it is sufficiently close to the latest transaction ID processed by the Core server. If it is in the
right ball-park, then we can safely proceed to backup from our Read replica in confidence that it is
quite up to date with respect to the Core servers.

Transaction IDs in Neo4j are strictly increasing integer values. A higher transaction
ID is therefore more recent than a lower one.
Neo4j servers expose the last committed transaction ID through JMX, through Neo4j metrics, and via
the Neo4j Browser. To view the latest processed transaction ID (and other metrics) in the Neo4j
Browser, type :sysinfo at the prompt.
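One way to compare the IDs from the command line is to query the Transactions JMX bean through the built-in dbms.queryJmx procedure using cypher-shell; this is only a sketch, and the bolt addresses and credentials below are placeholders:
# Last committed transaction ID on a Core server (the returned attribute is a map whose 'value' field holds the ID)
echo "CALL dbms.queryJmx('org.neo4j:instance=kernel#0,name=Transactions') YIELD attributes RETURN attributes.LastCommittedTxId;" | bin/cypher-shell -a bolt://core-01:7687 -u neo4j -p secret
# Last committed transaction ID on the Read replica we intend to back up from
echo "CALL dbms.queryJmx('org.neo4j:instance=kernel#0,name=Transactions') YIELD attributes RETURN attributes.LastCommittedTxId;" | bin/cypher-shell -a bolt://replica-01:7687 -u neo4j -p secret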
6.4.2. Core server backups
In a Core-only cluster, we don’t have the luxury of numerous Read replicas to scale out workload. As
such we pick a server based on factors like its physical proximity, bandwidth, performance, liveness
and so forth.
Generally speaking, the cluster will function as normal even while large backups are taking place.
However, backing up will place additional IO burdens on the backup server which may impact its
performance.
A very conservative view would be to treat the backup server as an unavailable instance, assuming its
performance will be lower than the other instances in the cluster. In such cases, it is recommended
that there is sufficient redundancy in the cluster such that one slower server does not reduce the
capacity to mask faults.
We can factor this conservative strategy into our cluster planning. The equation M = 2F + 1 expresses
the relationship between M, the number of members in the cluster, and F, the number of faults the
cluster can tolerate. To tolerate the possibility of one slower machine in the cluster during backup we
increase F. Thus if we originally envisaged a cluster of three Core servers to tolerate one fault, we
could increase that to five to maintain a safe level of redundancy.
Chapter 7. Security
This chapter describes features pertaining to security in Neo4j.
To protect your data, first ensure your physical data security by following industry best practices with
regards to server and network security. Ensure that your Neo4j deployment adheres to your
company’s information security guidelines by setting up the appropriate authentication and
authorization rules. We describe authentication and authorization in Neo4j in Authentication and
authorization.
Logs can be harvested for continuous analysis, or for specific investigations. Facilities are available for
producing security event logs as well as query logs as described in Monitoring.
Finally, a simple checklist for Neo4j security is provided in Security checklist.
7.1. Authentication and authorization
This section describes authentication and authorization in Neo4j.

The features described in this section are available in Neo4j Enterprise Edition.
For Community Edition, refer to User management for Community Edition.
The section describes the following:
• Introduction
• Terminology
• Enabling authentication and authorization
• Native user and role management
• Native roles
• Custom roles
• Propagate users and roles
• Procedures for native user and role management
• Integration with LDAP
• Subgraph access control
7.1.1. Introduction
This chapter provides an overview of authentication and authorization in Neo4j.
Security in Neo4j is controlled by authentication and authorization. Authentication is the process of
ensuring that a user is who the user claims to be, while authorization pertains to checking whether
the authenticated user is allowed to perform a certain action.
Authorization is managed using role-based access control (RBAC). The core of the Neo4j security
model is centred around the four predefined graph-global data-access roles: reader, publisher,
architect and admin. Each role includes a set of authorized actions permitted on the Neo4j data graph
and its schema. A user can be assigned to none, one or more of these roles, as well as other custom
roles.
Neo4j has three auth providers that can perform user authentication and authorization:
• Native auth provider.
• LDAP auth provider.
• Custom-built plugin auth providers.
Native auth provider
Neo4j provides a native auth provider that stores user and role information locally on disk. With this
option, full user management is available as procedures described in Native user and role
management.
LDAP auth provider
Another way of controlling authentication and authorization is through external security software
such as Active Directory or OpenLDAP, which is accessed via the built-in LDAP connector. A description
of the LDAP plugin using Active Directory is available in Integration with LDAP.
Custom-built plugin auth providers
For clients with specific requirements not satisfied with either native or LDAP, Neo4j provides a plugin
option for building custom integrations. It is recommended that this option is used as part of a
custom delivery as negotiated with Neo4j Professional Services. The plugin is described in Developer
Manual → Plugins (http://neo4j.com/docs/developer-manual/3.1/rbac-plugins/).
7.1.2. Terminology
This chapter lists the relevant terminology related to authentication and authorization in
Neo4j.
The following terms are relevant to role-based access control within Neo4j:
active user
A user who is active within the system and can perform actions prescribed by any assigned roles
on the data. This is in contrast to a suspended user.
administrator
This is a user who has been assigned the admin role.
current user
This is the currently logged-in user invoking the commands described in this chapter.
password policy
The password policy is a set of rules of what makes up a valid password. For Neo4j, the following
rules apply:
• The password cannot be the empty string.
• When changing passwords, the new password cannot be the same as the previous password.
role
This is a collection of actions — such as read and write — permitted on the data. There are two
types of roles in Neo4j:
• Native roles are described in Native roles.
• Custom roles are described in Custom roles.
suspended user
A user who has been suspended is not able to access the database in any capacity, regardless of
any assigned roles.
user
• A user is composed of a username and credentials, where the latter is a unit of information,
such as a password, verifying the identity of a user.
• A user may represent a human, an application etc.
7.1.3. Enabling authentication and authorization
This chapter describes how to enable and disable authentication and authorization in Neo4j.
Authentication and authorization is enabled by default. It is possible to turn off all authentication and
authorization. This is done using the setting dbms.security.auth_enabled.
dbms.security.auth_enabled=false
This configuration makes the database vulnerable to malicious activities. Disabling authentication and
authorization is not recommended.
7.1.4. Native user and role management
This chapter describes native user and role management in Neo4j.
For managing native users, Neo4j offers additional actions for an administrator:
• View all users and roles (including viewing all roles for a user and viewing all users for a role)
• Create and delete any other user
• Assign and remove a role from a user
• Create and delete a custom-defined role
• Suspend and activate any other user
• Change the password for any user
When an administrator suspends or deletes another user, the following rules apply:
• Administrators can suspend or delete any other user (including other administrators), but not
themselves
• Deleting a user terminates all of the user’s running queries and sessions
• All queries currently running for the deleted user are rolled back
• The user will no longer be able to log back in (until re-activated by an administrator if suspended)
• There is no need to remove assigned roles from a user prior to deleting the user.
The chapter describes the following:
• Native roles
• Custom roles
• Propagate users and roles
• Procedures for native user and role management
Native roles
This chapter describes native roles in Neo4j.
Neo4j provides four built-in roles in our role-based access control framework:
• reader: read-only access to the data graph (all nodes, relationships, properties).
• publisher: read-write access to the data graph.
• architect: read-write access to the data graph, and set/delete access to indexes along with any
other future schema constructs.
• admin: read-write access to the data graph, set/delete access to indexes along with any other
future schema constructs, and the ability to view/terminate queries.
We detail below the set of actions on the data and database prescribed by each role:
Action                            reader   publisher   architect   admin   (no role)
Change own password                 X         X           X          X         X
View own details                    X         X           X          X         X
Read data                           X         X           X          X
View own queries                    X         X           X          X
Terminate own queries               X         X           X          X
Write/update/delete data                      X           X          X
Create/drop index/constraint                              X          X
Create/delete user                                                   X
Change another user’s password                                       X
Assign/remove role to/from user                                      X
Suspend/activate user                                                X
View all users/roles                                                 X
View all roles for a user                                            X
View all users for a role                                            X
View all queries                                                     X
Terminate all queries                                                X
A user who has no assigned roles will not have any rights or capabilities regarding the data, not even
read privileges. A user may have more than one assigned role, and the union of these determine what
action(s) on the data may be undertaken by the user.
Custom roles
This chapter describes custom roles in Neo4j.
Custom roles may be created and deleted by an administrator. Custom roles are created for the sole
purpose of controlling the ability to execute certain custom developed procedures. In contrast to the
native roles, a custom role will not have any permissions other than to execute procedures which have
been explicitly permitted in neo4j.conf.
More details regarding how to use custom roles to allow for subgraph access control may be found in
Subgraph access control.
Propagate users and roles
This chapter describes how to propagate native users, roles and role assignments across a
Neo4j cluster.
Native users, roles and role assignments are stored in files named auth and roles. The files are located
in the Data directory (see File locations) in a subdirectory called dbms. Neo4j automatically reloads the
stored users and assigned roles from disk every five seconds. Changes to users and roles are applied
only to the Neo4j instance on which the commands are executed. This means that changes are not
automatically propagated across a cluster of Neo4j instances, but have to be specifically provided for.
To propagate changes to native users, custom roles, and role assignments across a cluster, there are
three options:
• Manually copy users and roles files on disk to all other cluster instances
• Use a shared network folder to store users and roles files
• Create an automated process that synchronizes the stored data across the cluster using, for
example, a combination of rsync and crontab
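A sketch of the third option, assuming ssh access between the instances and default file locations, might combine rsync with a crontab entry such as the following; host names and paths are placeholders:
# Push the auth and roles files from this instance to another cluster member
rsync -a data/dbms/auth data/dbms/roles other-instance:/path/to/neo4j/data/dbms/
# Example crontab entry running the synchronization every five minutes
*/5 * * * * cd /path/to/neo4j && rsync -a data/dbms/auth data/dbms/roles other-instance:/path/to/neo4j/data/dbms/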
We note that the recommended solution for clustered security is to use the LDAP or plugin auth
provider.
Procedures for native user and role management
This chapter describes procedures for native user and role management in Neo4j.
In Neo4j, native users and roles are managed using built-in procedures through
Cypher. This chapter gives a list of all the security procedures for user management along with some
simple examples. Use the Neo4j Browser or the Neo4j Cypher Shell to run the examples provided.
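The procedures can also be invoked from the command line by piping them into cypher-shell, for example (credentials are placeholders):
$neo4j-home> echo "CALL dbms.security.listUsers();" | bin/cypher-shell -u neo4j -p secret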
The chapter describes the following:
• List all users
• List all roles
• List all roles for a user
• List all users for a role
• Create a user
• Delete a user
• Assign a role to a user
• Remove a role from a user
• Create a custom role
• Delete a custom role
• Suspend a user
• Activate a user
• Change a user’s password
• Change the current user’s password
• List roles per procedure
List all users
An administrator is able to view the details of every user in the system.
Syntax:
CALL dbms.security.listUsers()
Returns:
Name        Type            Description
username    String          This is the user’s username.
roles       List<String>    This is a list of roles assigned to the user.
flags       List<String>    This is a series of flags indicating whether the user is suspended or
                            needs to change their password.
Exceptions:
The current user is not an administrator.
Example 5. List all users
The following example shows, for each user in the system, the username, the roles assigned to
the user, and whether the user is suspended or needs to change their password.
CALL dbms.security.listUsers()
+----------+---------------------------+-------------------------------+
| username | roles                     | flags                         |
+----------+---------------------------+-------------------------------+
| "neo4j"  | ["admin"]                 | []                            |
| "anne"   | []                        | ["password_change_required"]  |
| "bill"   | ["reader"]                | ["is_suspended"]              |
| "john"   | ["architect","publisher"] | []                            |
+----------+---------------------------+-------------------------------+
4 rows
List all roles
An administrator is able to view all assigned users for each role in the system.
Syntax:
CALL dbms.security.listRoles()
Returns:
Name     Type            Description
role     String          This is the name of the role.
users    List<String>    This is a list of the usernames of all users who have been assigned the
                         role.
Exceptions:
The current user is not an administrator.
Example 6. List all roles
The following example shows, for each role in the system, the name of the role and the
usernames of all assigned users.
CALL dbms.security.listRoles()
+-------------+----------------+
| role        | users          |
+-------------+----------------+
| "reader"    | ["bill"]       |
| "architect" | []             |
| "admin"     | ["neo4j"]      |
| "publisher" | ["john","bob"] |
+-------------+----------------+
4 rows
List all roles for a user
Any active user is able to view all of their assigned roles. An administrator is able to view all assigned
roles for any user in the system.
Syntax:
CALL dbms.security.listRolesForUser(username)
Arguments:
Name        Type      Description
username    String    This is the username of the user.

Returns:

Name     Type      Description
value    String    This returns all roles assigned to the requested user.
Exceptions:
The current user is not an administrator and the username does not match that of the current user.
The username does not exist in the system.
Considerations:
• This procedure may be invoked by the current user to view their roles, irrespective of whether or
not the current user is an administrator.
• This procedure may be invoked by an administrator to view the roles for another user.
Example 7. List all roles for a user
The following example lists all the roles for the user with username 'johnsmith', who has the
roles reader and publisher.
CALL dbms.security.listRolesForUser('johnsmith')
+-------------+
| value       |
+-------------+
| "reader"    |
| "publisher" |
+-------------+
2 rows
List all users for a role
An administrator is able to view all assigned users for a role.
Syntax:
CALL dbms.security.listUsersForRole(roleName)
Arguments:
Name        Type      Description
roleName    String    This is the name of the role.

Returns:

Name     Type      Description
value    String    This returns all assigned users for the requested role.
Exceptions:
The current user is not an administrator.
The role name does not exist in the system.
Example 8. List all users for a role
The following example lists all the assigned users - 'bill' and 'anne' - for the role publisher.
CALL dbms.security.listUsersForRole('publisher')
+--------+
| value  |
+--------+
| "bill" |
| "anne" |
+--------+
2 rows
Create a user
An administrator is able to create a new user. This action ought to be followed by assigning a role to
the user, which is described in Assign a role to a user.
Syntax:
CALL dbms.security.createUser(username, password, requirePasswordChange)
Arguments:
Name                     Type       Description
username                 String     This is the user’s username.
password                 String     This is the user’s password.
requirePasswordChange    Boolean    This is optional, with a default of true. If this is true, (i) the
                                    user will be forced to change their password when they log in
                                    for the first time, and (ii) until the user has changed their
                                    password, they will be forbidden from performing any other
                                    operation.
Exceptions:
The current user is not an administrator.
The username contains characters other than alphanumeric characters and the ‘_’ character.
The username is already in use within the system.
The password is the empty string.
Example 9. Create a user
The following example creates a user with the username 'johnsmith' and password 'h6u4%kr'.
When the user 'johnsmith' logs in for the first time, he will be required to change his password.
CALL dbms.security.createUser('johnsmith', 'h6u4%kr')
Delete a user
An administrator is able to permanently delete a user from the system. It is not possible to undo this
action, so, if in any doubt, consider suspending the user instead.
Syntax:
CALL dbms.security.deleteUser(username)
Arguments:
Name        Type      Description
username    String    This is the username of the user to be deleted.
Exceptions:
The current user is not an administrator.
The username does not exist in the system.
The username matches that of the current user (i.e. deleting the current user is not permitted).
Considerations:
• It is not necessary to remove any assigned roles from the user prior to deleting the user.
• Deleting a user will terminate with immediate effect all of the user’s sessions and roll back any
running transactions.
• As it is not possible for the current user to delete themselves, there will always be at least one
administrator in the system.
Example 10. Delete a user
The following example deletes a user with the username 'janebrown'.
CALL dbms.security.deleteUser('janebrown')
Assign a role to a user
An administrator is able to assign a role to any user in the system, thus allowing the user to perform a
series of actions upon the data.
Syntax:
CALL dbms.security.addRoleToUser(roleName, username)
Arguments:
Name        Type      Description
roleName    String    This is the name of the role to be assigned to the user.
username    String    This is the username of the user who is to be assigned the role.
Exceptions:
The current user is not an administrator.
The username does not exist in the system.
The username contains characters other than alphanumeric characters and the ‘_’ character.
The role name does not exist in the system.
The role name contains characters other than alphanumeric characters and the ‘_’ character.
Considerations:
• This is an idempotent procedure.
Example 11. Assign a role to a user
The following example assigns the role publisher to the user with username 'johnsmith'.
CALL dbms.security.addRoleToUser('publisher', 'johnsmith')
Remove a role from a user
An administrator is able to remove a role from any user in the system, thus preventing the user from
performing upon the data any actions prescribed by the role.
Syntax:
CALL dbms.security.removeRoleFromUser(roleName, username)
Arguments:
Name        Type      Description
roleName    String    This is the name of the role which is to be removed from the user.
username    String    This is the username of the user from which the role is to be removed.
Exceptions:
The current user is not an administrator.
The username does not exist in the system.
The role name does not exist in the system.
The username is that of the current user and the role is admin.
Considerations:
• If the username is that of the current user and the role name provided is admin, an error will be
thrown; i.e. the current user may not be demoted from being an administrator.
• As it is not possible for the current user to remove the admin role from themselves, there will
always be at least one administrator in the system.
• This is an idempotent procedure.
Example 12. Remove a role from a user
The following example removes the role publisher from the user with username 'johnsmith'.
CALL dbms.security.removeRoleFromUser('publisher', 'johnsmith')
Create a custom role
An administrator is able to create custom roles in the system.
Syntax:
CALL dbms.security.createRole(roleName)
Arguments:
Name        Type      Description
roleName    String    This is the name of the role to be created.
Exceptions:
The current user is not an administrator.
The role name already exists in the system.
The role name is empty.
The role name contains characters other than alphanumeric characters and the ‘_’ character.
The role name matches one of the native roles: reader, publisher, architect, and admin.
Example 13. Create a new custom role
The following example creates a new custom role.
CALL dbms.security.createRole('operator')
Delete a custom role
An administrator is able to delete custom roles from the system. The native roles reader, publisher,
architect, and admin (see Native roles) cannot be deleted.
Syntax:
CALL dbms.security.deleteRole(roleName)
Arguments:
Name        Type      Description
roleName    String    This is the name of the role to be deleted.
Exceptions:
The current user is not an administrator.
The role name does not exist in the system.
The role name matches one of the native roles: reader, publisher, architect, and admin.
Considerations:
• Any role assignments will be removed.
Example 14. Delete a custom role
The following example deletes the custom role 'operator' from the system.
CALL dbms.security.deleteRole('operator')
Suspend a user
An administrator is able to suspend a user from the system. The suspended user may be activated at
a later stage.
Syntax:
CALL dbms.security.suspendUser(username)
Arguments:
Name        Type      Description
username    String    This is the username of the user to be suspended.
Exceptions:
The current user is not an administrator.
The username does not exist in the system.
The username matches that of the current user (i.e. suspending the current user is not permitted).
Considerations:
• Suspending a user will terminate with immediate effect all of the user’s sessions and roll back any
running transactions.
• All of the suspended user’s attributes — assigned roles and password — will remain intact.
• A suspended user will not be able to log on to the system.
• As it is not possible for the current user to suspend themselves, there will always be at least one
active administrator in the system.
• This is an idempotent procedure.
Example 15. Suspend a user
The following example suspends a user with the username 'billjones'.
CALL dbms.security.suspendUser('billjones')
Activate a user
An administrator is able to activate a suspended user so that the user is once again able to access the
data in their original capacity.
Syntax:
CALL dbms.security.activateUser(username, requirePasswordChange)
Arguments:
Name                     Type       Description
username                 String     This is the username of the user to be activated.
requirePasswordChange    Boolean    This is optional, with a default of true. If this is true, (i) the
                                    user will be forced to change their password when they next log
                                    in, and (ii) until the user has changed their password, they will
                                    be forbidden from performing any other operation.
Exceptions:
The current user is not an administrator.
The username does not exist in the system.
The username matches that of the current user (i.e. activating the current user is not permitted).
Considerations:
• This is an idempotent procedure.
Example 16. Activate a user
The following example activates a user with the username 'jackgreen'. When the user 'jackgreen'
next logs in, he will be required to change his password.
CALL dbms.security.activateUser('jackgreen')
Change a user’s password
An administrator is able to change the password of any user within the system. Alternatively, the
current user may change their own password.
Syntax:
CALL dbms.security.changeUserPassword(username, newPassword, requirePasswordChange)
Arguments:
Name                     Type       Description
username                 String     This is the username of the user whose password is to be changed.
newPassword              String     This is the new password for the user.
requirePasswordChange    Boolean    This is optional, with a default of true. If this is true, (i) the
                                    user will be forced to change their password when they next log
                                    in, and (ii) until the user has changed their password, they will
                                    be forbidden from performing any other operation.
Exceptions:
The current user is not an administrator and the username does not match that of the current user.
The username does not exist in the system.
The password is the empty string.
The password is the same as the user’s previous password.
Considerations:
• This procedure may be invoked by the current user to change their own password, irrespective of
whether or not the current user is an administrator.
• This procedure may be invoked by an administrator to change another user’s password.
• In addition to changing the user’s password, this will terminate with immediate effect all of the
user’s sessions and roll back any running transactions.
Example 17. Change a user’s password
The following example changes the password of the user with the username 'joebloggs' to
'h6u4%kr'. When the user 'joebloggs' next logs in, he will be required to change his password.
CALL dbms.security.changeUserPassword('joebloggs', 'h6u4%kr')
Change the current user’s password
Any active user is able to change their own password at any time.
Syntax:
CALL dbms.security.changePassword(password, requirePasswordChange)
Arguments:
Name                     Type       Description
password                 String     This is the new password for the current user.
requirePasswordChange    Boolean    This is optional, with a default of false. If this is true, (i) the
                                    current user will be forced to change their password when they
                                    next log in, and (ii) until the current user has changed their
                                    password, they will be forbidden from performing any other
                                    operation.
Exceptions:
The password is the empty string.
The password is the same as the current user’s previous password.
Example 18. Change the current user’s password
The following example changes the password of the current user to 'h6u4%kr'.
CALL dbms.security.changePassword('h6u4%kr')
List roles per procedure
An administrator is able to view all procedures in the system, including which role(s) have the privilege
to execute them.
Syntax:
CALL dbms.procedures()
Returns:
Name           Type            Description
name           String          This is the name of the procedure.
signature      String          This is the signature of the procedure.
description    String          This is a description of the procedure.
roles          List<String>    This is a list of roles having the privilege to execute the procedure.
Exceptions:
The current user is not an administrator.
Example 19. List role per procedure
The following example shows, for four of the security procedures, the procedure name, the
description, and which roles have the privilege to execute the procedure.
CALL dbms.procedures()
YIELD name, signature, description, roles
WITH name, description, roles
WHERE name contains 'security'
RETURN name, description, roles
ORDER BY name
LIMIT 4
+------------------------------------+----------------------------------------+-----------+
| name                               | description                            | roles     |
+------------------------------------+----------------------------------------+-----------+
| "dbms.security.activateUser"       | "Activate a suspended user."           | ["admin"] |
| "dbms.security.addRoleToUser"      | "Assign a role to the user."           | ["admin"] |
| "dbms.security.changePassword"     | "Change the current user's password."  | ["admin"] |
| "dbms.security.changeUserPassword" | "Change the given user's password."    | ["admin"] |
+------------------------------------+----------------------------------------+-----------+
4 rows
7.1.5. Integration with LDAP
This chapter describes Neo4j support for integrating with LDAP systems.
Configure the LDAP auth provider
Neo4j supports the LDAP protocol which allows for integration with Active Directory, OpenLDAP or
other LDAP-compatible authentication services. We will show example configurations where
management of federated users is deferred to the LDAP service, using that service’s facilities for
administration. This means that we completely turn off native Neo4j user and role administration and
map LDAP groups to the four built-in Neo4j roles (reader, publisher, architect and admin) and to
custom roles.
All settings need to be defined at server startup time in the default configuration file neo4j.conf. First
configure Neo4j to use LDAP as authentication and authorization provider.
# Turn on security
dbms.security.auth_enabled=true
# Choose LDAP connector as security provider for both authentication and authorization
dbms.security.auth_provider=ldap
Configuration for Active Directory.
See below for an example configuration for Active Directory.
# Configure LDAP to point to the AD server
dbms.security.ldap.host=ldap://myactivedirectory.example.com
# Provide details on user structure within the LDAP system:
dbms.security.ldap.authentication.user_dn_template=cn={0},cn=Users,dc=example,dc=com
dbms.security.ldap.authorization.user_search_base=cn=Users,dc=example,dc=com
dbms.security.ldap.authorization.user_search_filter=(&(objectClass=*)(cn={0}))
dbms.security.ldap.authorization.group_membership_attributes=memberOf
# Configure the actual mapping between groups in the LDAP system and roles in Neo4j
dbms.security.ldap.authorization.group_to_role_mapping=\
"cn=Neo4j Read Only,cn=Users,dc=neo4j,dc=com"      = reader    ;\
"cn=Neo4j Read-Write,cn=Users,dc=neo4j,dc=com"     = publisher ;\
"cn=Neo4j Schema Manager,cn=Users,dc=neo4j,dc=com" = architect ;\
"cn=Neo4j Administrator,cn=Users,dc=neo4j,dc=com"  = admin     ;\
"cn=Neo4j Procedures,cn=Users,dc=neo4j,dc=com"     = allowed_role
# In case defined users are not allowed to search for themselves,
# we can specify credentials for a user with read access to all users and groups:
# Note that this account only needs read-only access to the relevant parts of the LDAP directory
# and does not need to have access rights to Neo4j or any other systems.
#dbms.security.ldap.authorization.use_system_account=true
#dbms.security.ldap.authorization.system_username=cn=search-account,cn=Users,dc=example,dc=com
#dbms.security.ldap.authorization.system_password=secret
Configuration for openLDAP
See below for an example configuration for openLDAP.
85
# Configure LDAP to point to the OpenLDAP server
dbms.security.ldap.host=myopenldap.example.com
# Provide details on user structure within the LDAP system:
dbms.security.ldap.authentication.user_dn_template=cn={0},ou=users,dc=example,dc=com
dbms.security.ldap.authorization.user_search_base=ou=users,dc=example,dc=com
dbms.security.ldap.authorization.user_search_filter=(&(objectClass=*)(uid={0}))
dbms.security.ldap.authorization.group_membership_attributes=gidnumber
# Configure the actual mapping between groups in the OpenLDAP system and roles in Neo4j
dbms.security.ldap.authorization.group_to_role_mapping=\
101 = reader
;\
102 = publisher
;\
103 = architect
;\
104 = admin
;\
105 = allowed_role
# In case defined users are not allowed to search for themselves,
# we can specify credentials for a user with read access to all users and groups:
# Note that this account only needs read-only access to the relevant parts of the LDAP directory
# and does not need to have access rights to Neo4j or any other systems.
#dbms.security.ldap.authorization.use_system_account=true
#dbms.security.ldap.authorization.system_username=cn=search-account,ou=users,dc=example,dc=com
#dbms.security.ldap.authorization.system_password=search-account-password
We would like to draw attention to some of the details in the configuration examples. A
comprehensive overview of LDAP configuration options is available in Configuration settings.
dbms.security.ldap.authentication.user_dn_template
    Default value: uid={0},ou=users,dc=example,dc=com
    Converts usernames into LDAP-specific fully qualified names required for logging in.

dbms.security.ldap.authorization.user_search_base
    Default value: ou=users,dc=example,dc=com
    Sets the base object or named context to search for user objects.

dbms.security.ldap.authorization.user_search_filter
    Default value: (&(objectClass=*)(uid={0}))
    Sets up an LDAP search filter to search for a user principal.

dbms.security.ldap.authorization.group_membership_attributes
    Default value: [memberOf]
    Lists attribute names on a user object that contain groups to be used for mapping to roles.

dbms.security.ldap.authorization.group_to_role_mapping
    Lists an authorization mapping from groups to the pre-defined built-in roles admin, architect,
    publisher and reader, or to any other custom-defined roles.
Use 'ldapsearch' to verify the configuration
We can use the LDAP command-line tool ldapsearch to verify that the configuration is correct, and that
the LDAP server is actually responding. We do this by issuing a search command that includes LDAP
configuration setting values.
These example searches verify both the authentication (using the simple mechanism) and
authorization of user 'john'. See the ldapsearch documentation for more advanced usage and how to
use SASL authentication mechanisms.
With dbms.security.ldap.authorization.use_system_account=false (default):
#ldapsearch -v -H ldap://<dbms.security.ldap.host> -x -D
<dbms.security.ldap.authentication.user_dn_template : replace {0}> -W -b
<dbms.security.ldap.authorization.user_search_base> "<dbms.security.ldap.authorization.user_search_filter
: replace {0}>" <dbms.security.ldap.authorization.group_membership_attributes>
ldapsearch -v -H ldap://myactivedirectory.example.com:389 -x -D cn=john,cn=Users,dc=example,dc=com -W -b
cn=Users,dc=example,dc=com "(&(objectClass=*)(cn=john))" memberOf
With dbms.security.ldap.authorization.use_system_account=true:
#ldapsearch -v -H ldap://<dbms.security.ldap.host> -x -D
<dbms.security.ldap.authorization.system_username> -w <dbms.security.ldap.authorization.system_password>
-b <dbms.security.ldap.authorization.user_search_base>
"<dbms.security.ldap.authorization.user_search_filter>"
<dbms.security.ldap.authorization.group_membership_attributes>
ldapsearch -v -H ldap://myactivedirectory.example.com:389 -x -D
cn=search-account,cn=Users,dc=example,dc=com -w secret -b cn=Users,dc=example,dc=com
"(&(objectClass=*)(cn=john))" memberOf
Then verify that we get a successful response, and that the value of the returned membership
attribute is a group that is mapped to a role in
dbms.security.ldap.authorization.group_to_role_mapping.
# extended LDIF
#
# LDAPv3
# base <cn=Users,dc=example,dc=com> with scope subtree
# filter: (cn=john)
# requesting: memberOf
#

# john, Users, example.com
dn: CN=john,CN=Users,DC=example,DC=com
memberOf: CN=Neo4j Read Only,CN=Users,DC=example,DC=com

# search result
search: 2
result: 0 Success

# numResponses: 2
# numEntries: 1
The auth cache
The auth cache is the mechanism by which Neo4j caches the result of authentication via the LDAP
server in order to aid performance. It is configured with the
dbms.security.ldap.authentication.cache_enabled and dbms.security.auth_cache_ttl parameters.
# Turn on authentication caching to ensure performance
dbms.security.ldap.authentication.cache_enabled=true
dbms.security.auth_cache_ttl=10m
dbms.security.ldap.authentication.cache_enabled
    Default value: true
    Determines whether or not to cache the result of authentication via the LDAP server. Whether
    authentication caching should be enabled or not must be considered in view of your company’s
    security guidelines. It should be noted that when using the REST API, disabling authentication
    caching will result in re-authentication and possibly re-authorization of users on every request,
    which may severely impact performance on production systems, and put heavy load on the LDAP server.

dbms.security.auth_cache_ttl
    Default value: 10000 minutes
    The time to live (TTL) for cached authentication and authorization info. Setting the TTL to 0 will
    disable all auth caching. A short TTL will require more frequent re-authentication and
    re-authorization, which can impact performance. A very long TTL will also mean that changes to the
    users’ settings on an LDAP server may not be reflected in the Neo4j authorization behaviour in a
    timely manner.
An administrator can clear the auth cache to force the re-querying of authentication and authorization
information from the federated auth provider system.
Example 20. Clear the auth cache
Use the Neo4j Browser or the Neo4j Cypher Shell to execute this statement.
CALL dbms.security.clearAuthCache()
Available methods of encryption
All of the following ways of specifying the dbms.security.ldap.host parameter are valid; each of them
configures LDAP without encryption. If the protocol or port is not specified, ldap will be used over the
default port 389.
dbms.security.ldap.host=myactivedirectory.example.com
dbms.security.ldap.host=myactivedirectory.example.com:389
dbms.security.ldap.host=ldap://myactivedirectory.example.com
dbms.security.ldap.host=ldap://myactivedirectory.example.com:389
Use LDAP with encryption via StartTLS
To configure Active Directory with encryption via StartTLS, set the following parameters:
dbms.security.ldap.use_starttls=true
dbms.security.ldap.host=ldap://myactivedirectory.example.com
Use LDAP with encrypted LDAPS
To configure Active Directory with encrypted LDAPS, set dbms.security.ldap.host to one of the
following. Not specifying the port will result in ldaps being used over the default port 636.
dbms.security.ldap.host=ldaps://myactivedirectory.example.com
dbms.security.ldap.host=ldaps://myactivedirectory.example.com:636
This method of securing Active Directory is being deprecated and is therefore not recommended.
Instead, use Active Directory with encryption via StartTLS.
Use a self-signed certificate in a test environment
Production environments should always use an SSL certificate issued by a Certificate Authority for
secure access to the LDAP server. However there are scenarios, for example in test environments,
where you may wish to use a self-signed certificate on the LDAP server. In these scenarios you will
have to tell Neo4j about the local certificate. This is done by entering the details of the certificate using
dbms.jvm.additional in neo4j.conf.
Example 21. Specify details for self-signed certificate on LDAP server
dbms.jvm.additional=-Djavax.net.ssl.keyStore=MyCert.jks
dbms.jvm.additional=-Djavax.net.ssl.keyStorePassword=secret
dbms.jvm.additional=-Djavax.net.ssl.trustStore=MyCert.jks
dbms.jvm.additional=-Djavax.net.ssl.trustStorePassword=secret
7.1.6. Subgraph access control
This chapter describes how to configure subgraph access control.
Through the use of user-defined procedures and custom roles, an administrator may restrict a user’s
access and subsequent actions to specified portions of the graph. In other words, it is possible to
configure access control at the level of a subgraph. For example, a user can be allowed to read, but
not write, nodes labelled with Employee and relationships of type REPORTS_TO.
The following sections describe the actions required to configure subgraph access control. The actions
can be undertaken in any order.
The chapter describes the following:
• Manage the custom role
• Native users scenario
• Federated users scenario (LDAP)
• Configure procedure permissions
Manage the custom role
Native users scenario
Create the custom role, and, subsequently, assign the role to the relevant user(s).
Example 22. Create an 'accounting' role and assign it to a pre-existing user, 'billsmith'.
CALL dbms.security.createRole('accounting')
CALL dbms.security.addRoleToUser('accounting', 'billsmith')
Federated users scenario (LDAP)
In the LDAP scenario, the LDAP user group must be mapped to the custom role in Neo4j.
Example 23. Map the LDAP group with groupId '101' to the custom role 'accounting':
dbms.security.ldap.authorization.group_to_role_mapping=101=accounting
Configure procedure permissions
The procedure to read or write a portion of the data needs to be created, unless it is already available
as an in-house or third-party library. Refer to Neo4j Developer Manual → User-defined procedures
(http://neo4j.com/docs/developer-manual/3.1/extending-neo4j/#procedures/) for a thorough description on
creating and using user-defined procedures.
In standard use, procedures will be executed according to the same security rules as normal Cypher
statements. For example, a procedure with mode=WRITE will be able to be executed by users assigned to
any one of the roles publisher, architect and admin, whereas a user assigned only to the reader role
will not be allowed to execute the procedure.
The standard mode of usage can be overridden with the configuration options
dbms.security.procedures.default_allowed and dbms.security.procedures.roles. These options allow
specific roles to execute procedures they would otherwise be prevented from accessing.
Setting dbms.security.procedures.default_allowed allows a role to execute any procedure that is not
matched by the dbms.security.procedures.roles configuration.
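As an illustration (re-using the custom role allowed_role from the LDAP examples above), such an entry in neo4j.conf could look like the following sketch:
# Members of the custom role 'allowed_role' may execute any procedure that is
# not explicitly matched by dbms.security.procedures.roles.
dbms.security.procedures.default_allowed=allowed_role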
The dbms.security.procedures.roles setting provides more fine-grained control over procedures. For
example, setting
dbms.security.procedures.roles=apoc.convert.*:Converter;apoc.load.json.*:Converter,DataSource;apoc.trigger.add:TriggerHappy
will have the following effects:
• All users with the role Converter will be able to execute all procedures in the apoc.convert
namespace.
• The roles Converter and DataSource will be able to execute procedures in the apoc.load.json
namespace.
• The role TriggerHappy will be able to execute the specific procedure apoc.trigger.add.
The procedure’s role configuration is used to override the permissions given by the user’s roles. This
will override the permission of the user with the mode of the procedure during the execution of the
procedure. As a consequence, if the procedure attempts to execute database operations that are not
included in its mode, it will fail with a 'permission denied' error regardless of the reason as to why the
user was permitted to run the procedure.
7.2. Security checklist
This chapter provides a summary of recommendations regarding security in Neo4j.
Below is a simple checklist highlighting the specific areas within Neo4j that may need some extra
attention in order to ensure the appropriate level of security for your application.
1. Deploy Neo4j on safe servers in safe networks:
a. Use subnets and firewalls.
b. Only open up for the necessary ports. For a list of relevant ports see Ports.
2. Protect data-at-rest:
a. Use volume encryption (e.g. Bitlocker).
b. Manage access to database dumps (refer to Dump and load databases) and backups (refer to
Perform a backup). In particular, ensure that there is no external access to the port specified
by the setting dbms.backup.address (this defaults to 6362). Failing to protect this port leaves a
security hole open by which an unauthorized user can make a copy of the database onto a
different machine.
c. Manage access to data files and transaction logs. Prohibit all operating system access to Neo4j
files except as instructed in Permissions.
3. Protect data-in-transit:
a. For remote access to the Neo4j database, only open up for encrypted Bolt or HTTPS.
b. Use SSL certificates issued from a trusted Certificate Authority.
i. For installing SSL certificates in Neo4j refer to Install certificates.
ii. For configuring your Bolt and/or HTTPS connectors, refer to Configure Neo4j connectors.
iii. If using LDAP, configure your LDAP system with encryption via StartTLS; see Use LDAP with
encryption via StartTLS.
4. Validate any custom code that you deploy (procedures and unmanaged extensions) and ensure
that they do not expose any parts of the product or data unintentionally.
5. Ensure the correct file permissions on the Neo4j files. Only the operating system user that Neo4j
runs as should have permissions to those files. Refer to Permissions for instructions on
permission levels. In particular, protect data files, transaction logs and database dumps from
unauthorized read access. Protect against the execution of unauthorized extensions by restricting
access to the Bin, Lib, and Plugins directories.
6. If LOAD CSV is enabled, ensure that it does not allow unauthorized users to import data. How to
configure LOAD CSV is described in Developer Manual → LOAD CSV (http://neo4j.com/docs/developer-manual/3.1/cypher/clause/load-csv/#query-load-csv).
7. Do not turn off Neo4j authentication. Refer to Enabling authentication and authorization for
details on this setting.
8. Survey your neo4j.conf file (see File locations) for ports relating to deprecated functions (such as
neo4j-shell, controlled by the parameter dbms.shell.port) and remote JMX (controlled by the
parameter setting dbms.jvm.additional=-Dcom.sun.management.jmxremote.port=3637).
9. Use the latest patch version of Neo4j.
Chapter 8. Monitoring
This chapter describes available tools for monitoring Neo4j.
Neo4j provides mechanisms for continuous analysis through the output of metrics as well as the
inspection and management of currently-executing queries.
Logs can be harvested for continuous analysis, or for specific investigations. Facilities are available for
producing security event logs as well as query logs. The query management functionality is provided
for specific investigations into query performance. Monitoring features are also provided for ad-hoc
analysis of a Causal Cluster.
The chapter describes the following:
• Metrics
• Enable metrics logging
• Metrics reference
• Logging
• Security events logging
• Query logging
• Query management
• Query management procedures
• Transaction timeout
• Monitoring of a Causal Cluster
8.1. Metrics
This chapter describes how to use Neo4j metrics output facilities to log and display various
metrics.
The features described in this section are available in Neo4j Enterprise Edition.
Neo4j can be configured to report metrics in two different ways:
• Export metrics to CSV files.
• Send metrics to Graphite or any monitoring tool based on the Graphite protocol.
8.1.1. Enable metrics logging
Neo4j can expose metrics for the following parts of the database:
# Setting for enabling all supported metrics.
metrics.enabled=true
# Setting for enabling all Neo4j specific metrics.
metrics.neo4j.enabled=true
# Setting for exposing metrics about transactions; number of transactions started, committed, etc.
metrics.neo4j.tx.enabled=true
# Setting for exposing metrics about the Neo4j page cache; page faults, evictions, flushes and exceptions, etc.
metrics.neo4j.pagecache.enabled=true
# Setting for exposing metrics about approximately how many entities are in the database; nodes, relationships, properties, etc.
metrics.neo4j.counts.enabled=true
# Setting for exposing metrics about the network usage of the HA cluster component.
metrics.neo4j.network.enabled=true
Graphite
Add the following settings to neo4j.conf in order to enable integration with Graphite:
# Enable the Graphite integration. Default is 'false'.
metrics.graphite.enabled=true
# The IP and port of the Graphite server in the format <hostname or IP address>:<port number>.
# The default port number for Graphite is 2003.
metrics.graphite.server=localhost:2003
# How often to send data. Default is 3 minutes.
metrics.graphite.interval=3m
# Prefix for Neo4j metrics on Graphite server.
metrics.prefix=Neo4j_1
Start Neo4j and connect to Graphite via a web browser in order to monitor your Neo4j metrics.
CSV files
Add the following settings to neo4j.conf in order to enable export of metrics into local .CSV files:
# Enable the CSV exporter. Default is 'false'.
metrics.csv.enabled=true
# Directory path for output files.
# Default is a "metrics" directory under NEO4J_HOME.
#metrics.csv.path='/local/file/system/path'
# How often to store data. Default is 3 minutes.
metrics.csv.interval=3m
The CSV exporter does not automatically rotate the output files. When enabling the
CSV exporter, it is recommended to configure a job to periodically archive the files.
8.1.2. Available metrics
General-purpose metrics
Table 7. Database Checkpointing Metrics

neo4j.check_point.events
    The total number of check point events executed so far
neo4j.check_point.total_time
    The total time spent in check pointing so far
neo4j.check_point.check_point_duration
    The duration of the check point event
Table 8. Database Data Metrics

neo4j.ids_in_use.relationship_type
    The total number of different relationship types stored in the database
neo4j.ids_in_use.property
    The total number of different property names used in the database
neo4j.ids_in_use.relationship
    The total number of relationships stored in the database
neo4j.ids_in_use.node
    The total number of nodes stored in the database
Table 9. Database PageCache Metrics

neo4j.page_cache.eviction_exceptions
    The total number of exceptions seen during the eviction process in the page cache
neo4j.page_cache.flushes
    The total number of flushes executed by the page cache
neo4j.page_cache.unpins
    The total number of page unpins executed by the page cache
neo4j.page_cache.pins
    The total number of page pins executed by the page cache
neo4j.page_cache.evictions
    The total number of page evictions executed by the page cache
neo4j.page_cache.page_faults
    The total number of page faults that have happened in the page cache
Table 10. Database Transaction Metrics

neo4j.transaction.started
    The total number of started transactions
neo4j.transaction.peak_concurrent
    The highest peak of concurrent transactions ever seen on this machine
neo4j.transaction.active
    The number of currently active transactions
neo4j.transaction.active_read
    The number of currently active read transactions
neo4j.transaction.active_write
    The number of currently active write transactions
neo4j.transaction.committed
    The total number of committed transactions
neo4j.transaction.committed_read
    The total number of committed read transactions
neo4j.transaction.committed_write
    The total number of committed write transactions
neo4j.transaction.rollbacks
    The total number of rolled back transactions
neo4j.transaction.rollbacks_read
    The total number of rolled back read transactions
neo4j.transaction.rollbacks_write
    The total number of rolled back write transactions
neo4j.transaction.terminated
    The total number of terminated transactions
neo4j.transaction.terminated_read
    The total number of terminated read transactions
neo4j.transaction.terminated_write
    The total number of terminated write transactions
neo4j.transaction.last_committed_tx_id
    The ID of the last committed transaction
neo4j.transaction.last_closed_tx_id
    The ID of the last closed transaction
Table 11. Cypher Metrics

neo4j.cypher.replan_events
    The total number of times Cypher has decided to re-plan a query
Table 12. Database LogRotation Metrics

neo4j.log_rotation.events
    The total number of transaction log rotations executed so far
neo4j.log_rotation.total_time
    The total time spent in rotating transaction logs so far
neo4j.log_rotation.log_rotation_duration
    The duration of the log rotation event
Table 13. Network Metrics

neo4j.network.slave_network_tx_writes
    The amount of bytes transmitted on the network containing the transaction data from a slave to the
    master in order to be committed
neo4j.network.master_network_store_writes
    The amount of bytes transmitted on the network while copying stores from one machine to another
neo4j.network.master_network_tx_writes
    The amount of bytes transmitted on the network containing the transaction data from a master to the
    slaves in order to propagate committed transactions
Table 14. Cluster Metrics

neo4j.cluster.slave_pull_updates
    The total number of update pulls executed by this instance
neo4j.cluster.slave_pull_update_up_to_tx
    The highest transaction id that has been pulled in the last pull updates by this instance
neo4j.cluster.is_master
    Whether or not this instance is the master in the cluster
neo4j.cluster.is_available
    Whether or not this instance is available in the cluster
Table 15. Core Metrics

neo4j.causal_clustering.core.append_index
    Append index of the RAFT log
neo4j.causal_clustering.core.commit_index
    Commit index of the RAFT log
neo4j.causal_clustering.core.term
    RAFT Term of this server
neo4j.causal_clustering.core.leader_not_found
    Leader was not found while attempting to commit a transaction
neo4j.causal_clustering.core.tx_pull_requests_received
    TX pull requests received from read replicas
neo4j.causal_clustering.core.tx_retries
    Transaction retries
neo4j.causal_clustering.core.is_leader
    Is this server the leader?
neo4j.causal_clustering.core.dropped_messages
    How many RAFT messages were dropped?
neo4j.causal_clustering.core.queue_sizes
    How many RAFT messages are queued up?
Java Virtual Machine Metrics
These metrics are environment dependent and they may vary on different hardware and with JVM
configurations. Typically these metrics will show information about garbage collections (for example
the number of events and time spent collecting), memory pools and buffers, and finally the number of
active threads running.
Metrics specific to Causal Clustering
The Core and Read replica roles have widely varying metrics as befits their different characteristics
and supported protocols. The Core metrics monitor important details like the collective state of the
Raft distributed consensus protocol and the number of transactions that have been shipped to Read
replicas. Read replica metrics are far simpler, simply tracking the asynchronous replication state with
respect to the Core servers.
Core
Core servers track a wide range of metrics pertaining to the Raft distributed consensus algorithm. They
also track their load (in terms of transaction log-shipping requests) to the Read replicas (and any
newly brought online Core servers).
Table 16. Core metrics

neo4j.causal_clustering.core.commit_index
    This server’s Raft commit index showing how many transactions it has safely committed into its
    Raft log.
neo4j.causal_clustering.core.append_index
    This server’s Raft append index showing how many transactions it has appended (but not necessarily
    committed) into its Raft log.
neo4j.causal_clustering.core.term
    This server’s Raft term showing this server’s view of the number of Leader elections that have
    occurred.
neo4j.causal_clustering.core.leader_not_found
    The number of times this server could not locate a Leader for the Raft protocol.
neo4j.causal_clustering.core.tx_pull_requests_received
    The number of transaction log-shipping requests that have been received by the current server.
neo4j.causal_clustering.core.tx_retries
    The number of transactions that have had to be retried by the current server.
neo4j.causal_clustering.core.is_leader
    Whether or not the current server is playing the Raft Leader role.
neo4j.causal_clustering.core.dropped_messages
    The number of messages dropped by the current server owing to communication failures with other
    Core servers.
neo4j.causal_clustering.core.queue_sizes
    The aggregate queue size of Raft messages outbound to other Core servers.
Read replica
Read replicas' metrics track the replication window with respect to the Core servers.
Table 17. Read replica metrics

pull_updates
    The number of requests for asynchronous transaction updates this server has made.
pull_update_highest_tx_id_requested
    The highest transaction ID that this server has received from the Core servers.
pull_update_highest_tx_id_received
    The last transaction ID that this server has received from the Core servers.
8.2. Logging
This chapter describes security and query logging in Neo4j.
The features described in this section are available in Neo4j Enterprise Edition.
Neo4j provides two types of logs for inspection of queries that are run in the database, and of security
events that have occurred.
The chapter describes the following:
• Query logging
• Security events logging
8.2.1. Query logging
This chapter describes Neo4j support for query logging.
Neo4j can be configured to log queries executed in the database.
Query logging must be enabled by setting the dbms.logs.query.enabled parameter to true. The
parameter dbms.logs.query.threshold determines the threshold for logging a query, i.e. if the
execution of a query takes a longer time than this threshold, it will be logged. Setting
dbms.logs.query.threshold to 0 will result in all queries being logged.
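For instance, a minimal neo4j.conf fragment enabling query logging could look like the following sketch (the one-second threshold is only an illustration; choose a value that suits your workload):
# Enable logging of executed queries and only log those slower than one second (illustrative value)
dbms.logs.query.enabled=true
dbms.logs.query.threshold=1s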
Log configuration
The name of the log file is query.log and it resides in the Logs directory (see File locations).
Rotation of the query log can be configured in the neo4j.conf configuration file. The following
parameters are available:
dbms.logs.query.enabled
    Default value: false
    Log executed queries that take longer than the configured threshold, dbms.logs.query.threshold.
dbms.logs.query.parameter_logging_enabled
    Default value: true
    Log parameters for executed queries that take longer than the configured threshold.
dbms.logs.query.rotation.keep_number
    Default value: 7
    Sets number of historical log files kept.
dbms.logs.query.rotation.size
    Default value: 20M
    Sets the file size at which the query log will auto-rotate.
dbms.logs.query.threshold
    Default value: 0
    If the execution of a query takes longer than this threshold, the query is logged (provided query
    logging is enabled).
Below is an example of the query log:
2016-10-27 14:31 ... INFO  0 ms: bolt-session    bolt  johndoe  neo4j-javascript/0.0.0-dev  client/127.0.0.1:59167 ...
2016-10-27 14:31 ... INFO  9 ms: bolt-session    bolt  johndoe  neo4j-javascript/0.0.0-dev  client/127.0.0.1:59167 ...
2016-10-27 14:31 ... INFO  0 ms: bolt-session    bolt  johndoe  neo4j-javascript/0.0.0-dev  client/127.0.0.1:59167 ...
2016-10-27 14:32 ... INFO  3 ms: server-session  http  127.0.0.1  /db/data/cypher  neo4j - CALL dbms.procedures() - {}
2016-10-27 14:32 ... INFO  1 ms: server-session  http  127.0.0.1  /db/data/cypher  neo4j - CALL dbms.security.showCurrentUs...
2016-10-27 14:32 ... INFO  0 ms: bolt-session    bolt  johndoe  neo4j-javascript/0.0.0-dev  client/127.0.0.1:59167 ...
2016-10-27 14:32 ... INFO  0 ms: bolt-session    bolt  johndoe  neo4j-javascript/0.0.0-dev  client/127.0.0.1:59167 ...
2016-10-27 14:32 ... INFO  2 ms: bolt-session    bolt  johndoe  neo4j-javascript/0.0.0-dev  client/127.0.0.1:59261 ...
8.2.2. Security events logging
This chapter describes Neo4j support for security events logging.
Neo4j provides security event logging that records all security events.
For native user management, the following actions are recorded:
• Login attempts - per default both successful and unsuccessful logins are recorded.
• Change of password for user, by administrator and by the user themselves.
• Creation and deletion of users, including failed attempts.
• Creation and deletion of custom roles, including failed attempts.
• Assignment and removal of roles for users, including failed attempts.
• Suspension and activation of users.
• Failed attempts by non-admin users to list users and roles.
Log configuration
The name of the log file is security.log and it resides in the Logs directory (see File locations).
Rotation of the security events log can be configured in the neo4j.conf configuration file. The following
parameters are available:
dbms.logs.security.rotation.size
    Default value: 20M
    Sets the file size at which the security event log will auto-rotate.
dbms.logs.security.rotation.delay
    Default value: 300s
    Sets the minimum time interval after the last log rotation occurred, before the log may be rotated
    again.
dbms.logs.security.rotation.keep_number
    Default value: 7
    Sets number of historical log files kept.
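As an illustration, the rotation behaviour could be tuned in neo4j.conf along these lines (the values shown are arbitrary examples, not recommendations):
# Illustrative values only; adjust to your own retention requirements
dbms.logs.security.rotation.size=50M
dbms.logs.security.rotation.delay=600s
dbms.logs.security.rotation.keep_number=10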
If using LDAP as the authentication method, some cases of LDAP misconfiguration will also be logged,
as well as LDAP server communication events and failures.
If many programmatic interactions are expected, for example using REST, it is advised to disable the
logging of successful logins. Logging of successful logins is disabled by setting the
dbms.security.log_successful_authentication parameter in the neo4j.conf file:
dbms.security.log_successful_authentication=false
Below is an example of the security log:
2016-10-27 13:45:00.796+0000 INFO  [AsyncLog @ 2016-10-27 ...]  [johnsmith]: logged in
2016-10-27 13:47:53.443+0000 ERROR [AsyncLog @ 2016-10-27 ...]  [johndoe]: failed to log in: invalid principal or credentials
2016-10-27 13:48:28.566+0000 INFO  [AsyncLog @ 2016-10-27 ...]  [johnsmith]: created user `janedoe`
2016-10-27 13:48:32.753+0000 ERROR [AsyncLog @ 2016-10-27 ...]  [johnsmith]: tried to create user `janedoe`: The specified user ...
2016-10-27 13:49:11.880+0000 INFO  [AsyncLog @ 2016-10-27 ...]  [johnsmith]: added role `admin` to user `janedoe`
2016-10-27 13:49:34.979+0000 INFO  [AsyncLog @ 2016-10-27 ...]  [johnsmith]: deleted user `janedoe`
2016-10-27 13:49:37.053+0000 ERROR [AsyncLog @ 2016-10-27 ...]  [johnsmith]: tried to delete user `janedoe`: User 'janedoe' does ...
2016-10-27 14:00:02.050+0000 INFO  [AsyncLog @ 2016-10-27 ...]  [johnsmith]: created role `operator`
8.3. Query management
This section describes tools available for controlling the execution of queries.
There may be occasions when there is a need to inspect queries, either from a security or a
performance point of view. Neo4j provides various means for inspecting and managing queries.
The query log is available for continuous monitoring and for troubleshooting. The transaction timeout
feature is a safety measure by which an operator can define a maximum running time for queries. The
query management procedures allow for inspecting, and possibly killing, queries while they run in
the database.
The chapter describes the following:
• Transaction timeout
• Procedures for query management
• Terminology
• List all running queries
• Terminate multiple queries
• Terminate a single query
8.3.1. Transaction timeout
The execution guard is a feature that terminates transactions whose execution time has exceeded the
configured timeout.
To enable the execution guard, set dbms.transaction.timeout to some positive time interval value
denoting the default transaction timeout.
Example 24. Configure execution guard
Set the timeout to ten seconds.
dbms.transaction.timeout=10s
Setting dbms.transaction.timeout to 0 — which is the default value — disables the execution guard.
This feature will have no effect on transactions executed with custom timeouts (via the Java API), as a
custom timeout will override the value set for dbms.transaction.timeout.
8.3.2. Procedures for query management
This section describes the procedures available for viewing and terminating currently-executing queries.
The features described in this section are available in Neo4j Enterprise Edition.
Unless stated otherwise, all arguments to the procedures described in this section must be supplied.
Terminology
administrator
This is a user who has been assigned the admin role. Refer to Native user and role management for
managing users and roles.
current user
This is the currently logged-in user invoking the commands described in this chapter.
user
• A user is composed of a username and credentials, where the latter is a unit of information,
such as a password, verifying the identity of a user.
• A user may represent a human, an application etc.
List all running queries
An administrator is able to view all queries that are currently executing within the instance.
Alternatively, the current user may view all of their own currently-executing queries.
Syntax:
CALL dbms.listQueries()
Returns:
queryId (String)
    This is the ID of the query.
username (String)
    This is the username of the user who is executing the query.
query (String)
    This is the query itself.
parameters (Map)
    This is a map containing all the parameters used by the query.
startTime (String)
    This is the time at which the query was started.
elapsedTime (String)
    This is the time that has elapsed since the query was started.
connectionDetails (String)
    These are the connection details pertaining to the query.
metaData (Map)
    This is any metadata associated with the transaction.
Example 25. Viewing queries that are currently executing
The following example shows that the user 'alwood' is currently running two queries; namely, the
dbms.listQueries() procedure and a query creating many nodes. Some of the data has been
trimmed for brevity.
CALL dbms.listQueries()
+-------------+----------+--------------------------+------------+-----------+-------------+-------------------+----------+
| queryId     | username | query                    | parameters | startTime | elapsedTime | connectionDetails | metaData |
+-------------+----------+--------------------------+------------+-----------+-------------+-------------------+----------+
| "query-271" | "alwood" | "... dbms.listQueries()" | (empty)    | "2016..." | "00:00:..." | "serversess ..."  | (empty)  |
| "query-272" | "alwood" | "WITH range(1, 3000...)" | (empty)    | "2016..." | "00:00:..." | "serversess ..."  | (empty)  |
+-------------+----------+--------------------------+------------+-----------+-------------+-------------------+----------+
2 rows
Terminate multiple queries
An administrator is able to terminate within the instance all transactions executing a query with any of
the given query IDs. Alternatively, the current user may terminate all of their own transactions
executing a query with any of the given query IDs.
Syntax:
CALL dbms.killQueries(ids)
Arguments:

ids (List<String>)
    This is a list of the IDs of all the queries to be terminated.

Returns:

queryId (String)
    This is the ID of the terminated query.
username (String)
    This is the username of the user who was executing the (now terminated) query.
Example 26. Terminating multiple queries
The following example shows that the administrator has terminated the queries with IDs 'query-378' and 'query-765', started by the users 'joesmith' and 'annebrown', respectively.
CALL dbms.killQueries(['query-378','query-765'])
+-------------+-------------+
| queryId     | username    |
+-------------+-------------+
| "query-378" | "joesmith"  |
| "query-765" | "annebrown" |
+-------------+-------------+
2 rows
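The query IDs passed to dbms.killQueries() are typically taken from the output of dbms.listQueries(). As a sketch only (assuming in-query procedure calls with explicit YIELD, and using a hypothetical username), the two procedures can be combined to terminate all queries run by one user:
CALL dbms.listQueries() YIELD queryId, username
WITH queryId, username
WHERE username = 'joesmith'
WITH collect(queryId) AS ids
CALL dbms.killQueries(ids) YIELD queryId, username
RETURN queryId, username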
Terminate a single query
An administrator is able to terminate within the instance any transaction executing the query whose
ID is provided. Alternatively, the current user may terminate their own transaction executing the query
whose ID is provided.
Syntax:
CALL dbms.killQuery(id)
Arguments:

id (String)
    This is the ID of the query to be terminated.

Returns:

queryId (String)
    This is the ID of the terminated query.
username (String)
    This is the username of the user who was executing the (now terminated) query.
Example 27. Terminating a single query
The following example shows that the user 'joesmith' has terminated his query with the ID
'query-502'.
CALL dbms.killQuery('query-502')
+-------------+------------+
| queryId     | username   |
+-------------+------------+
| "query-502" | "joesmith" |
+-------------+------------+
1 row
8.4. Procedures for monitoring a Causal Cluster
This section covers additional facilities available for monitoring a Neo4j Causal Cluster.
In addition to specific metrics as described in previous sections, Neo4j Causal Clusters provide an
infrastructure that operators will wish to monitor as well as new affordances for observing the state of
the overall cluster. Together these procedures can be used to inspect the cluster state and to
understand its current condition and topology.
The chapter describes the following:
• Find out the role of a cluster member
• Gain an overview over the instances in the cluster
• Get routing recommendations
8.4.1. Find out the role of a cluster member
The procedure dbms.cluster.role() can be called on every instance in a Causal Cluster to return the
role of the instance.
Syntax:
CALL dbms.cluster.role()
Returns:
role (String)
    This is the role of the current instance, which can be LEADER, FOLLOWER, or READ_REPLICA.
Considerations:
• While this procedure is useful in and of itself, it serves as basis for more powerful monitoring
procedures.
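Since the procedure can be called on every instance, it also lends itself to simple external checks. A minimal sketch, assuming Cypher Shell is installed and that its -a, -u and -p options accept the Bolt address and credentials (adjust the address and password to your environment):
# Hypothetical health check: print the role reported by the instance at neo20
cypher-shell -a bolt://neo20:7687 -u neo4j -p <password> "CALL dbms.cluster.role()"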
Example 28. Check the role of this instance
The following example shows how to find out the role of the current instance, which in this case
is 'FOLLOWER'.
CALL dbms.cluster.role()
role
FOLLOWER
8.4.2. Gain an overview over the instances in the cluster
The procedure dbms.cluster.overview() provides an overview of cluster topology by returning details
on all the instances in the cluster.
Syntax:
CALL dbms.cluster.overview()
Returns:
id (String)
    This is the id of the instance.
addresses (List<String>)
    This is a list of all the addresses for the instance.
role (String)
    This is the role of the instance, which can be LEADER, FOLLOWER, or READ_REPLICA.
Considerations:
• This procedure can only be called from Core instances, since they are the only ones that have the
full view of the cluster.
Example 29. Get an overview of the cluster
The following example shows how to explore the cluster topology.
CALL dbms.cluster.overview()
id                                     addresses                                                    role
08eb9305-53b9-4394-9237-0f0d63bb05d5   [bolt://neo20:7687, http://neo20:7474, https://neo20:7473]  LEADER
cb0c729d-233c-452f-8f06-f2553e08f149   [bolt://neo21:7687, http://neo21:7474, https://neo21:7473]  FOLLOWER
ded9eed2-dd3a-4574-bc08-6a569f91ec5c   [bolt://neo22:7687, http://neo22:7474, https://neo22:7473]  FOLLOWER
00000000-0000-0000-0000-000000000000   [bolt://neo34:7687, http://neo34:7474, https://neo34:7473]  READ_REPLICA
00000000-0000-0000-0000-000000000000   [bolt://neo28:7687, http://neo28:7474, https://neo28:7473]  READ_REPLICA
00000000-0000-0000-0000-000000000000   [bolt://neo31:7687, http://neo31:7474, https://neo31:7473]  READ_REPLICA
8.4.3. Get routing recommendations
From the application point of view it is not interesting to know about the role a member plays in the
cluster. Instead the application needs to know which instance can provide the wanted service. The
procedure dbms.cluster.routing.getServers() provides this information.
Syntax:
CALL dbms.cluster.routing.getServers()
Example 30. Get routing recommendations
The following example shows how discover which instances in the cluster can provide which
services.
CALL dbms.cluster.routing.getServers()
The procedure returns a map between a particular service, READ, WRITE and ROUTE, and the
addresses of instances that provide this service. It also returns a Time To Live (TTL) for the
information.
The result is not primarily intended for human consumption. Expanded, this is what it looks like:
ttl: 300,
server: [
{
addresses: [neo20:7687],
role: WRITE
}, {
addresses: [neo21:7687, neo22:7687, neo34:7687, neo28:7687, neo31:7687],
role: READ
}, {
addresses: [neo20:7687, neo21:7687, neo22:7687],
role: ROUTE
}
]
Chapter 9. Performance
This chapter describes factors that affect operational performance and how to tune Neo4j
for optimal throughput.
9.1. Memory tuning
This section covers how to configure memory for a Neo4j instance. The various memory
requirements and trade-offs are explained as well as the characteristics of garbage
collection.
Neo4j will automatically configure default values for memory-related configuration parameters that
are not explicitly defined within its configuration on startup. In doing so, it will assume that all of the
RAM on the machine is available for running Neo4j.
There are three types of memory to consider: OS Memory, Page Cache and Heap Space.
Note that OS memory is not explicitly configurable; it is "what is left" once the page cache and heap
space have been specified. If the page cache and heap space are configured to use all of the available
RAM or more, or if not enough head room is left for the OS, the OS will start swapping to disk, which
will heavily affect performance. Therefore, follow this checklist:
1. Plan OS memory sizing
2. Plan page cache sizing
3. Plan heap sizing
4. Do the sanity check:
Actual OS allocation = available RAM - (page cache + heap size)
Make sure that your system is configured such that it will never need to swap.
9.1.1. OS memory sizing
Some memory must be reserved for all activities on the server that are not Neo4j related. In addition,
leave enough memory for the operating system file buffer cache to fit the contents of the index and
schema directories, since it will impact index lookup performance if the indexes cannot fit in memory.
1G is a good starting point for when Neo4j is the only server running on that machine.
OS Memory = 1GB + (size of graph.db/index) + (size of graph.db/schema)
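As a rough sketch of how to obtain those directory sizes on a POSIX system (adjust NEO4J_HOME to your installation):
# Report the on-disk size of the index and schema directories
$ du -sh $NEO4J_HOME/data/databases/graph.db/index $NEO4J_HOME/data/databases/graph.db/schema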
9.1.2. Page cache sizing
The page cache is used to cache the Neo4j data as stored on disk. Ensuring that all, or at least most, of
the graph data from disk is cached into memory will help avoid costly disk access and result in optimal
performance. You can determine the total memory needed for the page cache by summing up the
sizes of the NEO4J_HOME/data/databases/graph.db/*store.db* files and adding 20% for growth.
The parameter for specifying the page cache is dbms.memory.pagecache.size. This specifies how much
memory Neo4j is allowed to use for this cache.
If this is not explicitly defined on startup, Neo4j will look at how much available memory the machine
has, subtract the JVM max heap allocation from that, and then use 50% of what is left for the page
cache. This is considered the default configuration.
The following are two possible methods for estimating the page cache size:
1. For an existing Neo4j database, sum up the size of all the store.db files in your store file directory,
to figure out how big a page cache you need to fit all your data. Add another 20% for growth. For
instance, on a posix system you can look at the total of running $ du -hc *store.db* in the
data/databases/graph.db directory.
2. For a new Neo4j database, it is useful to run an import with a fraction (e.g. 1/100th) of the data
and then multiply the resulting store-size by that fraction (x 100). Add another 20% for growth.
For example: import 1/100th of the data and sum up the sizes of the resulting database files. Then
multiply by 120 for a total estimate of the database size, including 20% for growth.
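Once an estimate has been made, the result can be set explicitly in neo4j.conf; for example (an illustrative value only):
# Allow Neo4j to use up to 4 gigabytes for the page cache (illustrative value)
dbms.memory.pagecache.size=4g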
dbms.memory.pagecache.size
    Possible values: the maximum amount of memory to use for the page cache, either in bytes, or
    greater byte-like units, such as 100m for 100 mega-bytes, or 4g for 4 giga-bytes.
    Effect: the amount of memory to use for mapping the store files, in a unit of bytes. This will
    automatically be rounded down to the nearest whole page. This value cannot be zero. For extremely
    small and memory constrained deployments, it is recommended to still reserve at least a couple of
    megabytes for the page cache.

unsupported.dbms.report_configuration
    Possible values: true or false
    Effect: if set to true the current configuration settings will be written to the default system
    output, mostly the console or the logfiles.
9.1.3. Heap sizing
The size of the available heap memory is an important aspect for the performance of Neo4j.
Generally speaking, it is beneficial to configure a large enough heap space to sustain concurrent
operations. For many setups, a heap size between 8G and 16G is large enough to run Neo4j reliably.
The heap memory size is determined by the parameters in NEO4J_HOME/conf/neo4j.conf, namely
dbms.memory.heap.initial_size and dbms.memory.heap.max_size providing the heap size in Megabytes
or with a unit, e.g. 16000 or preferably 16G. It is recommended to set these two parameters to the
same value to avoid unwanted full garbage collection pauses.
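For example, to fix the heap at 16G (an illustrative size; choose one appropriate for your workload and available RAM):
# Set initial and maximum heap size to the same value to avoid resize pauses (illustrative size)
dbms.memory.heap.initial_size=16G
dbms.memory.heap.max_size=16G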
9.1.4. Tuning of the garbage collector
The heap is separated into an old generation and a young generation. New objects are allocated in the
young generation, and then later moved to the old generation, if they stay live (in use) for long
enough. When a generation fills up, the garbage collector performs a collection, during which all other
threads in the process are paused. The young generation is quick to collect since the pause time
correlates with the live set of objects, and is independent of the size of the young generation. In the
old generation, pause times roughly correlates with the size of the heap. For this reason, the heap
should ideally be sized and tuned such that transaction and query state never makes it to the old
generation.
The heap size is configured with the dbms.memory.heap.max_size (in MBs) setting in the neo4j.conf file.
The initial size of the heap is specified by the dbms.memory.heap.initial_size setting, or with the
-Xms???m flag, or chosen heuristically by the JVM itself if left unspecified. The JVM will automatically
grow the heap as needed, up to the maximum size. The growing of the heap requires a full garbage
collection cycle. It is recommended to set the initial heap size and the maximum heap size to the same
value. This way the pause that happens when the garbage collector grows the heap can be avoided.
108
The ratio of the size between the old generation and the new generation of the heap is controlled by
the -XX:NewRatio=N flag. N is typically between 2 and 8 by default. A ratio of 2 means that the old
generation size, divided by the new generation size, is equal to 2. In other words, two thirds of the
heap memory will be dedicated to the old generation. A ratio of 3 will dedicate three quarters of the
heap to the old generation, and a ratio of 1 will keep the two generations about the same size. A ratio
of 1 is quite aggressive, but may be necessary if your transactions changes a lot of data. Having a large
new generation can also be important if you run Cypher queries that need to keep a lot of data
resident, for example when sorting big result sets.
If the new generation is too small, short-lived objects may be moved to the old generation too soon.
This is called premature promotion and will slow the database down by increasing the frequency of
old generation garbage collection cycles. If the new generation is too big, the garbage collector may
decide that the old generation does not have enough space to fit all the objects it expects to promote
from the new to the old generation. This turns new generation garbage collection cycles into old
generation garbage collection cycles, again slowing the database down. Running more concurrent
threads means that more allocations can take place in a given span of time, in turn increasing the
pressure on the new generation in particular.
The Compressed OOPs feature in the JVM allows object references to be compressed
to use only 32 bits. The feature saves a lot of memory, but is not enabled for heaps
larger than 32 GB. Gains from increasing the heap size beyond 32 GB can therefore
be small or even negative, unless the increase is significant (64 GB or above).
Neo4j has a number of long-lived objects, that stay around in the old generation, effectively for the
lifetime of the Java process. To process them efficiently, and without adversely affecting the garbage
collection pause time, we recommend using a concurrent garbage collector.
How to tune the specific garbage collection algorithm depends on both the JVM version and the
workload. It is recommended to test the garbage collection settings under realistic load for days or
weeks. Problems like heap fragmentation can take a long time to surface.
To gain good performance, these are the things to look into first:
• Make sure the JVM is not spending too much time performing garbage collection. The goal is to
have a large enough heap to make sure that heavy/peak load will not result in so-called GC-thrashing.
Performance can drop as much as two orders of magnitude when GC-thrashing happens. Having too
large a heap may also hurt performance, so you may have to try some different heap sizes.
• Use a concurrent garbage collector. We find that -XX:+UseG1GC works well in most use-cases.
• The Neo4j JVM needs enough heap memory for the transaction state and query processing,
plus some head-room for the garbage collector. Because the heap memory needs are so
workload dependent, it is common to see configurations from 1 GB, up to 32 GBs of heap
memory.
• Start the JVM with the -server flag and a good sized heap.
• The operating system on a dedicated server can usually make do with 1 to 2 GBs of memory,
but the more physical memory the machine has, the more memory the operating system will
need.
Edit the following properties:
Table 18. neo4j.conf JVM tuning properties

dbms.memory.heap.initial_size
    initial heap size (in MB)
dbms.memory.heap.max_size
    maximum heap size (in MB)
dbms.jvm.additional
    additional literal JVM parameter
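As a sketch of how the recommendation above to use a concurrent collector translates into configuration (the flag choice should still be validated under realistic load):
# Pass the G1 garbage collector flag to the JVM as an additional literal parameter (example flag)
dbms.jvm.additional=-XX:+UseG1GC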
9.2. Transaction logs
This section explains the retention and rotation policies for the Neo4j logs and how to
configure them.
The transaction logs record all operations in the database. They are the source of truth in scenarios
where the database needs to be recovered. Transaction logs are used to provide for incremental
backups, as well as for cluster operations. For any given configuration at least the latest non-empty
transaction log will be kept.
By default, log switches happen when log sizes surpass 250 MB. This can be configured using the
parameter dbms.tx_log.rotation.size.
There are several different means of controlling the amount of transaction logs that is kept, using the
parameter dbms.tx_log.rotation.retention_policy. The format in which this is configured is:
dbms.tx_log.rotation.retention_policy=<true/false>
dbms.tx_log.rotation.retention_policy=<amount> <type>
For example:
# Will keep logical logs indefinitely
dbms.tx_log.rotation.retention_policy=true
# Will keep only the most recent non-empty log
dbms.tx_log.rotation.retention_policy=false
# Will keep logical logs which contains any transaction committed within 30 days
dbms.tx_log.rotation.retention_policy=30 days
# Will keep logical logs which contains any of the most recent 500 000 transactions
dbms.tx_log.rotation.retention_policy=500k txs
Full list:
files
    Number of most recent logical log files to keep
    Example: "10 files"
size
    Max disk size to allow log files to occupy
    Example: "300M size" or "1G size"
txs
    Number of latest transactions to keep
    Example: "250k txs" or "5M txs"
hours
    Keep logs which contain any transaction committed within N hours from current time
    Example: "10 hours"
days
    Keep logs which contain any transaction committed within N days from current time
    Example: "50 days"
9.3. Compressed storage
This section explains Neo4j property value compression and disk usage.
Neo4j can in many cases compress and inline the storage of property values, such as short arrays and
strings, with the purpose of saving disk space and possibly an I/O operation.
Compressed storage of short arrays
Neo4j will try to store your primitive arrays in a compressed way. To do that, it employs a "bit-shaving"
algorithm that tries to reduce the number of bits required for storing the members of the array. In
particular:
1. For each member of the array, it determines the position of the leftmost set bit.
2. It determines the largest such position among all members of the array.
3. It reduces all members to that number of bits.
4. It stores those values, prefixed by a small header.
That means that if even a single negative value is included in the array, the original size of the
primitives will be used, since a negative value has its most significant (sign) bit set.
There is a possibility that the result can be inlined in the property record if:
• It is less than 24 bytes after compression.
• It has less than 64 members.
For example, an array long[] {0L, 1L, 2L, 4L} will be inlined, as the largest entry (4) will require 3 bits
to store, so the whole array will be stored in 4 × 3 = 12 bits. The array long[] {-1L, 1L, 2L, 4L},
however, will require the whole 64 bits for the -1 entry, so it needs 4 × 64 = 256 bits = 32 bytes and it
will end up in the dynamic store.
Compressed storage of short strings
Neo4j will try to classify your strings in a short string class and if it manages that it will treat it
accordingly. In that case, it will be stored without indirection in the property store, inlining it instead in
the property record, meaning that the dynamic string store will not be involved in storing that value,
leading to reduced disk footprint. Additionally, when no string record is needed to store the property,
it can be read and written in a single lookup, leading to performance improvements and less disk
space required.
The various classes for short strings are:
• Numerical, consisting of digits 0..9 and the punctuation space, period, dash, plus, comma and
apostrophe.
• Date, consisting of digits 0..9 and the punctuation space dash, colon, slash, plus and comma.
• Hex (lower case), consisting of digits 0..9 and lower case letters a..f
• Hex (upper case), consisting of digits 0..9 and upper case letters a..f
• Upper case, consisting of upper case letters A..Z, and the punctuation space, underscore, period,
dash, colon and slash.
• Lower case, like upper but with lower case letters a..z instead of upper case
• E-mail, consisting of lower case letters a..z and the punctuation comma, underscore, period, dash,
plus and the at sign (@).
• URI, consisting of lower case letters a..z, digits 0..9 and most punctuation available.
• Alpha-numerical, consisting of both upper and lower case letters a..zA..Z, digits 0..9 and
punctuation space and underscore.
• Alpha-symbolical, consisting of both upper and lower case letters a..zA..Z and the punctuation
space, underscore, period, dash, colon, slash, plus, comma, apostrophe, at sign, pipe and
semicolon.
• European, consisting of most accented european characters and digits plus punctuation space,
dash, underscore and period — like latin1 but with less punctuation.
• Latin 1.
• UTF-8.
In addition to the string’s contents, the number of characters also determines if the string can be
inlined or not. Each class has its own character count limits, which are as follows:

Table 19. Character count limits

String class                              Character count limit
Numerical, Date and Hex                   54
Uppercase, Lowercase and E-mail           43
URI, Alphanumerical and Alphasymbolical   36
European                                  31
Latin1                                    27
UTF-8                                     14
That means that the largest inline-able string is 54 characters long and must be of the Numerical class
and also that all Strings of size 14 or less will always be inlined.
Also note that the above limits are for the default 41 byte PropertyRecord layout — if that parameter is
changed via editing the source and recompiling, the above have to be recalculated.
9.4. Linux file system tuning
This section covers Neo4j I/O behavior and how to optimize for operations on disk.
Databases often produce many small and random reads when querying data, and few sequential
writes when committing changes.
By default, most Linux distributions schedule IO requests using the Completely Fair Queuing (CFQ)
algorithm, which provides a good balance between throughput and latency. The particular IO
workload of a database, however, is better served by the Deadline scheduler. The Deadline scheduler
gives preference to read requests, and processes them as soon as possible. This tends to decrease the
latency of reads, while the latency of writes goes up. Since the writes are usually sequential, their
lingering in the IO queue increases the chance of overlapping or adjacent write requests being
merged together. This effectively reduces the number of writes that are sent to the drive.
On Linux, the IO scheduler for a drive, in this case sda, can be changed at runtime like this:
$ echo 'deadline' > /sys/block/sda/queue/scheduler
$ cat /sys/block/sda/queue/scheduler
noop [deadline] cfq
Another recommended practice is to disable file and directory access time updates. This way, the file
system won’t have to issue writes that update this meta-data, thus improving write performance. This
can be accomplished by setting the noatime,nodiratime mount options in fstab, or when issuing the
disk mount command.
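For example, an fstab entry with these options might look like the following sketch; the device, file
system and mount point are illustrative and will differ on your system:
/dev/sda1  /var/lib/neo4j  ext4  defaults,noatime,nodiratime  0  2
The options can also be applied to an already mounted file system without editing fstab, for example
with mount -o remount,noatime,nodiratime /var/lib/neo4j.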
9.5. Disks, RAM and other tips
This section provides an overview of performance considerations for disk and RAM when
running Neo4j.
As with any persistence solution, performance depends a lot on the persistence media used. Better
disks equal better performance.
If you have multiple disks or persistence media available it may be a good idea to divide the store files
and transaction logs across those disks. Keeping the store files on disks with low seek time can do
wonders for read operations. Today a typical mechanical drive has an average seek time of about
5ms. This can cause a query or traversal to be very slow when the amount of RAM assigned to the
page cache is too small. A new, good SATA-enabled SSD has an average seek time of less than 100
microseconds, meaning those scenarios will execute at least 50 times faster. However, this is still tens
or hundreds of times slower than accessing RAM.
To avoid hitting disk you need more RAM. On a standard mechanical drive you can handle graphs with
a few tens of millions of primitives (nodes, relationships and properties) with 2-3 GBs of RAM. A server
with 8-16 GBs of RAM can handle graphs with hundreds of millions of primitives, and a good server
with 16-64 GBs can handle billions of primitives. However, if you invest in a good SSD you will be able
to handle much larger graphs on less RAM.
Use tools like dstat or vmstat to gather information when your application is running. If the swap or
paging numbers are high, that is a sign that the Lucene indexes don’t quite fit in memory. In this case,
queries that do index lookups will have high latencies.
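For example, running vmstat with a one-second sampling interval shows swap-in and swap-out
activity in the si and so columns; sustained non-zero values there indicate that the system is paging
(the exact columns vary between versions and distributions):
$ vmstat 1
dstat without arguments reports similar paging statistics alongside CPU, disk and network usage.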
When Neo4j starts up, its page cache is empty and needs to warm up. This can take a while, especially
for large stores. It is not uncommon to see a long period with many blocks being read from the drive,
and high IO wait times.
Neo4j also flushes its page cache in the background, so it is not uncommon to see a steady trickle of
blocks being written to the drive during steady-state. This background flushing only produces a small
amount of IO wait, however. If the IO wait times are high during steady-state, it may be a sign that
Neo4j is bottle-necked on the random IO performance of the drive. The best drives for running Neo4j
are fast SSDs that can take lots of random IOPS.
Chapter 10. Tools
This chapter describes the Neo4j tools Import, Cypher Shell, the dump and load facilities of
the Neo4j Admin tool, and the consistency checker.
This chapter comprises the following topics:
• How to import data into Neo4j using the import tool
• A description of the header format of CSV files when using the import tool
• How to use the import tool from the command line
• How to use the Cypher Shell
• How to dump and load Neo4j databases using neo4j-admin
• How to check the consistency of a Neo4j database using neo4j-admin
10.1. Import
This chapter covers importing data into Neo4j.
The import tool is used to create a new Neo4j database from data in CSV files.
This chapter explains how to use the tool and format the input data. For in-depth examples of using
the import tool, see Use the Import tool.
These are some things you will need to keep in mind when creating your input files:
• Fields are comma separated by default but a different delimiter can be specified.
• All files must use the same delimiter.
• Multiple data sources can be used for both nodes and relationships.
• A data source can optionally be provided using multiple files.
• A header which provides information on the data fields must be on the first row of each data
source.
• Fields without corresponding information in the header will not be read.
• UTF-8 encoding is used.

Indexes are not created during the import. Instead, you will need to add indexes
afterwards (see Developer Manual → Indexes (http://neo4j.com/docs/developer-manual/3.1/introduction/graphdb-concepts/#graphdb-neo4j-schema-indexes)).
Data cannot be imported into an existing database using this tool. If you want to
load small to medium sized CSV files use LOAD CSV (see Developer Manual → LOAD
CSV (http://neo4j.com/docs/developer-manual/3.1/cypher/clauses/load-csv)).
10.1.1. CSV file header format
This section explains the header format of CSV files when using the Neo4j import tool.
The header row of each data source specifies how the fields should be interpreted. The same
delimiter is used for the header row as for the rest of the data.
The header contains information for each field, with the format: <name>:<field_type>. The <name> is
used as the property key for values, and ignored in other cases. The following <field_type> settings
can be used for both nodes and relationships:
Property value
Use one of int, long, float, double, boolean, byte, short, char, string to designate the data type. If
no data type is given, this defaults to string. To define an array type, append [] to the type. By
default, array values are separated by ;. A different delimiter can be specified with --array-delimiter.
IGNORE
Ignore this field completely.
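As an illustration only (the field names are made up), a header that declares a string property, an
integer property, a string array property and a field to be skipped could look like this:
name:string,age:int,skills:string[],internalNote:IGNORE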
See below for the specifics of node and relationship data source headers.
Nodes
The following field types additionally apply to node data sources:
ID
Each node must have a unique id which is used during the import. The ids are used to find the
correct nodes when creating relationships. Note that the id has to be unique across all nodes in the
import, even nodes with different labels.
LABEL
Read one or more labels from this field. Like array values, multiple labels are separated by ;, or by
the character specified with --array-delimiter.
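For example, a node data source movies.csv (the file name and values are hypothetical) could start
like this, declaring an id field, two properties and a label field:
movieId:ID,title,year:int,:LABEL
tt0133093,"The Matrix",1999,Movie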
Relationships
For relationship data sources, there are three mandatory fields:
TYPE
The relationship type to use for the relationship.
START_ID
The id of the start node of the relationship to create.
END_ID
The id of the end node of the relationship to create.
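Correspondingly, a relationship data source such as acted_in.csv (hypothetical) could start like this,
where the start and end ids refer to ids defined in the node data sources:
:START_ID,role,:END_ID,:TYPE
person123,"Neo",tt0133093,ACTED_IN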
ID spaces
The import tool assumes that node identifiers are unique across node files. If this is not the case then
we can define an id space. Id spaces are defined in the ID field of node files.
For example, to specify the Person id space we would use the field type ID(Person) in our persons
node file. We also need to reference that id space in our relationships file i.e. START_ID(Person) or
END_ID(Person).
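As a sketch, assuming both a Person and a Movie id space have been defined, the node and
relationship headers could then look like this:
personId:ID(Person),name,:LABEL
:START_ID(Person),role,:END_ID(Movie),:TYPE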
10.1.2. Command line usage
This section covers how to use the Neo4j import tool from the command line.
Linux
Under Unix/Linux/OSX, the command is named neo4j-import. Depending on the installation type, the
tool is either available globally, or used by executing ./bin/neo4j-import from inside the installation
directory.
Windows
Under Windows, the tool is used by executing bin\neo4j-import from inside the installation directory.
For help with running the import tool under Windows, see the reference in Windows.
Options
--into <store-dir>
Database directory to import into. Must not contain existing database.
--nodes[:Label1:Label2] "<file1>,<file2>,…"
Node CSV header and data. Multiple files will be logically seen as one big file from the perspective
of the importer. The first line must contain the header. Multiple data sources like these can be
specified in one import, where each data source has its own header. Note that file groups must be
enclosed in quotation marks.
--relationships[:RELATIONSHIP_TYPE] "<file1>,<file2>,…"
Relationship CSV header and data. Multiple files will be logically seen as one big file from the
perspective of the importer. The first line must contain the header. Multiple data sources like these
can be specified in one import, where each data source has its own header. Note that file groups
must be enclosed in quotation marks.
--delimiter <delimiter-character>
Delimiter character, or TAB, between values in CSV data. The default option is ,.
--array-delimiter <array-delimiter-character>
Delimiter character, or TAB, between array elements within a value in CSV data. The default option
is ;.
--quote <quotation-character>
Character to treat as quotation character for values in CSV data. The default option is ". Quotes
inside quotes escaped like """Go away"", he said." and "\"Go away\", he said." are supported. If
you have set ' to be used as the quotation character, you could write the previous example like this
instead: '"Go away", he said.'
--multiline-fields <true/false>
Whether or not fields from input source can span multiple lines, i.e. contain newline characters.
Default value: false
--input-encoding <character set>
Character set that input data is encoded in. Provided value must be one out of the available
character sets in the JVM, as provided by Charset#availableCharsets(). If no input encoding is
provided, the default character set of the JVM will be used.
--ignore-empty-strings <true/false>
Whether or not empty string fields ("") from input source are ignored, i.e. treated as null. Default
value: false
--id-type <id-type>
One out of [STRING, INTEGER, ACTUAL] and specifies how ids in node/relationship input files are
treated. STRING: arbitrary strings for identifying nodes. INTEGER: arbitrary integer values for
identifying nodes. ACTUAL: (advanced) actual node ids. Default value: STRING
--processors <max processor count>
(advanced) Max number of processors used by the importer. Defaults to the number of available
processors reported by the JVM. A certain minimum number of threads is always needed, so there is
no lower bound for this value. For optimal performance this value shouldn’t be greater than the
number of available processors.
--stacktrace <true/false>
Enable printing of error stack traces.
--bad-tolerance <max number of bad entries>
Number of bad entries before the import is considered failed. This tolerance threshold is about
relationships referring to missing nodes. Format errors in input data are still treated as errors.
Default value: 1000
--skip-bad-relationships <true/false>
Whether or not to skip importing relationships that refer to missing node ids, i.e. either the start or
end node id/group referring to a node that wasn’t specified by the node input data. Skipped
relationships will be logged, containing at most the number of entities specified by bad-tolerance.
Default value: true
--skip-duplicate-nodes <true/false>
Whether or not to skip importing nodes that have the same id/group. In the event of multiple
nodes within the same group having the same id, the first encountered will be imported whereas
consecutive such nodes will be skipped. Skipped nodes will be logged, containing at most number
of entities specified by bad-tolerance. Default value: false
--ignore-extra-columns <true/false>
Whether or not to ignore extra columns in the data not specified by the header. Skipped columns
will be logged, containing at most number of entities specified by bad-tolerance. Default value:
false
--db-config <path/to/neo4j.conf>
(advanced) File specifying database-specific configuration. For more information, consult the manual
about available configuration options for a Neo4j configuration file. Only configuration affecting the
store at the time of creation will be read. Examples of supported config are:
• dbms.relationship_grouping_threshold
• unsupported.dbms.block_size.strings
• unsupported.dbms.block_size.array_properties
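Putting some of these options together, a minimal invocation could look like the following sketch; the
store directory and file names are illustrative:
$neo4j-home> bin/neo4j-import --into /data/graph.db --nodes:Movie "movies.csv" --nodes:Person "persons.csv" --relationships:ACTED_IN "acted_in.csv" --delimiter "," --array-delimiter ";"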
Verbose error information
In some cases if an unexpected error occurs it might be useful to supply the command line option
--stacktrace to the import (and rerun the import to actually see the additional information). This will
have the error printed with additional debug information, useful for both developers and issue
reporting.
Output and statistics
While an import is running through its different stages, some statistics and figures are printed in the
console. The general interpretation of that output is to look at the horizontal line, which is divided up
into sections, each section representing one type of work going on in parallel with the other sections.
The wider a section is, the more time is spent there relative to the other sections, the widest being the
bottleneck, also marked with *. If a section has a double line, instead of just a single line, it means that
multiple threads are executing the work in that section. To the far right a number is displayed telling
how many entities (nodes or relationships) have been processed by that stage.
As an example:
[*>:20,25 MB/s------------------|PREPARE(3)====================|RELATIONSHIP(2)===============] 16M
Would be interpreted as:
• > data being read, and perhaps parsed, at 20,25 MB/s, data that is being passed on to …
• PREPARE preparing the data for …
• RELATIONSHIP creating actual relationship records and …
• v writing the relationships to the store. This step is not visible in this example, because it is so
cheap compared to the other sections.
Observing the section sizes can give hints about where performance can be improved. In the example
above, the bottleneck is the data read section (marked with >), which might indicate that the disk is
being slow, or is poorly handling simultaneous read and write operations (since the last section often
revolves around writing to disk).
10.2. Cypher Shell
This chapter describes the Cypher Shell.
Cypher Shell is a command-line tool that is installed as part of the product. You can connect to a Neo4j
database and use Cypher to query data, define schema or perform administrative tasks. Cypher Shell
exposes explicit transactions allowing multiple operations to be grouped and applied or rolled back
together. Cypher Shell communicates via the encrypted binary protocol Bolt.
10.2.1. Invoking Cypher Shell
Cypher Shell is located in the bin directory and is invoked with a set of arguments. Note that the very
first time you run Cypher Shell, you will be prompted with a security message. Just follow the
instructions in the message, then you will be ready to go.
cypher-shell [-h] [-a ADDRESS] [-u USERNAME] [-p PASSWORD] [--encryption {true,false}] [--format {verbose,plain}] [--debug] [--fail-fast | --fail-at-end] [cypher]
Arguments
Positional arguments:
Description
cypher
An optional string of cypher to execute and then exit
Optional arguments:
-h, --help
Show help message and exit
--fail-fast
Exit and report failure on first error when reading from file
(this is the default behavior)
--fail-at-end
Exit and report failures at end of input when reading from
file
--format {verbose,plain}
Desired output format, verbose (default) displays statistics,
plain only displays data (default: verbose)
--debug
Print additional debug information (default: false)
Connection arguments:
-a ADDRESS, --address ADDRESS
address and port to connect to (default: localhost:7687)
-u USERNAME, --username USERNAME
username to connect as. Can also be specified using environment variable NEO4J_USERNAME (default: )
-p PASSWORD, --password PASSWORD
password to connect with. Can also be specified using environment variable NEO4J_PASSWORD (default: )
--encryption {true,false}
whether the connection to Neo4j should be encrypted; must be consistent with Neo4j’s configuration (default: true)
Example 31. Invoke Cypher Shell with username and password
$neo4j-home> bin/cypher-shell -u johndoe -p secret
Connected to Neo4j at bolt://localhost:7687 as user neo4j.
Type :help for a list of available commands or :exit to exit the shell.
Note that Cypher queries must end with a semicolon.
neo4j>
Example 32. Invoke help from within Cypher Shell
neo4j> :help
Available commands:
:begin
Open a transaction
:commit
Commit the currently open transaction
:exit
Exit the logger
:help
Show this help message
:history Print a list of the last commands executed
:param
Set the value of a query parameter
:params
Prints all currently set query parameters and their values
:rollback Rollback the currently open transaction
For help on a specific command type:
:help command
Example 33. Execute a query from within Cypher Shell
neo4j> MATCH (n) RETURN n;
n
(:Person {name: "Bruce Wayne", alias: "Batman"})
(:Person {name: "Selina Kyle", alias: ["Catwoman", "The Cat"]})
Example 34. Invoke Cypher Shell with a Cypher script from the command line
Below is the contents of a file called examples.cypher:
MATCH (n) RETURN n;
MATCH (batman:Person {name: 'Bruce Wayne'}) RETURN batman;
Invoke the 'examples.cypher' script from the command-line. In this example we are also using the
--format plain flag to limit the output:
$neo4j-home> cat examples.cypher | bin/cypher-shell -u neo4j -p maria --format plain
n
(:Person {name: "Bruce Wayne", alias: "Batman"})
(:Person {name: "Selina Kyle", alias: ["Catwoman", "The Cat"]})
batman
(:Person {name: "Bruce Wayne", alias: "Batman"})
10.2.2. Query parameters
Cypher Shell supports querying based on parameters. This is often used while scripting.
Example 35. Use parameters within Cypher Shell
Set the parameter 'thisAlias' to 'Robin' using the ':param' keyword. Check the parameter using the
':params' keyword.
neo4j> :param thisAlias 'Robin'
neo4j> :params
thisAlias: Robin
Now use the parameter 'thisAlias' in a Cypher query. Verify the result.
neo4j> CREATE (:Person {name : 'Dick Grayson', alias : {thisAlias} });
Added 1 nodes, Set 2 properties, Added 1 labels
neo4j> MATCH (n) RETURN n;
n
(:Person {name: "Bruce Wayne", alias: "Batman"})
(:Person {name: "Selina Kyle", alias: ["Catwoman", "The Cat"]})
(:Person {name: "Dick Grayson", alias: "Robin"})
10.2.3. Transactions
Cypher Shell supports explicit transactions. Transaction states are controlled using the keywords
:begin, :commit, and :rollback:
Example 36. Use fine-grained transaction control
Start a transaction in your first Cypher Shell session:
neo4j> MATCH (n) RETURN n;
n
(:Person {name: "Bruce Wayne", alias: "Batman"})
(:Person {name: "Selina Kyle", alias: ["Catwoman", "The Cat"]})
(:Person {name: "Dick Grayson", alias: "Robin"})
neo4j> :begin
neo4j# CREATE (:Person {name : 'Edward Mygma', alias : 'The Riddler' });
Added 1 nodes, Set 2 properties, Added 1 labels
If you now open up a second Cypher Shell session, you will notice no changes from the latest
CREATE statement:
neo4j> MATCH (n) RETURN n;
n
(:Person {name: "Bruce Wayne", alias: "Batman"})
(:Person {name: "Selina Kyle", alias: ["Catwoman", "The Cat"]})
(:Person {name: "Dick Grayson", alias: "Robin"})
Go back to the first session and commit the transaction:
neo4j# :commit
neo4j> MATCH (n) RETURN n;
n
(:Person {name: "Bruce Wayne", alias: "Batman"})
(:Person {name: "Selina Kyle", alias: ["Catwoman", "The Cat"]})
(:Person {name: "Dick Grayson", alias: "Robin"})
(:Person {name: "Edward Mygma", alias: "The Riddler"})
neo4j>
10.2.4. Procedures
Cypher Shell supports running any procedures for which the current user is authorized. Here, we are
using the natively built-in procedure dbms.security.showCurrentUser().
Example 37. Call a procedure from within Cypher Shell
neo4j> CALL dbms.security.showCurrentUser();
username, roles, flags
"johndoe", ["admin"], []
neo4j> :exit
Exiting. Bye bye.
Bye!
10.3. Dump and load databases
This chapter describes the dump and load commands of neo4j-admin.
A Neo4j database can be dumped and loaded using the dump and load commands of neo4j-admin:
neo4j-admin dump --database=<database> --to=<destination-path>
neo4j-admin load --from=<archive-path> --database=<database> [--force]
These commands can be useful for moving databases from one environment to another. They can
also be used for offline backups of the database.
Example 38. Use the dump command of neo4j-admin
Dump the database called graph.db into a file called /backups/graph.db/2016-10-02.dump. The
destination directory for the dump file — in this case /backups/graph.db — must exist before
calling the command.
$neo4j-home> bin/neo4j-admin dump --database=graph.db --to=/backups/graph.db/2016-10-02.dump
$neo4j-home> ls /backups/graph.db
$neo4j-home> 2016-10-02.dump
Example 39. Use the load command of neo4j-admin
Load the backed-up database contained in the file /backups/graph.db/2016-10-02.dump into
database graph.db. Since we have a database running, we first have to shut it down. When we
use the --force option, any existing database gets overwritten.
$neo4j-home> bin/neo4j stop
Stopping Neo4j.. stopped
$neo4j-home> bin/neo4j-admin load --from=/backups/graph.db/2016-10-02.dump --database=graph.db --force
10.4. Consistency checker
This chapter describes the consistency checker.
The consistency of a database can be checked using the check-consistency argument to the neo4j-admin tool.
10.4.1. Check database consistency
The neo4j-admin tool is located in the bin directory. Run it with the check-consistency argument in
order to check the consistency of a database.
neo4j-admin check-consistency --database=<database> [--report-dir=<directory>] [--additional-config=<file>] [--verbose]
Arguments
Argument
Description
--database
Specifies the name of the database on which to run the
consistency checker.
--report-dir
Directory into which the report will be written. Defaults to
the working directory.
--additional-config
Provides a config file to set additional configuration that is
specific to the consistency checker.
--verbose
Enable verbose output of store information and memory
usage.
Limitations
The consistency checker cannot be used with a database which is currently in use. If used with a
running database, it will stop and print an error.
Output
If the consistency checker does not find errors, it will exit cleanly and not produce a report. If the
consistency checker finds errors, it will exit with an exit code of 1 and write a report file with a name
in the format inconsistencies-YYYY-MM-DD.HH24.MI.SS.report. The location of the report file is the
current working directory, or as specified by the parameter report-dir.
Example 40. Run the consistency checker
$neo4j-home> bin/neo4j-admin check-consistency --database=graph.db
2016-09-30 14:00:47.287+0000 INFO [o.n.k.i.s.f.RecordFormatSelector] Format not configured. Selected
format from the store: RecordFormat:StandardV3_0[v0.A.7]
.................... 10%
.................... 20%
.................... 30%
.................... 40%
.................... 50%
..............Checking node and relationship counts
.................... 10%
.................... 20%
.................... 30%
.................... 40%
.................... 50%
.................... 60%
.................... 70%
.................... 80%
.................... 90%
.................... 100%
10.4.2. Provide additional configuration
The consistency checker accepts additional configuration options in a configuration file specified by
the argument additional-config. The configuration file has the same format as neo4j.conf.
Parameter name: tools.consistency_checker.check_graph
Default value: true
Description: Perform checks between nodes, relationships, properties, types and tokens.

Parameter name: tools.consistency_checker.check_indexes
Default value: true
Description: Perform checks on indexes. Checking indexes is more expensive than checking the native
stores, so it may be useful to turn off this check for very large databases.

Parameter name: tools.consistency_checker.check_label_scan_store
Default value: true
Description: Perform checks on the label scan store. Checking this store is more expensive than
checking the native stores, so it may be useful to turn off this check for very large databases.

Parameter name: tools.consistency_checker.check_property_owners
Default value: false
Description: Perform optional additional checking on property ownership. This can detect a theoretical
inconsistency where a property could be owned by multiple entities. However, the check is very
expensive in time and memory, so it is skipped by default.
Example 41. Provide additional configuration to the consistency checker
Create file consistency-check.properties:
tools.consistency_checker.check_graph=false
tools.consistency_checker.check_indexes=true
tools.consistency_checker.check_label_scan_store=true
tools.consistency_checker.check_property_owners=false
Run the consistency checker with the configuration file:
$neo4j-home> bin/neo4j-admin check-consistency --database=graph.db --additional-config=consistency-check.properties
Appendix A: Reference
This appendix contains the Neo4j configuration settings reference, the list of built-in
procedures bundled with Neo4j, and a description of user management for Community
Edition.
• Configuration settings
• Built-in procedures
• User management for Community Edition
A.1. Configuration settings
This section contains a complete reference of Neo4j configuration settings. They can be set in
neo4j.conf.
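As a brief illustration, a handful of these settings as they might appear in neo4j.conf (the values are
arbitrary examples, not recommendations):
dbms.memory.pagecache.size=4g
dbms.logs.query.enabled=true
dbms.logs.query.threshold=2s
dbms.connectors.default_listen_address=0.0.0.0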
Settings used by the server configuration
• browser.allow_outgoing_connections: Configure the policy for outgoing Neo4j Browser
connections.
• browser.credential_timeout: Configure the Neo4j Browser to time out logged in users after this
idle period.
• browser.remote_content_hostname_whitelist: Whitelist of hosts for the Neo4j Browser to be
allowed to fetch content from.
• browser.retain_connection_credentials: Configure the Neo4j Browser to store or not store user
credentials.
• causal_clustering.array_block_id_allocation_size: The size of the ID allocation requests Core servers
will make when they run out of ARRAY_BLOCK IDs.
• causal_clustering.catchup_batch_size: The maximum batch size when catching up (in unit of
entries).
• causal_clustering.cluster_allow_reads_on_followers: Configure if the
dbms.cluster.routing.getServers() procedure should include followers as read endpoints or
return only read replicas.
• causal_clustering.cluster_routing_ttl: How long drivers should cache the data from the
dbms.cluster.routing.getServers() procedure.
• causal_clustering.cluster_topology_refresh: Time between scanning the cluster to refresh current
server’s view of topology.
• causal_clustering.disable_middleware_logging: Prevents the network middleware from dumping its
own logs.
• causal_clustering.discovery_advertised_address: Advertised cluster member discovery
management communication.
• causal_clustering.discovery_listen_address: Host and port to bind the cluster member discovery
management communication.
• causal_clustering.expected_core_cluster_size: Expected number of Core machines in the cluster.
• causal_clustering.global_session_tracker_state_size: The maximum file size before the global
session tracker state file is rotated (in unit of entries).
• causal_clustering.id_alloc_state_size: The maximum file size before the ID allocation file is rotated
(in unit of entries).
• causal_clustering.initial_discovery_members: A comma-separated list of other members of the
cluster to join.
• causal_clustering.join_catch_up_timeout: Time out for a new member to catch up.
• causal_clustering.label_token_id_allocation_size: The size of the ID allocation requests Core servers
will make when they run out of LABEL_TOKEN IDs.
• causal_clustering.label_token_name_id_allocation_size: The size of the ID allocation requests Core
servers will make when they run out of LABEL_TOKEN_NAME IDs.
• causal_clustering.last_applied_state_size: The maximum file size before the storage file is rotated
(in unit of entries).
• causal_clustering.leader_election_timeout: The time limit within which a new leader election will
occur if no messages are received.
• causal_clustering.log_shipping_max_lag: The maximum lag allowed before log shipping pauses (in
unit of entries).
• causal_clustering.neostore_block_id_allocation_size: The size of the ID allocation requests Core
servers will make when they run out of NEOSTORE_BLOCK IDs.
• causal_clustering.node_id_allocation_size: The size of the ID allocation requests Core servers will
make when they run out of NODE IDs.
• causal_clustering.node_labels_id_allocation_size: The size of the ID allocation requests Core
servers will make when they run out of NODE_LABELS IDs.
• causal_clustering.outgoing_queue_size: The number of messages waiting to be sent to other
servers in the cluster.
• causal_clustering.property_id_allocation_size: The size of the ID allocation requests Core servers
will make when they run out of PROPERTY IDs.
• causal_clustering.property_key_token_id_allocation_size: The size of the ID allocation requests
Core servers will make when they run out of PROPERTY_KEY_TOKEN IDs.
• causal_clustering.property_key_token_name_id_allocation_size: The size of the ID allocation
requests Core servers will make when they run out of PROPERTY_KEY_TOKEN_NAME IDs.
• causal_clustering.pull_interval: Interval of pulling updates from cores.
• causal_clustering.raft_advertised_address: Advertised hostname/IP address and port for the RAFT
server.
• causal_clustering.raft_listen_address: Network interface and port for the RAFT server to listen on.
• causal_clustering.raft_log_implementation: RAFT log implementation.
• causal_clustering.raft_log_prune_strategy: RAFT log pruning strategy.
• causal_clustering.raft_log_pruning_frequency: RAFT log pruning frequency.
• causal_clustering.raft_log_reader_pool_size: RAFT log reader pool size.
• causal_clustering.raft_log_rotation_size: RAFT log rotation size.
• causal_clustering.raft_membership_state_size: The maximum file size before the membership
state file is rotated (in unit of entries).
• causal_clustering.raft_messages_log_enable: Enable or disable the dump of all network messages
pertaining to the RAFT protocol.
• causal_clustering.raft_term_state_size: The maximum file size before the term state file is rotated
(in unit of entries).
• causal_clustering.raft_vote_state_size: The maximum file size before the vote state file is rotated
(in unit of entries).
• causal_clustering.read_replica_refresh_rate: Read replica 'call home' frequency.
• causal_clustering.read_replica_time_to_live: Time To Live before read replica is considered
unavailable.
• causal_clustering.relationship_group_id_allocation_size: The size of the ID allocation requests Core
servers will make when they run out of RELATIONSHIP_GROUP IDs.
• causal_clustering.relationship_id_allocation_size: The size of the ID allocation requests Core
servers will make when they run out of RELATIONSHIP IDs.
• causal_clustering.relationship_type_token_id_allocation_size: The size of the ID allocation requests
Core servers will make when they run out of RELATIONSHIP_TYPE_TOKEN IDs.
• causal_clustering.relationship_type_token_name_id_allocation_size: The size of the ID allocation
requests Core servers will make when they run out of RELATIONSHIP_TYPE_TOKEN_NAME IDs.
• causal_clustering.replicated_lock_token_state_size: The maximum file size before the replicated
lock token state file is rotated (in unit of entries).
• causal_clustering.schema_id_allocation_size: The size of the ID allocation requests Core servers will
make when they run out of SCHEMA IDs.
• causal_clustering.state_machine_apply_max_batch_size: The maximum number of operations to
be batched during applications of operations in the state machines.
• causal_clustering.state_machine_flush_window_size: The number of operations to be processed
before the state machines flush to disk.
• causal_clustering.string_block_id_allocation_size: The size of the ID allocation requests Core
servers will make when they run out of STRING_BLOCK IDs.
• causal_clustering.transaction_advertised_address: Advertised hostname/IP address and port for
the transaction shipping server.
• causal_clustering.transaction_listen_address: Network interface and port for the transaction
shipping server to listen on.
• causal_clustering.unknown_address_logging_throttle: Throttle limit for logging unknown cluster
member address.
• cypher.default_language_version: Set this to specify the default parser (language version).
• cypher.forbid_exhaustive_shortestpath: This setting is associated with performance optimization.
• cypher.hints_error: Set this to specify the behavior when Cypher planner or runtime hints cannot
be fulfilled.
• cypher.min_replan_interval: The minimum lifetime of a query plan before a query is considered for
replanning.
• cypher.planner: Set this to specify the default planner for the default language version.
• cypher.statistics_divergence_threshold: The threshold when a plan is considered stale.
• dbms.active_database: Name of the database to load.
• dbms.allow_format_migration: Whether to allow a store upgrade in case the current version of the
database starts against an older store version.
• dbms.backup.address: Listening server for online backups.
• dbms.backup.enabled: Enable support for running online backups.
• dbms.checkpoint.interval.time: Configures the time interval between check-points.
• dbms.checkpoint.interval.tx: Configures the transaction interval between check-points.
• dbms.checkpoint.iops.limit: Limit the number of IOs the background checkpoint process will
consume per second.
• dbms.connectors.default_advertised_address: Default hostname or IP address the server uses to
advertise itself to its connectors.
• dbms.connectors.default_listen_address: Default network interface to listen for incoming
connections.
• dbms.directories.certificates: Directory for storing certificates to be used by Neo4j for TLS
connections.
• dbms.directories.data: Path of the data directory.
• dbms.directories.import: Sets the root directory for file URLs used with the Cypher LOAD CSV
clause.
• dbms.directories.lib: Path of the lib directory.
• dbms.directories.logs: Path of the logs directory.
• dbms.directories.metrics: The target location of the CSV files: a path to a directory wherein a CSV
file per reported field will be written.
• dbms.directories.plugins: Location of the database plugin directory.
• dbms.directories.run: Path of the run directory.
• dbms.ids.reuse.types.override: Specified names of id types (comma separated) that should be
reused.
• dbms.index_sampling.background_enabled: Enable or disable background index sampling.
• dbms.index_sampling.sample_size_limit: Index sampling chunk size limit.
• dbms.index_sampling.update_percentage: Percentage of index updates of total index size
required before sampling of a given index is triggered.
• dbms.index_searcher_cache_size: The maximum number of open Lucene index searchers.
• dbms.logs.debug.level: Debug log level threshold.
• dbms.logs.debug.rotation.delay: Minimum time interval after last rotation of the debug log before
it may be rotated again.
• dbms.logs.debug.rotation.keep_number: Maximum number of history files for the debug log.
• dbms.logs.debug.rotation.size: Threshold for rotation of the debug log.
• dbms.logs.gc.enabled: Enable GC Logging.
• dbms.logs.gc.options: GC Logging Options.
• dbms.logs.gc.rotation.keep_number: Number of GC logs to keep.
• dbms.logs.gc.rotation.size: Size of each GC log that is kept.
• dbms.logs.http.enabled: Enable HTTP request logging.
• dbms.logs.http.rotation.keep_number: Number of HTTP logs to keep.
• dbms.logs.http.rotation.size: Size of each HTTP log that is kept.
• dbms.logs.query.enabled: Log executed queries that take longer than the configured threshold,
dbms.logs.query.threshold.
• dbms.logs.query.parameter_logging_enabled: Log parameters for executed queries that took
longer than the configured threshold.
• dbms.logs.query.rotation.keep_number: Maximum number of history files for the query log.
• dbms.logs.query.rotation.size: The file size in bytes at which the query log will auto-rotate.
• dbms.logs.query.threshold: If the execution of query takes more time than this threshold, the
query is logged - provided query logging is enabled.
• dbms.logs.security.level: Security log level threshold.
• dbms.logs.security.rotation.delay: Minimum time interval after last rotation of the security log
before it may be rotated again.
• dbms.logs.security.rotation.keep_number: Maximum number of history files for the security log.
• dbms.logs.security.rotation.size: Threshold for rotation of the security log.
• dbms.memory.pagecache.size: The amount of memory to use for mapping the store files, in bytes
(or kilobytes with the 'k' suffix, megabytes with 'm' and gigabytes with 'g').
• dbms.memory.pagecache.swapper: Specify which page swapper to use for doing paged IO.
• dbms.mode: Configure the operating mode of the database — 'SINGLE' for stand-alone operation,
'HA' for operating as a member in a cluster, 'ARBITER' for an HA-only cluster member with no
database, CORE for a core member of a Causal Clustering cluster, or READ_REPLICA for read
replica.
• dbms.query_cache_size: The number of Cypher query execution plans that are cached.
• dbms.read_only: Only allow read operations from this Neo4j instance.
• dbms.record_format: Database record format.
• dbms.relationship_grouping_threshold: Relationship count threshold for considering a node to be
dense.
• dbms.rest.transaction.idle_timeout: Timeout for idle transactions in the REST endpoint.
• dbms.security.allow_csv_import_from_file_urls: Determines if Cypher will allow using file URLs
when loading data using LOAD CSV.
• dbms.security.allow_publisher_create_token: Set to true if users with role publisher are allowed to
create new tokens.
• dbms.security.auth_cache_max_capacity: The maximum capacity for authentication and
authorization caches (respectively).
• dbms.security.auth_cache_ttl: The time to live (TTL) for cached authentication and authorization
info when using external auth providers (LDAP or plugin).
• dbms.security.auth_enabled: Enable auth requirement to access Neo4j.
• dbms.security.auth_provider: The authentication and authorization provider that contains both
the users and roles.
• dbms.security.ha_status_auth_enabled: Require authorization for access to the HA status
endpoints.
• dbms.security.http_authorization_classes: Comma-separated list of custom security rules for
Neo4j to use.
• dbms.security.ldap.authentication.cache_enabled: Determines if the result of authentication via
the LDAP server should be cached or not.
• dbms.security.ldap.authentication.mechanism: LDAP authentication mechanism.
• dbms.security.ldap.authentication.user_dn_template: LDAP user DN template.
• dbms.security.ldap.authorization.group_membership_attributes: A list of attribute names on a
user object that contains groups to be used for mapping to roles when LDAP authorization is
enabled.
• dbms.security.ldap.authorization.group_to_role_mapping: An authorization mapping from LDAP
group names to Neo4j role names.
• dbms.security.ldap.authorization.system_password: An LDAP system account password to use for
authorization searches when dbms.security.ldap.authorization.use_system_account is true.
• dbms.security.ldap.authorization.system_username: An LDAP system account username to use for
authorization searches when dbms.security.ldap.authorization.use_system_account is true.
• dbms.security.ldap.authorization.use_system_account: Perform LDAP search for authorization info
using a system account instead of the user’s own account.
If this is set to false (default), the search for group membership will be performed directly after
authentication using the LDAP context bound with the user’s own account.
• dbms.security.ldap.authorization.user_search_base: The name of the base object or named context
to search for user objects when LDAP authorization is enabled.
• dbms.security.ldap.authorization.user_search_filter: The LDAP search filter to search for a user
principal when LDAP authorization is enabled.
• dbms.security.ldap.connection_timeout: The timeout for establishing an LDAP connection.
• dbms.security.ldap.host: URL of LDAP server to use for authentication and authorization.
• dbms.security.ldap.read_timeout: The timeout for an LDAP read request.
• dbms.security.ldap.referral: The LDAP referral behavior when creating a connection.
• dbms.security.ldap.use_starttls: Use secure communication with the LDAP server using
opportunistic TLS.
• dbms.security.log_successful_authentication: Set to log successful authentication events to the
security log.
• dbms.security.procedures.default_allowed: The default role that can execute all procedures and
user-defined functions that are not covered by the dbms.security.procedures.roles setting.
• dbms.security.procedures.roles: This provides a finer level of control over which roles can execute
procedures than the dbms.security.procedures.default_allowed setting.
• dbms.shell.enabled: Enable a remote shell server which Neo4j Shell clients can log in to.
• dbms.shell.host: Remote host for shell.
• dbms.shell.port: The port the shell will listen on.
• dbms.shell.read_only: Read only mode.
• dbms.shell.rmi_name: The name of the shell.
• dbms.threads.worker_count: Number of Neo4j worker threads, your OS might enforce a lower limit
than the maximum value specified here.
• dbms.transaction.timeout: The maximum time interval of a transaction within which it should be
completed.
• dbms.tx_log.rotation.retention_policy: Make Neo4j keep the logical transaction logs for being able
to backup the database.
• dbms.tx_log.rotation.size: Specifies at which file size the logical log will auto-rotate.
• dbms.udc.enabled: Enable the UDC extension.
• dbms.unmanaged_extension_classes: Comma-separated list of <classname>=<mount point> for
unmanaged extensions.
• ha.allow_init_cluster: Whether to allow this instance to create a cluster if unable to join.
• ha.branched_data_copying_strategy: Strategy for how to order handling of branched data on slaves
and copying of the store from the master.
• ha.branched_data_policy: Policy for how to handle branched data.
• ha.broadcast_timeout: Timeout for broadcasting values in cluster.
• ha.configuration_timeout: Timeout for waiting for configuration from an existing cluster member
during cluster join.
• ha.data_chunk_size: Max size of the data chunks that flows between master and slaves in HA.
• ha.default_timeout: Default timeout used for clustering timeouts.
• ha.election_timeout: Timeout for waiting for other members to finish a role election.
• ha.heartbeat_interval: How often heartbeat messages should be sent.
• ha.heartbeat_timeout: How long to wait for heartbeats from other instances before marking them
as suspects for failure.
• ha.host.coordination: Host and port to bind the cluster management communication.
• ha.host.data: Hostname and port to bind the HA server.
• ha.initial_hosts: A comma-separated list of other members of the cluster to join.
• ha.internal_role_switch_timeout: Timeout for waiting for internal conditions during state switch, like
for transactions to complete, before switching to master or slave.
• ha.join_timeout: Timeout for joining a cluster.
• ha.learn_timeout: Timeout for learning values.
• ha.leave_timeout: Timeout for waiting for cluster leave to finish.
• ha.max_acceptors: Maximum number of servers to involve when agreeing to membership changes.
• ha.max_channels_per_slave: Maximum number of connections a slave can have to the master.
• ha.paxos_timeout: Default value for all Paxos timeouts.
• ha.phase1_timeout: Timeout for Paxos phase 1.
• ha.phase2_timeout: Timeout for Paxos phase 2.
• ha.pull_batch_size: Size of batches of transactions applied on slaves when pulling from master.
• ha.pull_interval: Interval of pulling updates from master.
• ha.role_switch_timeout: Timeout for request threads waiting for instance to become master or
slave.
• ha.server_id: Id for a cluster instance.
• ha.slave_lock_timeout: Timeout for taking remote (write) locks on slaves.
• ha.slave_only: Whether this instance should only participate as slave in cluster.
• ha.slave_read_timeout: How long a slave will wait for response from master before giving up.
• ha.tx_push_factor: The amount of slaves the master will ask to replicate a committed transaction.
• ha.tx_push_strategy: Push strategy of a transaction to a slave during commit.
• metrics.bolt.messages.enabled: Enable reporting metrics about Bolt Protocol message processing.
• metrics.csv.enabled: Set to true to enable exporting metrics to CSV files.
• metrics.csv.interval: The reporting interval for the CSV files.
• metrics.cypher.replanning.enabled: Enable reporting metrics about number of occurred replanning
events.
• metrics.enabled: The default enablement value for all the supported metrics.
• metrics.graphite.enabled: Set to true to enable exporting metrics to Graphite.
• metrics.graphite.interval: The reporting interval for Graphite.
• metrics.graphite.server: The hostname or IP address of the Graphite server.
• metrics.jvm.buffers.enabled: Enable reporting metrics about the buffer pools.
• metrics.jvm.gc.enabled: Enable reporting metrics about the duration of garbage collections.
• metrics.jvm.memory.enabled: Enable reporting metrics about the memory usage.
• metrics.jvm.threads.enabled: Enable reporting metrics about the current number of threads
running.
• metrics.neo4j.causal_clustering.enabled: Enable reporting metrics about Causal Clustering mode.
• metrics.neo4j.checkpointing.enabled: Enable reporting metrics about Neo4j check pointing.
• metrics.neo4j.cluster.enabled: Enable reporting metrics about HA cluster info.
• metrics.neo4j.counts.enabled: Enable reporting metrics about approximately how many entities are
in the database.
• metrics.neo4j.enabled: The default enablement value for all Neo4j specific support metrics.
• metrics.neo4j.logrotation.enabled: Enable reporting metrics about the Neo4j log rotation.
• metrics.neo4j.network.enabled: Enable reporting metrics about the network usage.
• metrics.neo4j.pagecache.enabled: Enable reporting metrics about the Neo4j page cache.
• metrics.neo4j.server.enabled: Enable reporting metrics about Server threading info.
• metrics.neo4j.tx.enabled: Enable reporting metrics about transactions.
• metrics.prefix: A common prefix for the reported metrics field names.
• tools.consistency_checker.check_graph: Perform checks between nodes, relationships, properties,
types and tokens.
• tools.consistency_checker.check_indexes: Perform checks on indexes.
• tools.consistency_checker.check_label_scan_store: Perform checks on the label scan store.
• tools.consistency_checker.check_property_owners: Perform optional additional checking on
property ownership.
Deprecated settings
• dbms.index_sampling.buffer_size: Size of buffer used by index sampling.
Table 20. browser.allow_outgoing_connections
Description
Configure the policy for outgoing Neo4j Browser connections.
Valid values
browser.allow_outgoing_connections is a boolean
Default value
true
Table 21. browser.credential_timeout
Description
Configure the Neo4j Browser to time out logged in users after this idle period. Setting this to 0
indicates no limit.
Valid values
browser.credential_timeout is a duration (valid units are ms, s, m; default unit is s)
Default value
0
Table 22. browser.remote_content_hostname_whitelist
Description
Whitelist of hosts for the Neo4j Browser to be allowed to fetch content from.
Valid values
browser.remote_content_hostname_whitelist is a string
Default value
http://guides.neo4j.com,https://guides.neo4j.com,http://localhost,https://localhost
Table 23. browser.retain_connection_credentials
Description
Configure the Neo4j Browser to store or not store user credentials.
Valid values
browser.retain_connection_credentials is a boolean
Default value
true
Table 24. causal_clustering.array_block_id_allocation_size
Description
The size of the ID allocation requests Core servers will make when they run out of
ARRAY_BLOCK IDs. Larger values mean less frequent requests but also result in more unused
IDs (and unused disk space) in the event of a crash.
Valid values
causal_clustering.array_block_id_allocation_size is an integer
Default value
1024
Table 25. causal_clustering.catchup_batch_size
Description
The maximum batch size when catching up (in unit of entries).
Valid values
causal_clustering.catchup_batch_size is an integer
Default value
64
Table 26. causal_clustering.cluster_allow_reads_on_followers
Description
Configure if the dbms.cluster.routing.getServers() procedure should include followers as
read endpoints or return only read replicas. If there are no read replicas in the cluster,
followers are returned as read endpoints regardless of the value of this setting.
Valid values
causal_clustering.cluster_allow_reads_on_followers is a boolean
Default value
false
Table 27. causal_clustering.cluster_routing_ttl
Description
How long drivers should cache the data from the dbms.cluster.routing.getServers()
procedure.
Valid values
causal_clustering.cluster_routing_ttl is a duration (valid units are ms, s, m; default unit is s) which
is minimum 1000
Default value
300000
Table 28. causal_clustering.cluster_topology_refresh
Description
Time between scanning the cluster to refresh current server’s view of topology.
Valid values
causal_clustering.cluster_topology_refresh is a duration (valid units are ms, s, m; default unit is s)
which is minimum 1000
Default value
60000
Table 29. causal_clustering.disable_middleware_logging
Description
Prevents the network middleware from dumping its own logs. Defaults to true.
Valid values
causal_clustering.disable_middleware_logging is a boolean
Default value
true
Table 30. causal_clustering.discovery_advertised_address
Description
Advertised cluster member discovery management communication.
Valid values
an advertised socket address
Default value
localhost:5000
Table 31. causal_clustering.discovery_listen_address
Description
Host and port to bind the cluster member discovery management communication.
Valid values
a listen socket address
Default value
localhost:5000
Table 32. causal_clustering.expected_core_cluster_size
Description
Expected number of Core machines in the cluster.
Valid values
causal_clustering.expected_core_cluster_size is an integer
Default value
3
Table 33. causal_clustering.global_session_tracker_state_size
Description
The maximum file size before the global session tracker state file is rotated (in unit of entries).
Valid values
causal_clustering.global_session_tracker_state_size is an integer
Default value
1000
Table 34. causal_clustering.id_alloc_state_size
Description
The maximum file size before the ID allocation file is rotated (in unit of entries).
Valid values
causal_clustering.id_alloc_state_size is an integer
Default value
1000
Table 35. causal_clustering.initial_discovery_members
Description
A comma-separated list of other members of the cluster to join.
Valid values
causal_clustering.initial_discovery_members is a list separated by "," where items are an
advertised socket address
Mandatory
The causal_clustering.initial_discovery_members configuration setting is mandatory.
Table 36. causal_clustering.join_catch_up_timeout
Description
Time out for a new member to catch up.
Valid values
causal_clustering.join_catch_up_timeout is a duration (valid units are ms, s, m; default unit is s)
Default value
600000
Table 37. causal_clustering.label_token_id_allocation_size
Description
The size of the ID allocation requests Core servers will make when they run out of
LABEL_TOKEN IDs. Larger values mean less frequent requests but also result in more unused
IDs (and unused disk space) in the event of a crash.
Valid values
causal_clustering.label_token_id_allocation_size is an integer
Default value
32
Table 38. causal_clustering.label_token_name_id_allocation_size
Description
The size of the ID allocation requests Core servers will make when they run out of
LABEL_TOKEN_NAME IDs. Larger values mean less frequent requests but also result in more
unused IDs (and unused disk space) in the event of a crash.
Valid values
causal_clustering.label_token_name_id_allocation_size is an integer
Default value
1024
Table 39. causal_clustering.last_applied_state_size
Description
The maximum file size before the storage file is rotated (in unit of entries).
Valid values
causal_clustering.last_applied_state_size is an integer
Default value
1000
Table 40. causal_clustering.leader_election_timeout
Description
The time limit within which a new leader election will occur if no messages are received.
Valid values
causal_clustering.leader_election_timeout is a duration (valid units are ms, s, m; default unit is s)
Default value
7000
Table 41. causal_clustering.log_shipping_max_lag
Description
The maximum lag allowed before log shipping pauses (in unit of entries).
Valid values
causal_clustering.log_shipping_max_lag is an integer
Default value
256
Table 42. causal_clustering.neostore_block_id_allocation_size
Description
The size of the ID allocation requests Core servers will make when they run out of
NEOSTORE_BLOCK IDs. Larger values mean less frequent requests but also result in more
unused IDs (and unused disk space) in the event of a crash.
Valid values
causal_clustering.neostore_block_id_allocation_size is an integer
Default value
1024
Table 43. causal_clustering.node_id_allocation_size
Description
The size of the ID allocation requests Core servers will make when they run out of NODE IDs.
Larger values mean less frequent requests but also result in more unused IDs (and unused
disk space) in the event of a crash.
Valid values
causal_clustering.node_id_allocation_size is an integer
Default value
1024
Table 44. causal_clustering.node_labels_id_allocation_size
Description
The size of the ID allocation requests Core servers will make when they run out of
NODE_LABELS IDs. Larger values mean less frequent requests but also result in more unused
IDs (and unused disk space) in the event of a crash.
Valid values
causal_clustering.node_labels_id_allocation_size is an integer
Default value
1024
Table 45. causal_clustering.outgoing_queue_size
Description
The number of messages waiting to be sent to other servers in the cluster.
Valid values
causal_clustering.outgoing_queue_size is an integer
Default value
64
Table 46. causal_clustering.property_id_allocation_size
Description
The size of the ID allocation requests Core servers will make when they run out of PROPERTY
IDs. Larger values mean less frequent requests but also result in more unused IDs (and unused
disk space) in the event of a crash.
Valid values
causal_clustering.property_id_allocation_size is an integer
Default value
1024
Table 47. causal_clustering.property_key_token_id_allocation_size
Description
The size of the ID allocation requests Core servers will make when they run out of
PROPERTY_KEY_TOKEN IDs. Larger values mean less frequent requests but also result in more
unused IDs (and unused disk space) in the event of a crash.
Valid values
causal_clustering.property_key_token_id_allocation_size is an integer
Default value
32
Table 48. causal_clustering.property_key_token_name_id_allocation_size
Description
The size of the ID allocation requests Core servers will make when they run out of
PROPERTY_KEY_TOKEN_NAME IDs. Larger values mean less frequent requests but also result in
more unused IDs (and unused disk space) in the event of a crash.
Valid values
causal_clustering.property_key_token_name_id_allocation_size is an integer
Default value
1024
Table 49. causal_clustering.pull_interval
Description
Interval of pulling updates from cores.
Valid values
causal_clustering.pull_interval is a duration (valid units are ms, s, m; default unit is s)
Default value
1000
Table 50. causal_clustering.raft_advertised_address
Description
Advertised hostname/IP address and port for the RAFT server.
Valid values
an advertised socket address
Default value
localhost:7000
Table 51. causal_clustering.raft_listen_address
Description
Network interface and port for the RAFT server to listen on.
Valid values
a listen socket address
Default value
localhost:7000
Table 52. causal_clustering.raft_log_implementation
Description
RAFT log implementation.
Valid values
causal_clustering.raft_log_implementation is a string
Default value
SEGMENTED
Table 53. causal_clustering.raft_log_prune_strategy
Description
RAFT log pruning strategy.
Valid values
causal_clustering.raft_log_prune_strategy is a string
Default value
1g size
Table 54. causal_clustering.raft_log_pruning_frequency
Description
RAFT log pruning frequency.
Valid values
causal_clustering.raft_log_pruning_frequency is a duration (valid units are ms, s, m; default unit
is s)
Default value
600000
Table 55. causal_clustering.raft_log_reader_pool_size
Description
RAFT log reader pool size.
Valid values
causal_clustering.raft_log_reader_pool_size is an integer
Default value
8
Table 56. causal_clustering.raft_log_rotation_size
Description
RAFT log rotation size.
Valid values
causal_clustering.raft_log_rotation_size is a byte size (valid multipliers are k, m, g, K, M, G) which is
minimum 1024
Default value
262144000
Table 57. causal_clustering.raft_membership_state_size
Description
The maximum file size before the membership state file is rotated (in unit of entries).
Valid values
causal_clustering.raft_membership_state_size is an integer
Default value
1000
Table 58. causal_clustering.raft_messages_log_enable
Description
Enable or disable the dump of all network messages pertaining to the RAFT protocol.
Valid values
causal_clustering.raft_messages_log_enable is a boolean
Default value
false
Table 59. causal_clustering.raft_term_state_size
Description
The maximum file size before the term state file is rotated (in unit of entries).
Valid values
causal_clustering.raft_term_state_size is an integer
Default value
1000
Table 60. causal_clustering.raft_vote_state_size
Description
The maximum file size before the vote state file is rotated (in unit of entries).
Valid values
causal_clustering.raft_vote_state_size is an integer
Default value
1000
Table 61. causal_clustering.read_replica_refresh_rate
Description
Read replica 'call home' frequency.
Valid values
causal_clustering.read_replica_refresh_rate is a duration (valid units are ms, s, m; default unit is
s) which is minimum 5000
Default value
5000
Table 62. causal_clustering.read_replica_time_to_live
Description
Time To Live before read replica is considered unavailable.
Valid values
causal_clustering.read_replica_time_to_live is a duration (valid units are ms, s, m; default unit is s)
which is minimum 60000
Default value
60000
Table 63. causal_clustering.relationship_group_id_allocation_size
Description
The size of the ID allocation requests Core servers will make when they run out of
RELATIONSHIP_GROUP IDs. Larger values mean less frequent requests but also result in more
unused IDs (and unused disk space) in the event of a crash.
Valid values
causal_clustering.relationship_group_id_allocation_size is an integer
Default value
1024
Table 64. causal_clustering.relationship_id_allocation_size
Description
The size of the ID allocation requests Core servers will make when they run out of
RELATIONSHIP IDs. Larger values mean less frequent requests but also result in more unused
IDs (and unused disk space) in the event of a crash.
Valid values
causal_clustering.relationship_id_allocation_size is an integer
Default value
1024
Table 65. causal_clustering.relationship_type_token_id_allocation_size
Description
The size of the ID allocation requests Core servers will make when they run out of
RELATIONSHIP_TYPE_TOKEN IDs. Larger values mean less frequent requests but also result in
more unused IDs (and unused disk space) in the event of a crash.
Valid values
causal_clustering.relationship_type_token_id_allocation_size is an integer
Default value
32
Table 66. causal_clustering.relationship_type_token_name_id_allocation_size
Description
The size of the ID allocation requests Core servers will make when they run out of
RELATIONSHIP_TYPE_TOKEN_NAME IDs. Larger values mean less frequent requests but also
result in more unused IDs (and unused disk space) in the event of a crash.
Valid values
causal_clustering.relationship_type_token_name_id_allocation_size is an integer
Default value
1024
Table 67. causal_clustering.replicated_lock_token_state_size
Description
The maximum file size before the replicated lock token state file is rotated (in unit of entries).
Valid values
causal_clustering.replicated_lock_token_state_size is an integer
Default value
1000
Table 68. causal_clustering.schema_id_allocation_size
Description
The size of the ID allocation requests Core servers will make when they run out of SCHEMA IDs.
Larger values mean less frequent requests but also result in more unused IDs (and unused
disk space) in the event of a crash.
Valid values
causal_clustering.schema_id_allocation_size is an integer
Default value
1024
Table 69. causal_clustering.state_machine_apply_max_batch_size
Description
The maximum number of operations to be batched during applications of operations in the
state machines.
Valid values
causal_clustering.state_machine_apply_max_batch_size is an integer
Default value
16
Table 70. causal_clustering.state_machine_flush_window_size
Description
The number of operations to be processed before the state machines flush to disk.
Valid values
causal_clustering.state_machine_flush_window_size is an integer
Default value
4096
Table 71. causal_clustering.string_block_id_allocation_size
Description
The size of the ID allocation requests Core servers will make when they run out of
STRING_BLOCK IDs. Larger values mean less frequent requests but also result in more unused
IDs (and unused disk space) in the event of a crash.
Valid values
causal_clustering.string_block_id_allocation_size is an integer
Default value
1024
Table 72. causal_clustering.transaction_advertised_address
Description
Advertised hostname/IP address and port for the transaction shipping server.
Valid values
an advertised socket address
Default value
localhost:6000
Table 73. causal_clustering.transaction_listen_address
Description
Network interface and port for the transaction shipping server to listen on.
Valid values
a listen socket address
Default value
localhost:6000
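To illustrate how the advertised and listen address pairs above work together, the following neo4j.conf sketch configures a Core member that listens on all interfaces while advertising a routable hostname; core01.example.com is a placeholder, not a value from this manual:
causal_clustering.raft_listen_address=0.0.0.0:7000
causal_clustering.raft_advertised_address=core01.example.com:7000
causal_clustering.transaction_listen_address=0.0.0.0:6000
causal_clustering.transaction_advertised_address=core01.example.com:6000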
Table 74. causal_clustering.unknown_address_logging_throttle
Description
Throttle limit for logging unknown cluster member address.
Valid values
causal_clustering.unknown_address_logging_throttle is a duration (valid units are ms, s, m;
default unit is s)
Default value
10000
Table 75. cypher.default_language_version
Description
Set this to specify the default parser (language version).
Valid values
cypher.default_language_version is one of 2.3, 3.0, 3.1, default
Default value
default
Table 76. cypher.forbid_exhaustive_shortestpath
Description
This setting is associated with performance optimization. Set this to true in situations where it
is preferable to have any queries using the 'shortestPath' function terminate as soon as
possible with no answer, rather than potentially running for a long time attempting to find an
answer (even if there is no path to be found). For most queries, the 'shortestPath' algorithm
will return the correct answer very quickly. However, there are some cases where it is possible
that the fast bidirectional breadth-first search algorithm will find no results even if they exist.
This can happen when the predicates in the WHERE clause applied to 'shortestPath' cannot be
applied to each step of the traversal, and can only be applied to the entire path. When the
query planner detects these special cases, it will plan to perform an exhaustive depth-first
search if the fast algorithm finds no paths. However, the exhaustive search may be orders of
magnitude slower than the fast algorithm. If it is critical that queries terminate as soon as
possible, it is recommended that this option be set to true, which means that Neo4j will never
consider using the exhaustive search for shortestPath queries. However, please note that if no
paths are found, an error will be thrown at run time, which will need to be handled by the
application.
Valid values
cypher.forbid_exhaustive_shortestpath is a boolean
Default value
false
Table 77. cypher.hints_error
Description
Set this to specify the behavior when Cypher planner or runtime hints cannot be fulfilled. If
true, then non-conformance will result in an error, otherwise only a warning is generated.
Valid values
cypher.hints_error is a boolean
Default value
false
Table 78. cypher.min_replan_interval
Description
The minimum lifetime of a query plan before a query is considered for replanning.
Valid values
cypher.min_replan_interval is a duration (valid units are ms, s, m; default unit is s)
Default value
10000
Table 79. cypher.planner
Description
Set this to specify the default planner for the default language version.
Valid values
cypher.planner is one of COST, RULE, default
Default value
default
Table 80. cypher.statistics_divergence_threshold
Description
The threshold when a plan is considered stale. If any of the underlying statistics used to create
the plan has changed more than this value, the plan is considered stale and will be replanned.
A value of 0 means always replan, and 1 means never replan.
Valid values
cypher.statistics_divergence_threshold is a double which is minimum 0.0, and is maximum 1.0
Default value
0.75
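As an example of how the replanning settings interact, the following sketch (illustrative values only) makes Cypher replan less eagerly by keeping plans for at least 30 seconds and tolerating more statistics drift:
cypher.min_replan_interval=30s
cypher.statistics_divergence_threshold=0.9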
Table 81. dbms.active_database
Description
Name of the database to load.
Valid values
dbms.active_database is a string
Default value
graph.db
Table 82. dbms.allow_format_migration
Description
Whether to allow a store upgrade in case the current version of the database starts against an
older store version. Setting this to true does not guarantee a successful upgrade; it just allows an
upgrade to be performed.
Valid values
dbms.allow_format_migration is a boolean
Default value
false
Table 83. dbms.backup.address
Description
Listening server for online backups.
Valid values
dbms.backup.address is a hostname and port
Default value
127.0.0.1:6362-6372
Table 84. dbms.backup.enabled
Description
Enable support for running online backups.
Valid values
dbms.backup.enabled is a boolean
Default value
true
Table 85. dbms.checkpoint.interval.time
Description
Configures the time interval between check-points. The database will not check-point more
often than this (unless check pointing is triggered by a different event), but might check-point
less often than this interval if performing a check-point takes longer than the configured
interval. A check-point is a point in the transaction logs from which recovery would start.
Longer check-point intervals typically mean that recovery will take longer to complete in case
of a crash. On the other hand, a longer check-point interval can also reduce the I/O load that
the database places on the system, as each check-point implies a flushing and forcing of all the
store files.
Valid values
dbms.checkpoint.interval.time is a duration (valid units are ms, s, m; default unit is s)
Default value
300000
Table 86. dbms.checkpoint.interval.tx
Description
Configures the transaction interval between check-points. The database will not check-point
more often than this (unless check-pointing is triggered by a different event), but might
check-point less often than this interval if performing a check-point takes longer than the
configured interval. A check-point is a point in the transaction logs from which recovery would
start. Longer check-point intervals typically mean that recovery will take longer to complete in
case of a crash. On the other hand, a longer check-point interval can also reduce the I/O load
that the database places on the system, as each check-point implies a flushing and forcing of all
the store files. The default is '100000' for a check-point every 100000 transactions.
Valid values
dbms.checkpoint.interval.tx is an integer which is minimum 1
Default value
100000
Table 87. dbms.checkpoint.iops.limit
Description
Limit the number of IOs the background checkpoint process will consume per second. This
setting is advisory, is ignored in Neo4j Community Edition, and is followed to best effort in
Enterprise Edition. An IO is in this case an 8 KiB (mostly sequential) write. Limiting the write IO in
this way will leave more bandwidth in the IO subsystem to service random-read IOs, which is
important for the response time of queries when the database cannot fit entirely in memory.
The only drawback of this setting is that longer checkpoint times may lead to slightly longer
recovery times in case of a database or system crash. A lower number means lower IO
pressure, and consequently longer checkpoint times. The configuration can also be
commented out to remove the limitation entirely, and let the checkpointer flush data as fast as
the hardware will go. Set this to -1 to disable the IOPS limit.
Valid values
dbms.checkpoint.iops.limit is an integer
Default value
1000
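Taken together, the check-pointing settings can be tuned in neo4j.conf. The following sketch assumes an I/O-constrained host and uses illustrative values, not recommendations:
# Check-point at most every 15 minutes or every 100000 transactions,
# and cap background check-point writes at 600 IOs per second.
dbms.checkpoint.interval.time=15m
dbms.checkpoint.interval.tx=100000
dbms.checkpoint.iops.limit=600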
Table 88. dbms.connectors.default_advertised_address
Description
Default hostname or IP address the server uses to advertise itself to its connectors. To
advertise a specific hostname or IP address for a specific connector, specify the
advertised_address property for the specific connector.
Valid values
dbms.connectors.default_advertised_address is a string
Default value
localhost
Table 89. dbms.connectors.default_listen_address
Description
Default network interface to listen for incoming connections. To listen for connections on all
interfaces, use "0.0.0.0". To bind specific connectors to specific network interfaces, specify
the listen_address properties for the specific connector.
Valid values
dbms.connectors.default_listen_address is a string
Default value
localhost
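For example, to accept remote connections on all network interfaces while advertising a public hostname (neo4j01.example.com is a placeholder), a minimal sketch is:
dbms.connectors.default_listen_address=0.0.0.0
dbms.connectors.default_advertised_address=neo4j01.example.com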
Table 90. dbms.directories.certificates
Description
Directory for storing certificates to be used by Neo4j for TLS connections. Certificate files must
be named neo4j.cert and neo4j.key.
Valid values
A filesystem path; relative paths are resolved against the installation root, <neo4j-home>
Default value
certificates
Table 91. dbms.directories.data
Description
Path of the data directory. You must not configure more than one Neo4j installation to use the
same data directory.
Valid values
A filesystem path; relative paths are resolved against the installation root, <neo4j-home>
Default value
data
Table 92. dbms.directories.import
Description
Sets the root directory for file URLs used with the Cypher LOAD CSV clause. This must be set to a
single directory, restricting access to only those files within that directory and its
subdirectories.
Valid values
A filesystem path; relative paths are resolved against the installation root, <neo4j-home>
Table 93. dbms.directories.lib
Description
Path of the lib directory.
Valid values
A filesystem path; relative paths are resolved against the installation root, <neo4j-home>
Default value
lib
Table 94. dbms.directories.logs
Description
Path of the logs directory.
Valid values
A filesystem path; relative paths are resolved against the installation root, <neo4j-home>
Default value
logs
Table 95. dbms.directories.metrics
Description
The target location of the CSV files: a path to a directory wherein a CSV file per reported field
will be written.
Valid values
A filesystem path; relative paths are resolved against the installation root, <neo4j-home>
Default value
metrics
Table 96. dbms.directories.plugins
Description
Location of the database plugin directory. Compiled Java JAR files that contain database
procedures will be loaded if they are placed in this directory.
Valid values
A filesystem path; relative paths are resolved against the installation root, <neo4j-home>
Default value
plugins
Table 97. dbms.directories.run
Description
Path of the run directory. This directory holds Neo4j’s runtime state, such as a pidfile when it is
running in the background. The pidfile is created when starting neo4j and removed when
stopping it. It may be placed on an in-memory filesystem such as tmpfs.
Valid values
A filesystem path; relative paths are resolved against the installation root, <neo4j-home>
Default value
run
Table 98. dbms.ids.reuse.types.override
Description
Specifies the names of id types (comma-separated) that should be reused. Currently only 'node'
and 'relationship' types are supported.
Valid values
dbms.ids.reuse.types.override is a list separated by "," where items are one of NODE,
RELATIONSHIP
Default value
[RELATIONSHIP, NODE]
Table 99. dbms.index_sampling.background_enabled
Description
Enable or disable background index sampling.
Valid values
dbms.index_sampling.background_enabled is a boolean
Default value
true
Table 100. dbms.index_sampling.buffer_size
Description
Size of buffer used by index sampling. This configuration setting is no longer applicable as
from Neo4j 3.0.3. Please use dbms.index_sampling.sample_size_limit instead.
Valid values
dbms.index_sampling.buffer_size is a byte size (valid multipliers are k, m, g, K, M, G) which is
minimum 1048576, and is maximum 2147483647
Default value
67108864
Deprecated
The dbms.index_sampling.buffer_size configuration setting has been deprecated.
Table 101. dbms.index_sampling.sample_size_limit
Description
Index sampling chunk size limit.
Valid values
dbms.index_sampling.sample_size_limit is an integer which is minimum 1048576, and is
maximum 2147483647
Default value
8388608
Table 102. dbms.index_sampling.update_percentage
Description
Percentage of index updates of total index size required before sampling of a given index is
triggered.
Valid values
dbms.index_sampling.update_percentage is an integer which is minimum 0
Default value
5
Table 103. dbms.index_searcher_cache_size
Description
The maximum number of open Lucene index searchers.
Valid values
dbms.index_searcher_cache_size is an integer which is minimum 1
Default value
2147483647
Table 104. dbms.logs.debug.level
Description
Debug log level threshold.
Valid values
dbms.logs.debug.level is one of DEBUG, INFO, WARN, ERROR, NONE
Default value
INFO
Table 105. dbms.logs.debug.rotation.delay
Description
Minimum time interval after last rotation of the debug log before it may be rotated again.
Valid values
dbms.logs.debug.rotation.delay is a duration (valid units are ms, s, m; default unit is s)
Default value
300000
Table 106. dbms.logs.debug.rotation.keep_number
Description
Maximum number of history files for the debug log.
Valid values
dbms.logs.debug.rotation.keep_number is an integer which is minimum 1
Default value
7
Table 107. dbms.logs.debug.rotation.size
Description
Threshold for rotation of the debug log.
Valid values
dbms.logs.debug.rotation.size is a byte size (valid multipliers are k, m, g, K, M, G) which is
minimum 0, and is maximum 9223372036854775807
Default value
20971520
Table 108. dbms.logs.gc.enabled
Description
Enable GC Logging.
Valid values
dbms.logs.gc.enabled is a boolean
Default value
false
Table 109. dbms.logs.gc.options
Description
GC Logging Options.
Valid values
dbms.logs.gc.options is a string
Default value
-XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+PrintGCApplicationStoppedTime
-XX:+PrintPromotionFailure -XX:+PrintTenuringDistribution
Table 110. dbms.logs.gc.rotation.keep_number
Description
Number of GC logs to keep.
Valid values
dbms.logs.gc.rotation.keep_number is an integer
Default value
5
Table 111. dbms.logs.gc.rotation.size
Description
Size of each GC log that is kept.
Valid values
dbms.logs.gc.rotation.size is a byte size (valid multipliers are k, m, g, K, M, G) which is minimum 0,
and is maximum 9223372036854775807
Default value
20971520
Table 112. dbms.logs.http.enabled
Description
Enable HTTP request logging.
Valid values
dbms.logs.http.enabled is a boolean
Default value
false
Table 113. dbms.logs.http.rotation.keep_number
Description
Number of HTTP logs to keep.
Valid values
dbms.logs.http.rotation.keep_number is an integer
Default value
5
Table 114. dbms.logs.http.rotation.size
Description
Size of each HTTP log that is kept.
Valid values
dbms.logs.http.rotation.size is a byte size (valid multipliers are k, m, g, K, M, G) which is minimum
0, and is maximum 9223372036854775807
Default value
20971520
Table 115. dbms.logs.query.enabled
Description
Log executed queries that take longer than the configured threshold,
dbms.logs.query.threshold. Log entries are written to the file query.log located in the Logs
directory. For location of the Logs directory, see File locations. This feature is available in the
Neo4j Enterprise Edition.
Valid values
dbms.logs.query.enabled is a boolean
Default value
false
Table 116. dbms.logs.query.parameter_logging_enabled
Description
Log parameters for executed queries that took longer than the configured threshold.
Valid values
dbms.logs.query.parameter_logging_enabled is a boolean
Default value
true
Table 117. dbms.logs.query.rotation.keep_number
Description
Maximum number of history files for the query log.
Valid values
dbms.logs.query.rotation.keep_number is an integer which is minimum 1
Default value
7
Table 118. dbms.logs.query.rotation.size
Description
The file size in bytes at which the query log will auto-rotate. If set to zero then no rotation will
occur. Accepts a binary suffix k, m or g.
Valid values
dbms.logs.query.rotation.size is a byte size (valid multipliers are k, m, g, K, M, G) which is
minimum 0, and is maximum 9223372036854775807
Default value
20971520
Table 119. dbms.logs.query.threshold
Description
If the execution of a query takes more time than this threshold, the query is logged, provided
query logging is enabled. Defaults to 0 seconds, that is, all queries are logged.
Valid values
dbms.logs.query.threshold is a duration (valid units are ms, s, m; default unit is s)
Default value
0
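Putting the query logging settings together, the following sketch (illustrative values) logs only queries slower than two seconds and keeps a bounded amount of history:
dbms.logs.query.enabled=true
dbms.logs.query.threshold=2s
dbms.logs.query.rotation.size=20m
dbms.logs.query.rotation.keep_number=7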
Table 120. dbms.logs.security.level
Description
Security log level threshold.
Valid values
dbms.logs.security.level is one of DEBUG, INFO, WARN, ERROR, NONE
Default value
INFO
Table 121. dbms.logs.security.rotation.delay
Description
Minimum time interval after last rotation of the security log before it may be rotated again.
Valid values
dbms.logs.security.rotation.delay is a duration (valid units are ms, s, m; default unit is s)
Default value
300000
Table 122. dbms.logs.security.rotation.keep_number
Description
Maximum number of history files for the security log.
Valid values
dbms.logs.security.rotation.keep_number is an integer which is minimum 1
Default value
7
Table 123. dbms.logs.security.rotation.size
Description
Threshold for rotation of the security log.
Valid values
dbms.logs.security.rotation.size is a byte size (valid multipliers are k, m, g, K, M, G) which is
minimum 0, and is maximum 9223372036854775807
Default value
20971520
Table 124. dbms.memory.pagecache.size
Description
The amount of memory to use for mapping the store files, in bytes (or kilobytes with the 'k'
suffix, megabytes with 'm' and gigabytes with 'g'). If Neo4j is running on a dedicated server,
then it is generally recommended to leave about 2-4 gigabytes for the operating system, give
the JVM enough heap to hold all your transaction state and query context, and then leave the
rest for the page cache. If no page cache memory is configured, then a heuristic setting is
computed based on available system resources.
Valid values
dbms.memory.pagecache.size is a byte size (valid multipliers are k, m, g, K, M, G) which is
minimum 245760
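As a concrete illustration of the sizing advice above, assuming a hypothetical dedicated server with 16 GB of RAM, one might leave a few gigabytes to the operating system and the JVM heap and assign the remainder to the page cache; the figure below is a sketch, not a recommendation:
dbms.memory.pagecache.size=8g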
Table 125. dbms.memory.pagecache.swapper
Description
Specify which page swapper to use for doing paged IO. This is only used when integrating with
proprietary storage technology.
Valid values
dbms.memory.pagecache.swapper is a string
Table 126. dbms.mode
Description
Configure the operating mode of the database: 'SINGLE' for stand-alone operation, 'HA' for
operating as a member in an HA cluster, 'ARBITER' for an HA-only cluster member with no database,
'CORE' for a Core member of a Causal Clustering cluster, or 'READ_REPLICA' for a Read Replica in a
Causal Clustering cluster.
Valid values
dbms.mode is a string
Default value
SINGLE
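For example, a member of a Causal Clustering cluster would set one of the following in neo4j.conf (a sketch):
# Core member of a Causal Clustering cluster
dbms.mode=CORE
# ...or, for a read replica:
# dbms.mode=READ_REPLICA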
Table 127. dbms.query_cache_size
Description
The number of Cypher query execution plans that are cached.
Valid values
dbms.query_cache_size is an integer which is minimum 0
Default value
1000
Table 128. dbms.read_only
Description
Only allow read operations from this Neo4j instance. This mode still requires write access to
the directory for lock purposes.
Valid values
dbms.read_only is a boolean
Default value
false
Table 129. dbms.record_format
Description
Database record format. Enterprise edition only. Valid values: standard, high_limit. Default
value: standard.
Valid values
dbms.record_format is a string
Default value
Table 130. dbms.relationship_grouping_threshold
Description
Relationship count threshold for considering a node to be dense.
Valid values
dbms.relationship_grouping_threshold is an integer which is minimum 1
Default value
50
Table 131. dbms.rest.transaction.idle_timeout
Description
Timeout for idle transactions in the REST endpoint.
Valid values
dbms.rest.transaction.idle_timeout is a duration (valid units are ms, s, m; default unit is s)
Default value
60000
Table 132. dbms.security.allow_csv_import_from_file_urls
Description
Determines if Cypher will allow using file URLs when loading data using LOAD CSV. Setting this
value to false will cause Neo4j to fail LOAD CSV clauses that load data from the file system.
Valid values
dbms.security.allow_csv_import_from_file_urls is a boolean
Default value
true
Table 133. dbms.security.allow_publisher_create_token
Description
Set to true if users with role publisher are allowed to create new tokens.
Valid values
dbms.security.allow_publisher_create_token is a boolean
Default value
false
Table 134. dbms.security.auth_cache_max_capacity
Description
The maximum capacity for authentication and authorization caches (respectively).
Valid values
dbms.security.auth_cache_max_capacity is an integer
Default value
10000
Table 135. dbms.security.auth_cache_ttl
Description
The time to live (TTL) for cached authentication and authorization info when using external
auth providers (LDAP or plugin). Setting the TTL to 0 will disable auth caching. Disabling
caching while using the LDAP auth provider requires the use of an LDAP system account for
resolving authorization information.
Valid values
dbms.security.auth_cache_ttl is a duration (valid units are ms, s, m; default unit is s)
Default value
600000
Table 136. dbms.security.auth_enabled
Description
Enable auth requirement to access Neo4j.
Valid values
dbms.security.auth_enabled is a boolean
Default value
false
Table 137. dbms.security.auth_provider
Description
The authentication and authorization provider that contains both the users and roles. This can
be one of the built-in native or ldap providers, or it can be an externally provided plugin, with a
custom name prefixed by plugin-, i.e. plugin-<AUTH_PROVIDER_NAME>.
Valid values
dbms.security.auth_provider is a string
Default value
native
Table 138. dbms.security.ha_status_auth_enabled
Description
Require authorization for access to the HA status endpoints.
Valid values
dbms.security.ha_status_auth_enabled is a boolean
Default value
true
Table 139. dbms.security.http_authorization_classes
Description
Comma-separated list of custom security rules for Neo4j to use.
Valid values
dbms.security.http_authorization_classes is a list separated by "," where items are a string
Default value
[]
Table 140. dbms.security.ldap.authentication.cache_enabled
Description
Determines if the result of authentication via the LDAP server should be cached or not.
Caching is used to limit the number of LDAP requests that have to be made over the network
for users that have already been authenticated successfully. A user can be authenticated
against an existing cache entry (instead of via an LDAP server) as long as it is alive (see
dbms.security.auth_cache_ttl). An important consequence of setting this to true is that Neo4j
then needs to cache a hashed version of the credentials in order to perform credentials
matching. This hashing is done using a cryptographic hash function together with a random
salt. Preferably a conscious decision should be made if this method is considered acceptable
by the security standards of the organization in which this Neo4j instance is deployed.
Valid values
dbms.security.ldap.authentication.cache_enabled is a boolean
Default value
true
Table 141. dbms.security.ldap.authentication.mechanism
Description
LDAP authentication mechanism. This is one of simple or a SASL mechanism supported by
JNDI, for example DIGEST-MD5. simple is basic username and password authentication and SASL
is used for more advanced mechanisms. See RFC 2251 LDAPv3 documentation for more
details.
Valid values
dbms.security.ldap.authentication.mechanism is a string
Default value
simple
Table 142. dbms.security.ldap.authentication.user_dn_template
Description
LDAP user DN template. An LDAP object is referenced by its distinguished name (DN), and a
user DN is an LDAP fully-qualified unique user identifier. This setting is used to generate an
LDAP DN that conforms with the LDAP directory’s schema from the user principal that is
submitted with the authentication token when logging in. The special token {0} is a placeholder
where the user principal will be substituted into the DN string.
Valid values
dbms.security.ldap.authentication.user_dn_template is a string
Default value
uid={0},ou=users,dc=example,dc=com
Table 143. dbms.security.ldap.authorization.group_membership_attributes
Description
A list of attribute names on a user object that contains groups to be used for mapping to roles
when LDAP authorization is enabled.
Valid values
dbms.security.ldap.authorization.group_membership_attributes is a list separated by "," where
items are a string
Default value
[memberOf]
Table 144. dbms.security.ldap.authorization.group_to_role_mapping
Description
An authorization mapping from LDAP group names to Neo4j role names. The map should
be formatted as a semicolon-separated list of key-value pairs, where the key is the LDAP
group name and the value is a comma-separated list of corresponding role names. For
example: group1=role1;group2=role2;group3=role3,role4,role5. You could also use
whitespace and quotes around group names to make this mapping more readable, for
example:
dbms.security.ldap.authorization.group_to_role_mapping=\
  "cn=Neo4j Read Only,cn=users,dc=example,dc=com"      = reader;    \
  "cn=Neo4j ReadWrite,cn=users,dc=example,dc=com"      = publisher; \
  "cn=Neo4j Schema Manager,cn=users,dc=example,dc=com" = architect; \
  "cn=Neo4j Administrator,cn=users,dc=example,dc=com"  = admin
Valid values
dbms.security.ldap.authorization.group_to_role_mapping is a string
Table 145. dbms.security.ldap.authorization.system_password
Description
An LDAP system account password to use for authorization searches when
dbms.security.ldap.authorization.use_system_account is true.
Valid values
dbms.security.ldap.authorization.system_password is a string
Table 146. dbms.security.ldap.authorization.system_username
Description
An LDAP system account username to use for authorization searches when
dbms.security.ldap.authorization.use_system_account is true. Note that the
dbms.security.ldap.authentication.user_dn_template will not be applied to this username, so
you may have to specify a full DN.
Valid values
dbms.security.ldap.authorization.system_username is a string
Table 147. dbms.security.ldap.authorization.use_system_account
Description
Perform LDAP search for authorization info using a system account instead of the user’s
own account. If this is set to false (default), the search for group membership will be
performed directly after authentication using the LDAP context bound with the user’s own
account. The mapped roles will be cached for the duration of dbms.security.auth_cache_ttl,
and then expire, requiring re-authentication. To avoid frequently having to re-authenticate
sessions you may want to set a relatively long auth cache expiration time together with this
option. NOTE: This option will only work if the users are permitted to search for their own
group membership attributes in the directory. If this is set to true, the search will be
performed using a special system account user with read access to all the users in the
directory. You need to specify the username and password using the settings
dbms.security.ldap.authorization.system_username and
dbms.security.ldap.authorization.system_password with this option. Note that this account
only needs read access to the relevant parts of the LDAP directory and does not need to
have access rights to Neo4j, or any other systems.
Valid values
dbms.security.ldap.authorization.use_system_account is a boolean
Default value
false
Table 148. dbms.security.ldap.authorization.user_search_base
Description
The name of the base object or named context to search for user objects when LDAP
authorization is enabled. A common case is that this matches the last part of
dbms.security.ldap.authentication.user_dn_template.
Valid values
dbms.security.ldap.authorization.user_search_base is a string
Default value
ou=users,dc=example,dc=com
Table 149. dbms.security.ldap.authorization.user_search_filter
Description
The LDAP search filter to search for a user principal when LDAP authorization is enabled. The
filter should contain the placeholder token {0} which will be substituted for the user principal.
Valid values
dbms.security.ldap.authorization.user_search_filter is a string
Default value
(&(objectClass=*)(uid={0}))
Table 150. dbms.security.ldap.connection_timeout
Description
The timeout for establishing an LDAP connection. If a connection with the LDAP server cannot
be established within the given time the attempt is aborted. A value of 0 means to use the
network protocol’s (i.e., TCP’s) timeout value.
Valid values
dbms.security.ldap.connection_timeout is a duration (valid units are ms, s, m; default unit is s)
Default value
30000
Table 151. dbms.security.ldap.host
Description
URL of LDAP server to use for authentication and authorization. The format of the setting is
<protocol>://<hostname>:<port>, where hostname is the only required field. The supported
values for protocol are ldap (default) and ldaps. The default port for ldap is 389 and for ldaps
636. For example: ldaps://ldap.example.com:10389. NOTE: You may want to consider using
STARTTLS (dbms.security.ldap.use_starttls) instead of LDAPS for secure connections, in
which case the correct protocol is ldap.
Valid values
dbms.security.ldap.host is a string
Default value
localhost
Table 152. dbms.security.ldap.read_timeout
Description
The timeout for an LDAP read request (i.e. search). If the LDAP server does not respond within
the given time the request will be aborted. A value of 0 means wait for a response indefinitely.
Valid values
dbms.security.ldap.read_timeout is a duration (valid units are ms, s, m; default unit is s)
Default value
30000
Table 153. dbms.security.ldap.referral
Description
The LDAP referral behavior when creating a connection. This is one of follow, ignore or throw:
follow automatically follows any referrals; ignore ignores any referrals; throw throws an
exception, which will lead to authentication failure.
Valid values
dbms.security.ldap.referral is a string
Default value
follow
Table 154. dbms.security.ldap.use_starttls
Description
Use secure communication with the LDAP server using opportunistic TLS. First an initial
insecure connection will be made with the LDAP server, and a STARTTLS command will be
issued to negotiate an upgrade of the connection to TLS before initiating authentication.
Valid values
dbms.security.ldap.use_starttls is a boolean
Default value
false
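Tying the LDAP settings above together, the following neo4j.conf sketch authenticates against a hypothetical directory and resolves roles through a system account; the host name, DNs, and password are placeholders, not values from this manual:
dbms.security.auth_provider=ldap
dbms.security.ldap.host=ldaps://ldap.example.com:636
dbms.security.ldap.authentication.user_dn_template=uid={0},ou=users,dc=example,dc=com
dbms.security.ldap.authorization.use_system_account=true
dbms.security.ldap.authorization.system_username=cn=search,dc=example,dc=com
dbms.security.ldap.authorization.system_password=secret
dbms.security.ldap.authorization.user_search_base=ou=users,dc=example,dc=com
dbms.security.ldap.authorization.user_search_filter=(&(objectClass=*)(uid={0}))
dbms.security.ldap.authorization.group_membership_attributes=memberOf
dbms.security.ldap.authorization.group_to_role_mapping="cn=Neo4j Read Only,cn=users,dc=example,dc=com"=reader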
Table 155. dbms.security.log_successful_authentication
Description
Set to log successful authentication events to the security log. If this is set to false only failed
authentication events will be logged, which could be useful if you find that the successful
events spam the logs too much, and you do not require full auditing capability.
Valid values
dbms.security.log_successful_authentication is a boolean
Default value
true
Table 156. dbms.security.procedures.default_allowed
Description
The default role that can execute all procedures and user-defined functions that are not
covered by the dbms.security.procedures.roles setting. If the
dbms.security.procedures.default_allowed setting is the empty string (default), procedures will
be executed according to the same security rules as normal Cypher statements.
Valid values
dbms.security.procedures.default_allowed is a string
Default value
Table 157. dbms.security.procedures.roles
Description
This provides a finer level of control over which roles can execute procedures than the
dbms.security.procedures.default_allowed setting. For example:
dbms.security.procedures.roles=apoc.convert.*:reader;apoc.load.json*:writer;apoc.trigger.add:TriggerHappy
will allow the role reader to execute all procedures in the apoc.convert namespace, the role
writer to execute all procedures in the apoc.load namespace that start with json, and the role
TriggerHappy to execute the specific procedure apoc.trigger.add. Procedures not matching any
of these patterns will be subject to the dbms.security.procedures.default_allowed setting.
Valid values
dbms.security.procedures.roles is a string
Default value
Table 158. dbms.shell.enabled
Description
Enable a remote shell server which Neo4j Shell clients can log in to.
Valid values
dbms.shell.enabled is a boolean
Default value
false
Table 159. dbms.shell.host
Description
Remote host for shell. By default, the shell server listens only on the loopback interface, but
you can specify the IP address of any network interface or use 0.0.0.0 for all interfaces.
Valid values
dbms.shell.host is a string which must be a valid name
Default value
127.0.0.1
Table 160. dbms.shell.port
Description
The port the shell will listen on.
Valid values
dbms.shell.port is an integer which must be a valid port number (is in the range 0 to 65535)
Default value
1337
Table 161. dbms.shell.read_only
Description
Read only mode. Will only allow read operations.
Valid values
dbms.shell.read_only is a boolean
Default value
false
Table 162. dbms.shell.rmi_name
Description
The name of the shell.
Valid values
dbms.shell.rmi_name is a string which must be a valid name
Default value
shell
Table 163. dbms.threads.worker_count
Description
Number of Neo4j worker threads. Your OS might enforce a lower limit than the maximum
value specified here.
Valid values
dbms.threads.worker_count is an integer which is in the range 1 to 44738
Default value
The minimum between "number of processors" and 500
Table 164. dbms.transaction.timeout
Description
The maximum time interval of a transaction within which it should be completed.
Valid values
dbms.transaction.timeout is a duration (valid units are ms, s, m; default unit is s)
Default value
0
Table 165. dbms.tx_log.rotation.retention_policy
Description
Make Neo4j keep the logical transaction logs so that the database can be backed up. Can be
used to specify the threshold after which logical logs are pruned. For example, "10 days" will
prune logical logs that only contain transactions older than 10 days from the current time,
while "100k txs" will keep the 100k latest transactions and prune any older transactions.
Valid values
dbms.tx_log.rotation.retention_policy is a string which must be true/false or of format
'<number><optional unit> <type>' for example 100M size for limiting logical log space on disk
to 100Mb, or 200k txs for limiting the number of transactions to keep to 200 000
Default value
7 days
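For example, either of the following lines (a sketch of the two value formats described above) could be used in neo4j.conf:
dbms.tx_log.rotation.retention_policy=10 days
# or, to cap by transaction count instead:
# dbms.tx_log.rotation.retention_policy=200k txs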
Table 166. dbms.tx_log.rotation.size
Description
Specifies at which file size the logical log will auto-rotate. 0 means that no rotation will
automatically occur based on file size.
Valid values
dbms.tx_log.rotation.size is a byte size (valid multipliers are k, m, g, K, M, G) which is minimum
1048576
Default value
262144000
Table 167. dbms.udc.enabled
Description
Enable the UDC extension.
Valid values
dbms.udc.enabled is a boolean
Default value
true
Table 168. dbms.unmanaged_extension_classes
Description
Comma-separated list of <classname>=<mount point> for unmanaged extensions.
Valid values
dbms.unmanaged_extension_classes is a comma-separated list of <classname>=<mount
point> strings
Default value
[]
Table 169. ha.allow_init_cluster
Description
Whether to allow this instance to create a cluster if unable to join.
Valid values
ha.allow_init_cluster is a boolean
Default value
true
Table 170. ha.branched_data_copying_strategy
Description
Strategy for how to order handling of branched data on slaves and copying of the store from
the master. The default is copy_then_branch, which, when combined with the keep_last or
keep_none branch handling strategies, results in a safer branching strategy, as there is always a
store present, so a failure to copy the store (for example, because of network failure) does
not leave the instance without a store.
Valid values
ha.branched_data_copying_strategy is one of branch_then_copy, copy_then_branch
Default value
branch_then_copy
Table 171. ha.branched_data_policy
Description
Policy for how to handle branched data.
Valid values
ha.branched_data_policy is one of keep_all, keep_last, keep_none
Default value
keep_all
Table 172. ha.broadcast_timeout
Description
Timeout for broadcasting values in cluster. Must consider end-to-end duration of Paxos
algorithm. This value is the default value for the ha.join_timeout and ha.leave_timeout
settings.
Valid values
ha.broadcast_timeout is a duration (valid units are ms, s, m; default unit is s)
Default value
30000
Table 173. ha.configuration_timeout
Description
Timeout for waiting for configuration from an existing cluster member during cluster join.
Valid values
ha.configuration_timeout is a duration (valid units are ms, s, m; default unit is s)
Default value
1000
Table 174. ha.data_chunk_size
Description
Max size of the data chunks that flow between master and slaves in HA. A bigger size may
increase throughput, but may also be more sensitive to variations in bandwidth, whereas lower
size increases tolerance for bandwidth variations.
Valid values
ha.data_chunk_size is a byte size (valid multipliers are k, m, g, K, M, G) which is minimum 1024
Default value
2097152
Table 175. ha.default_timeout
Description
Default timeout used for clustering timeouts. Override specific timeout settings with proper
values if necessary. This value is the default value for the ha.heartbeat_interval,
ha.paxos_timeout and ha.learn_timeout settings.
Valid values
ha.default_timeout is a duration (valid units are ms, s, m; default unit is s)
Default value
5000
Table 176. ha.election_timeout
Description
Timeout for waiting for other members to finish a role election. Defaults to ha.paxos_timeout.
Valid values
ha.election_timeout is a duration (valid units are ms, s, m; default unit is s)
Default value
5000
Table 177. ha.heartbeat_interval
Description
How often heartbeat messages should be sent. Defaults to ha.default_timeout.
Valid values
ha.heartbeat_interval is a duration (valid units are ms, s, m; default unit is s)
Default value
5000
Table 178. ha.heartbeat_timeout
Description
How long to wait for heartbeats from other instances before marking them as suspects for
failure. This value reflects considerations of network latency, expected duration of garbage
collection pauses and other factors that can delay message sending and processing. Larger
values will result in more stable masters but also will result in longer waits before a failover in
case of master failure. This value should not be set to less than twice the
ha.heartbeat_interval value; otherwise there is a high risk of frequent master switches and
possibly branched data occurrence.
Valid values
ha.heartbeat_timeout is a duration (valid units are ms, s, m; default unit is s)
Default value
40000
Table 179. ha.host.coordination
Description
Host and port to bind the cluster management communication.
Valid values
ha.host.coordination is a hostname and port
Default value
0.0.0.0:5001-5099
Table 180. ha.host.data
Description
Hostname and port to bind the HA server.
Valid values
ha.host.data is a hostname and port
Default value
0.0.0.0:6001-6011
Table 181. ha.initial_hosts
Description
A comma-separated list of other members of the cluster to join.
Valid values
ha.initial_hosts is a list separated by "," where items are a hostname and port
Mandatory
The ha.initial_hosts configuration setting is mandatory.
Table 182. ha.internal_role_switch_timeout
Description
Timeout for waiting for internal conditions during state switch, like for transactions to
complete, before switching to master or slave.
Valid values
ha.internal_role_switch_timeout is a duration (valid units are ms, s, m; default unit is s)
Default value
10000
Table 183. ha.join_timeout
Description
Timeout for joining a cluster. Defaults to ha.broadcast_timeout.
Valid values
ha.join_timeout is a duration (valid units are ms, s, m; default unit is s)
Default value
30000
Table 184. ha.learn_timeout
Description
Timeout for learning values. Defaults to ha.default_timeout.
Valid values
ha.learn_timeout is a duration (valid units are ms, s, m; default unit is s)
Default value
5000
Table 185. ha.leave_timeout
Description
Timeout for waiting for cluster leave to finish. Defaults to ha.broadcast_timeout.
Valid values
ha.leave_timeout is a duration (valid units are ms, s, m; default unit is s)
Default value
30000
Table 186. ha.max_acceptors
Description
Maximum number of servers to involve when agreeing to membership changes. In very large
clusters, the probability of half the cluster failing is low, but protecting against any arbitrary
half failing is expensive. Therefore you may wish to set this parameter to a value less than the
cluster size.
Valid values
ha.max_acceptors is an integer which is minimum 1
Default value
21
Table 187. ha.max_channels_per_slave
Description
Maximum number of connections a slave can have to the master.
Valid values
ha.max_channels_per_slave is an integer which is minimum 1
Default value
20
Table 188. ha.paxos_timeout
Description
Default value for all Paxos timeouts. This setting controls the default value for the
ha.phase1_timeout, ha.phase2_timeout and ha.election_timeout settings. If it is not given a
value it defaults to ha.default_timeout and will implicitly change if ha.default_timeout
changes. This is an advanced parameter which should only be changed if specifically advised
by Neo4j Professional Services.
Valid values
ha.paxos_timeout is a duration (valid units are ms, s, m; default unit is s)
Default value
5000
Table 189. ha.phase1_timeout
Description
Timeout for Paxos phase 1. If it is not given a value it defaults to ha.paxos_timeout and will
implicitly change if ha.paxos_timeout changes. This is an advanced parameter which should
only be changed if specifically advised by Neo4j Professional Services.
Valid values
ha.phase1_timeout is a duration (valid units are ms, s, m; default unit is s)
Default value
5000
Table 190. ha.phase2_timeout
Description
Timeout for Paxos phase 2. If it is not given a value it defaults to ha.paxos_timeout and will
implicitly change if ha.paxos_timeout changes. This is an advanced parameter which should
only be changed if specifically advised by Neo4j Professional Services.
Valid values
ha.phase2_timeout is a duration (valid units are ms, s, m; default unit is s)
Default value
5000
Table 191. ha.pull_batch_size
Description
Size of batches of transactions applied on slaves when pulling from master.
Valid values
ha.pull_batch_size is an integer
Default value
100
Table 192. ha.pull_interval
Description
Interval of pulling updates from master.
Valid values
ha.pull_interval is a duration (valid units are ms, s, m; default unit is s)
Default value
0
Table 193. ha.role_switch_timeout
Description
Timeout for request threads waiting for instance to become master or slave.
Valid values
ha.role_switch_timeout is a duration (valid units are ms, s, m; default unit is s)
Default value
120000
Table 194. ha.server_id
Description
Id for a cluster instance. Must be unique within the cluster.
Valid values
ha.server_id is an instance id, which has to be a valid integer
Mandatory
The ha.server_id configuration setting is mandatory.
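As a minimal sketch of an HA member configuration using the settings above (host names are placeholders, not values from this manual):
dbms.mode=HA
ha.server_id=1
ha.initial_hosts=neo4j01.example.com:5001,neo4j02.example.com:5001,neo4j03.example.com:5001
ha.host.coordination=0.0.0.0:5001
ha.host.data=0.0.0.0:6001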
Table 195. ha.slave_lock_timeout
Description
Timeout for taking remote (write) locks on slaves. Defaults to ha.slave_read_timeout.
Valid values
ha.slave_lock_timeout is a duration (valid units are ms, s, m; default unit is s)
Default value
20000
Table 196. ha.slave_only
Description
Whether this instance should only participate as slave in cluster. If set to true, it will never be
elected as master.
Valid values
ha.slave_only is a boolean
Default value
false
Table 197. ha.slave_read_timeout
Description
How long a slave will wait for response from master before giving up.
Valid values
ha.slave_read_timeout is a duration (valid units are ms, s, m; default unit is s)
Default value
20000
Table 198. ha.tx_push_factor
Description
The number of slaves the master will ask to replicate a committed transaction.
Valid values
ha.tx_push_factor is an integer which is minimum 0
Default value
1
Table 199. ha.tx_push_strategy
Description
Push strategy of a transaction to a slave during commit.
Valid values
ha.tx_push_strategy is one of round_robin, fixed_descending, fixed_ascending
Default value
fixed_ascending
Table 200. metrics.bolt.messages.enabled
Description
Enable reporting metrics about Bolt Protocol message processing.
Valid values
metrics.bolt.messages.enabled is a boolean
Default value
false
Table 201. metrics.csv.enabled
Description
Set to true to enable exporting metrics to CSV files.
Valid values
metrics.csv.enabled is a boolean
Default value
false
Table 202. metrics.csv.interval
Description
The reporting interval for the CSV files. That is, how often new rows with numbers are
appended to the CSV files.
Valid values
metrics.csv.interval is a duration (valid units are ms, s, m; default unit is s)
Default value
3000
Table 203. metrics.cypher.replanning.enabled
Description
Enable reporting metrics about number of occurred replanning events.
Valid values
metrics.cypher.replanning.enabled is a boolean
Default value
false
Table 204. metrics.enabled
Description
The default enablement value for all the supported metrics. Set this to false to turn off all
metrics by default. The individual settings can then be used to selectively re-enable specific
metrics.
Valid values
metrics.enabled is a boolean
Default value
false
Table 205. metrics.graphite.enabled
Description
Set to true to enable exporting metrics to Graphite.
Valid values
metrics.graphite.enabled is a boolean
Default value
false
Table 206. metrics.graphite.interval
Description
The reporting interval for Graphite. That is, how often to send updated metrics to Graphite.
Valid values
metrics.graphite.interval is a duration (valid units are ms, s, m; default unit is s)
Default value
3000
Table 207. metrics.graphite.server
Description
The hostname or IP address of the Graphite server.
Valid values
metrics.graphite.server is a hostname and port
Default value
:2003
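For example, to export metrics to a hypothetical Graphite server (graphite.example.com is a placeholder), a sketch is:
metrics.enabled=true
metrics.graphite.enabled=true
metrics.graphite.server=graphite.example.com:2003
metrics.graphite.interval=1m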
Table 208. metrics.jvm.buffers.enabled
Description
Enable reporting metrics about the buffer pools.
Valid values
metrics.jvm.buffers.enabled is a boolean
Default value
false
Table 209. metrics.jvm.gc.enabled
Description
Enable reporting metrics about the duration of garbage collections.
Valid values
metrics.jvm.gc.enabled is a boolean
Default value
false
Table 210. metrics.jvm.memory.enabled
Description
Enable reporting metrics about the memory usage.
Valid values
metrics.jvm.memory.enabled is a boolean
Default value
false
Table 211. metrics.jvm.threads.enabled
Description
Enable reporting metrics about the current number of threads running.
Valid values
metrics.jvm.threads.enabled is a boolean
Default value
false
Table 212. metrics.neo4j.causal_clustering.enabled
Description
Enable reporting metrics about Causal Clustering mode.
Valid values
metrics.neo4j.causal_clustering.enabled is a boolean
Default value
false
Table 213. metrics.neo4j.checkpointing.enabled
Description
Enable reporting metrics about Neo4j check pointing; when it occurs and how much time it
takes to complete.
Valid values
metrics.neo4j.checkpointing.enabled is a boolean
Default value
false
Table 214. metrics.neo4j.cluster.enabled
Description
Enable reporting metrics about HA cluster info.
Valid values
metrics.neo4j.cluster.enabled is a boolean
Default value
false
Table 215. metrics.neo4j.counts.enabled
Description
Enable reporting metrics about approximately how many entities are in the database; nodes,
relationships, properties, etc.
Valid values
metrics.neo4j.counts.enabled is a boolean
Default value
false
Table 216. metrics.neo4j.enabled
Description
The default enablement value for all Neo4j specific support metrics. Set this to false to turn off
all Neo4j specific metrics by default. The individual metrics.neo4j.* metrics can then be turned
on selectively.
Valid values
metrics.neo4j.enabled is a boolean
Default value
false
Table 217. metrics.neo4j.logrotation.enabled
Description
Enable reporting metrics about the Neo4j log rotation; when it occurs and how much time it
takes to complete.
Valid values
metrics.neo4j.logrotation.enabled is a boolean
Default value
false
Table 218. metrics.neo4j.network.enabled
Description
Enable reporting metrics about the network usage.
Valid values
metrics.neo4j.network.enabled is a boolean
Default value
false
Table 219. metrics.neo4j.pagecache.enabled
Description
Enable reporting metrics about the Neo4j page cache; page faults, evictions, flushes,
exceptions, etc.
Valid values
metrics.neo4j.pagecache.enabled is a boolean
Default value
false
Table 220. metrics.neo4j.server.enabled
Description
Enable reporting metrics about Server threading info.
Valid values
metrics.neo4j.server.enabled is a boolean
Default value
false
Table 221. metrics.neo4j.tx.enabled
Description
Enable reporting metrics about transactions; number of transactions started, committed, etc.
Valid values
metrics.neo4j.tx.enabled is a boolean
Default value
false
Table 222. metrics.prefix
Description
A common prefix for the reported metrics field names. By default, this is either 'neo4j', or a
computed value based on the cluster and instance names, when running in an HA
configuration.
Valid values
metrics.prefix is a string
Default value
neo4j
Table 223. tools.consistency_checker.check_graph
Description
Perform checks between nodes, relationships, properties, types and tokens.
Valid values
tools.consistency_checker.check_graph is a boolean
Default value
true
Table 224. tools.consistency_checker.check_indexes
Description
Perform checks on indexes. Checking indexes is more expensive than checking the native
stores, so it may be useful to turn off this check for very large databases.
Valid values
tools.consistency_checker.check_indexes is a boolean
Default value
true
Table 225. tools.consistency_checker.check_label_scan_store
Description
Perform checks on the label scan store. Checking this store is more expensive than checking
the native stores, so it may be useful to turn off this check for very large databases.
Valid values
tools.consistency_checker.check_label_scan_store is a boolean
Default value
true
Table 226. tools.consistency_checker.check_property_owners
Description
Perform optional additional checking on property ownership. This can detect a theoretical
inconsistency where a property could be owned by multiple entities. However, the check is very
expensive in time and memory, so it is skipped by default.
Valid values
tools.consistency_checker.check_property_owners is a boolean
Default value
false
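As a sketch of how the tools.consistency_checker.* settings might be combined for a very large database where the index and label scan store checks are too expensive (illustrative only, not a recommendation):
tools.consistency_checker.check_graph=true
tools.consistency_checker.check_indexes=false
tools.consistency_checker.check_label_scan_store=false
tools.consistency_checker.check_property_owners=false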
A.2. Built-in procedures
This section shows the built-in procedures that are bundled with Neo4j.
A.2.1. General-purpose procedures
Neo4j comes bundled with a number of built-in procedures. These can be used to:
• Inspect schema.
• Inspect meta data.
• Explore procedures and components.
• Monitor management data.
A subset of these are listed in the table below. Running CALL dbms.procedures() will display the full list
of all the procedures.
Procedure name | Command to invoke procedure | What it does
ListLabels | CALL db.labels() | List all labels in the database.
ListRelationshipTypes | CALL db.relationshipTypes() | List all relationship types in the database.
ListPropertyKeys | CALL db.propertyKeys() | List all property keys in the database.
ListIndexes | CALL db.indexes() | List all indexes in the database.
AwaitIndex | CALL db.awaitIndex(label, property, timeout) | Wait for the specified index to come online.
ListConstraints | CALL db.constraints() | List all constraints in the database.
ListProcedures | CALL dbms.procedures() | List all procedures in the DBMS.
ListFunctions | CALL dbms.functions() | List all user functions in the DBMS.
ListComponents | CALL dbms.components() | List DBMS components and their versions.
QueryJmx | CALL dbms.queryJmx(query) | Query JMX management data by domain and name. For instance, "org.neo4j:*".
A.2.2. Procedures for native user and role management
We provide below a list of all available procedures for native user and role management. These
procedures are available in Neo4j Enterprise Edition.
• dbms.security.activateUser: Activate a user
• dbms.security.addRoleToUser: Assign a role to a user
• dbms.security.changeUserPassword: Change a user’s password
• dbms.security.changePassword: Change your own password
• dbms.security.createRole: Create a custom role
• dbms.security.createUser: Add a user
• dbms.security.deleteRole: Delete a custom role
• dbms.security.deleteUser: Delete a user
• dbms.security.listRoles: List all roles
• dbms.security.listRolesForUser: List all roles for a user
• dbms.security.listUsers: List all users
• dbms.security.listUsersForRole: List all users for a role
• dbms.security.removeRoleFromUser: Remove a role from a user
• dbms.security.suspendUser: Suspend a user
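As a brief sketch of how these procedures are typically combined, run as three separate statements (the username 'jane' and the built-in role 'publisher' are example values, reused from the scenarios later in this manual):
CALL dbms.security.createUser('jane', 'abracadabra', true)
CALL dbms.security.addRoleToUser('publisher', 'jane')
CALL dbms.security.listRolesForUser('jane')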
A.3. User management for Community Edition
This section describes user and password management for Neo4j Community Edition.
In Neo4j, native users and roles are managed by using built-in procedures through Cypher.
This chapter gives a list of all the security procedures for user management along with some simple
examples.
Use the Neo4j Browser or the Neo4j Cypher Shell to run the examples provided.
Unless stated otherwise, all arguments to the procedures described in this section must be supplied.
• dbms.security.listUsers: List all users
• dbms.security.changePassword: Change the current user’s password
• dbms.security.showCurrentUser: Show details for the current user
• dbms.security.createUser: Add a user
• dbms.security.deleteUser: Delete a user
A.3.1. List all users
The current user is able to view the details of every user in the system.
Syntax:
CALL dbms.security.listUsers()
Returns:
• username (String): This is the user’s username.
• flags (List<String>): This is a flag indicating whether the user needs to change their password.
Example 42. List all users
The following example shows, for each user in the system, the username and whether the user
needs to change their password.
CALL dbms.security.listUsers()
+------------------------------------------+
| username | flags                         |
+------------------------------------------+
| "neo4j"  | []                            |
| "anne"   | ["password_change_required"]  |
| "bill"   | []                            |
+------------------------------------------+
3 rows
A.3.2. Change the current user’s password
The current user is able to change their own password at any time.
Syntax:
CALL dbms.security.changePassword(password)
Arguments:
• password (String): This is the new password for the current user.
Exceptions:
The password is the empty string.
The password is the same as the current user’s previous password.
Example 43. Change the current user’s password
The following example changes the password of the current user to 'h6u4%kr'.
CALL dbms.security.changePassword('h6u4%kr')
A.3.3. Show details for the current user
The current user is able to view whether or not they need to change their password.
Syntax:
CALL dbms.security.showCurrentUser()
Returns:
• username (String): This is the user’s username.
• flags (List<String>): This is a flag indicating whether the user needs to change their password.
Example 44. Show details for the current user
The following example shows that the current user — with the username 'johnsmith' — does not
need to change his password.
CALL dbms.security.showCurrentUser()
+---------------------+
| username    | flags |
+---------------------+
| "johnsmith" | []    |
+---------------------+
1 row
A.3.4. Add a user
The current user is able to add a user to the system.
Syntax:
CALL dbms.security.createUser(username, password, requirePasswordChange)
Arguments:
• username (String): This is the user’s username.
• password (String): This is the user’s password.
• requirePasswordChange (Boolean): This is optional, with a default of true. If this is true, (i) the user will be forced to change their password when they log in for the first time, and (ii) until the user has changed their password, they will be forbidden from performing any other operation.
Exceptions:
The username contains characters other than alphanumeric characters and the ‘_’ character.
The username is already in use within the system.
The password is the empty string.
Example 45. Add a user
The following example creates a user with the username 'johnsmith' and password 'h6u4%kr'.
When the user 'johnsmith' logs in for the first time, he will be required to change his password.
CALL dbms.security.createUser('johnsmith', 'h6u4%kr', true)
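Since requirePasswordChange is optional and defaults to true, the same effect should be achievable by omitting the third argument (this shortened form is not shown in the original example):
CALL dbms.security.createUser('johnsmith', 'h6u4%kr')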
A.3.5. Delete a user
The current user is able to permanently delete a user from the system.
Syntax:
CALL dbms.security.deleteUser(username)
Arguments:
• username (String): This is the username of the user to be deleted.
Exceptions:
The username does not exist in the system.
The username matches that of the current user (i.e. deleting the current user is not permitted).
Considerations:
• Deleting a user will terminate with immediate effect all of the user’s sessions and roll back any
running transactions.
• As it is not possible for the current user to delete themselves, there will always be at least one user
in the system.
Example 46. Delete a user
The following example deletes a user with the username 'janebrown'.
CALL dbms.security.deleteUser('janebrown')
Appendix B: Tutorial
This chapter contains examples and tutorials that further describe the use of various areas
of Neo4j.
The following step-by-step tutorials cover common operational tasks or otherwise exemplify working
with Neo4j.
• Set up a local Causal Cluster
• Set up a Highly Available cluster
• Set up a local Highly Available cluster
• Use the Import tool to import data into Neo4j
• Manage users and roles
B.1. Set up a local Causal Cluster
This section walks through the basics of setting up a Neo4j Causal Cluster. The result is a
local cluster of six instances: three Cores and three Read replicas.
In this section we will learn how to deploy a Causal Cluster locally, on a single machine. This is useful
as a learning exercise and to get started quickly developing an application against a Neo4j Causal
Cluster. A cluster on a single machine has no fault tolerance and is not suitable for production use.
We will begin by configuring and starting a cluster of three Core instances. This is the minimal
deployment of a Causal Cluster. The Core instances are responsible for keeping the data safe. For
fault tolerance the minimal cluster has three Core members and can tolerate the failure of at most
one of those members.
After the Core of the cluster is operational we will add three Read replicas. The Read replicas are
responsible for scaling the capacity of the cluster.
The core of the Causal Cluster remains stable over time. The roles within the core will change as
needed but the core itself is long-lived and stable. At the edge of the cluster, the Read replicas are
cheap and disposable. They can be added as needed to increase the operational capacity of the
cluster as a whole.
B.1.1. Download and configure
Some of the configuration for the instances in the cluster will be identical. A convenient way to go
about the configuration is therefore:
1. Create a local working directory.
2. Download a copy of Neo4j Enterprise Edition from the Neo4j download site
(http://neo4j.com/download/).
3. Unpack Neo4j in the working directory.
4. Make a copy of the neo4j-enterprise-3.1.2 directory and name it core-01/ or similar. Keep the
original directory to use when setting up the Read replicas later. The core-01/ directory will contain
the first Core instance.
5. Complete the configuration (see below) for the first Core instance. Then make two copies of the
core-01/ directory and name them core-02/ and core-03/.
6. Proceed with the configuration of the two copied instances. Those changes that are common to all
three Core instances are now already in place for Cores number two and three.
In this example we are running all instances in the cluster on a single machine. Many of the default
configuration settings work well out of the box in a production deployment, with multiple machines.
Some of these we have to change when deploying multiple instances on a single machine, so that
instances do not try to use the same network ports. We call out the settings that are specific to this
scenario as we go along.
B.1.2. Configure the Core instances
All configuration that we will do takes place in the Neo4j configuration file, conf/neo4j.conf. If you used
a different package than in the download instructions above, see File locations to locate the
configuration file. Look in the configuration file for a section labeled "Causal Clustering Configuration".
Minimum configuration
The minimum configuration for a Core instance requires setting the following:
dbms.mode
The operating mode of this instance, either CORE or READ_REPLICA. Uncomment this setting and give
it the value CORE.
causal_clustering.expected_core_cluster_size
The number of Core instances in the cluster. Uncomment this setting and give it the value 3.
causal_clustering.initial_discovery_members
The network addresses of Core cluster members to be used to discover the cluster when this
instance joins. Uncomment this setting and give it the value
localhost:5000,localhost:5001,localhost:5002.
Additional configuration
In addition to the above settings, because the instances are all running on the same machine, we
need to configure the following:
In the section "Causal Clustering Configuration", the following settings need to be changed:
causal_clustering.discovery_listen_address
The port used for discovery between machines. Uncomment this setting and give it the value :5000. On
the other two instances, give it the values :5001 and :5002 respectively.
causal_clustering.transaction_listen_address
The internal transaction communication address. Uncomment this setting and give it the value
:6000. On the other two instances, give it the values :6001 and :6002 respectively.
causal_clustering.raft_listen_address
The internal consensus mechanism address. Uncomment this setting and give it the value :7000. On
the other two instances, give it the values :7001 and :7002 respectively.
In the section "Network connector configuration", the following settings need to be changed:
dbms.connector.bolt.listen_address
The bolt connector address. Uncomment this line and use a unique port for each installation. For
example, :7687, :7688, and :7689 on each instance, respectively.
dbms.connector.http.listen_address
The HTTP connector address. Uncomment this line and use a unique port for each installation. For
example, :7474, :7475, and :7476 on each instance, respectively.
dbms.connector.https.listen_address
The HTTPS connector address. Use a unique port for each installation. For example, :6474, :6475,
and :6476 on each instance, respectively.
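Collecting the settings above, the relevant lines of conf/neo4j.conf for the first Core instance (core-01) would look roughly as follows; the other two Cores differ only in the port numbers given above. This is a sketch assembled from the values in this section, not a complete configuration file:
dbms.mode=CORE
causal_clustering.expected_core_cluster_size=3
causal_clustering.initial_discovery_members=localhost:5000,localhost:5001,localhost:5002
causal_clustering.discovery_listen_address=:5000
causal_clustering.transaction_listen_address=:6000
causal_clustering.raft_listen_address=:7000
dbms.connector.bolt.listen_address=:7687
dbms.connector.http.listen_address=:7474
dbms.connector.https.listen_address=:6474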
B.1.3. Start the Neo4j servers
Start each Neo4j instance as usual. The startup order does not matter.
core-01$ ./bin/neo4j start
core-02$ ./bin/neo4j start
core-03$ ./bin/neo4j start
Startup Time

If you want to follow along with the startup of a server you can follow the messages
in logs/neo4j.log. On a Unix system issue the command tail -f logs/neo4j.log. On
Windows Server run Get-Content .\logs\neo4j.log -Tail 10 -Wait. While an
instance is joining the cluster, the server may appear unavailable. In the case where
an instance is joining a cluster with lots of data, it may take a number of minutes for
the new instance to download the data from the cluster and become available.
B.1.4. Check the status of the cluster
Now the minimal cluster of three Core servers is operational and ready to serve requests. Connect to
either of the three instances to check the cluster status. Point your web browser to
http://localhost:7474. Authenticate with the default neo4j/neo4j and set a new password. These
credentials are not shared between cluster members. A new password must be set on each instance
when connecting for the first time. For production deployment we advise integrating the Neo4j cluster
with your directory service. See Integration with LDAP for more details.
Once you have authenticated you can check the status of the cluster by running the query: CALL
dbms.cluster.overview(). The output will look similar to the following.
Table 227. Cluster overview with dbms.cluster.overview()
id                                    addresses                                                                role
08eb9305-53b9-4394-9237-0f0d63bb05d5  [bolt://localhost:7687, http://localhost:7474, https://localhost:6474]  LEADER
cb0c729d-233c-452f-8f06-f2553e08f149  [bolt://localhost:7688, http://localhost:7475, https://localhost:6475]  FOLLOWER
ded9eed2-dd3a-4574-bc08-6a569f91ec5c  [bolt://localhost:7689, http://localhost:7476, https://localhost:6476]  FOLLOWER
The three Core instances in the cluster are operational.
B.1.5. Test the cluster
Now you can run queries to create nodes and relationships, and see that the data gets replicated in
the cluster.
When developing an application against a Neo4j Causal Cluster it is not necessary to know about the
roles of the cluster members. The Neo4j Bolt driver creates sessions with access mode READ or WRITE
on request. It is the driver’s responsibility to identify the best cluster member to service the session
according to need.
When connecting directly with Neo4j Browser, however, we need to be more aware of the roles that
the cluster members have. It is easy to navigate from member to member by running the :sysinfo
command. The sysinfo view contains information about the Neo4j instance. If the instance
participates in a Causal Clustering cluster then this view contains a table: Causal Clustering Cluster
Members. This table contains the same information as provided by the dbms.cluster.overview()
procedure, but here you can also take action on the other members of the cluster. Run the :sysinfo
command and click the Open action on the instance that has the LEADER role. This opens a new
Browser session against the Leader of the cluster.
Authenticate and set a new password, as before. Now you can run a query to create nodes and
relationships.
UNWIND range(0, 100) AS value
MERGE (person1:Person {id: value})
MERGE (person2:Person {id: toInt(100.0 * rand())})
MERGE (person1)-[:FRIENDS]->(person2)
When the query has executed choose an instance with the FOLLOWER role from the sysinfo view. Click
the Open action to connect. Now you can run a query to see that the data has been replicated.
MATCH path = (person:Person)-[:FRIENDS]-(friend)
RETURN path
LIMIT 10
B.1.6. Configure the Read replicas
Setting up the Read replicas is similar to setting up the Cores, but simpler.
1. In your working directory, rename the original neo4j-enterprise-3.1.2 directory to replica-01/ or
similar. The replica-01/ directory will contain the first Read replica.
2. Complete the configuration (see below) for the first Read replica. Then make two copies of the
replica-01/ directory and name them replica-02/ and replica-03/.
3. Proceed with the configuration of the two copied instances. Those changes that are common to all
three Read replicas are now already in place for replicas number two and three.
Configuring a Read replica is similar to configuring a Core. Read replica instances do not participate
in quorum decisions, so their configuration is simpler. All that a Read replica needs to know is the
addresses of Core servers to which it can bind in order to discover the cluster. See Discovery
protocol for details. Once it has completed the initial discovery, the Read replica becomes aware of the
currently available Core servers and can choose an appropriate one from which to catch up. See
Catchup protocol for details.
Minimum configuration
The minimum configuration for a Read replica requires setting the following:
dbms.mode
The operating mode of this instance, either CORE or READ_REPLICA. Uncomment this setting and give
it the value READ_REPLICA.
causal_clustering.initial_discovery_members
The network addresses of Core cluster members to be used to discover the cluster when this
instance joins. Uncomment this setting and give it the value
localhost:5000,localhost:5001,localhost:5002.
Additional configuration
In addition to the above settings, because the instances are all running on the same machine, we
need to configure the following:
In the section "Network connector configuration", the following settings need to be changed:
dbms.connector.bolt.listen_address
The bolt connector address. Uncomment this line and use a unique port for each installation. For
example, :7690, :7691, and :7692 on each instance, respectively.
dbms.connector.http.listen_address
The HTTP connector address. Uncomment this line and use a unique port for each installation. For
example, :7477, :7478, and :7479 on each instance, respectively.
dbms.connector.https.listen_address
The HTTPS connector address. Use a unique port for each installation. For example, :6477, :6478,
and :6479 on each instance, respectively.
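As with the Cores, the combined settings for the first Read replica (replica-01) would look roughly like this, with the port numbers incremented on the other two replicas as described above (again a sketch, not a complete file):
dbms.mode=READ_REPLICA
causal_clustering.initial_discovery_members=localhost:5000,localhost:5001,localhost:5002
dbms.connector.bolt.listen_address=:7690
dbms.connector.http.listen_address=:7477
dbms.connector.https.listen_address=:6477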
B.1.7. Test the cluster with Read replicas
Connect to any of the instances and run CALL dbms.cluster.overview() to see the new overview of the
cluster. With Read replicas added the overview will look similar to:
Table 228. Cluster overview with dbms.cluster.overview()
id                                    addresses                                                                role
08eb9305-53b9-4394-9237-0f0d63bb05d5  [bolt://localhost:7687, http://localhost:7474, https://localhost:6474]  LEADER
cb0c729d-233c-452f-8f06-f2553e08f149  [bolt://localhost:7688, http://localhost:7475, https://localhost:6475]  FOLLOWER
ded9eed2-dd3a-4574-bc08-6a569f91ec5c  [bolt://localhost:7689, http://localhost:7476, https://localhost:6476]  FOLLOWER
00000000-0000-0000-0000-000000000000  [bolt://localhost:7690, http://localhost:7477, https://localhost:6477]  READ_REPLICA
00000000-0000-0000-0000-000000000000  [bolt://localhost:7691, http://localhost:7478, https://localhost:6478]  READ_REPLICA
00000000-0000-0000-0000-000000000000  [bolt://localhost:7692, http://localhost:7479, https://localhost:6479]  READ_REPLICA
To test that the Read replicas have successfully caught up with the cluster use :sysinfo and click the
Open action to connect to one of the read replicas. Issue the same query as before:
MATCH path = (person:Person)-[:FRIENDS]-(friend)
RETURN path
LIMIT 10
B.2. Set up a Highly Available cluster
This guide will give step-by-step instructions for setting up a basic cluster of three separate
machines. For a description of the clustering architecture and related design considerations,
refer to Introduction.
B.2.1. Download and configure
1. Download Neo4j Enterprise Edition from the Neo4j download site (http://neo4j.com/download/), and
unpack on three separate machines.
2. Configure the HA related settings for each installation as outlined below. Note that all three
installations have the same configuration except for the ha.server_id property.
Example 47. Configuration of neo4j.conf for each of the three HA servers
Neo4j instance #1 on the server named neo4j-01.local
conf/neo4j.conf
# Unique server id for this Neo4j instance
# can not be negative id and must be unique
ha.server_id = 1
# List of other known instances in this cluster
ha.initial_hosts = neo4j-01.local:5001,neo4j-02.local:5001,neo4j-03.local:5001
# Alternatively, use IP addresses:
#ha.initial_hosts = 192.168.0.20:5001,192.168.0.21:5001,192.168.0.22:5001
# HA - High Availability
# SINGLE - Single mode, default.
dbms.mode=HA
# HTTP Connector
dbms.connector.http.enabled=true
dbms.connector.http.listen_address=:7474
Neo4j instance #2 on the server named neo4j-02.local
conf/neo4j.conf
# Unique server id for this Neo4j instance
# can not be negative id and must be unique
ha.server_id = 2
# List of other known instances in this cluster
ha.initial_hosts = neo4j-01.local:5001,neo4j-02.local:5001,neo4j-03.local:5001
# Alternatively, use IP addresses:
#ha.initial_hosts = 192.168.0.20:5001,192.168.0.21:5001,192.168.0.22:5001
# HA - High Availability
# SINGLE - Single mode, default.
dbms.mode=HA
# HTTP Connector
dbms.connector.http.enabled=true
dbms.connector.http.listen_address=:7474
Neo4j instance #3 on the server named neo4j-03.local
conf/neo4j.conf
# Unique server id for this Neo4j instance
# can not be negative id and must be unique
ha.server_id = 3
# List of other known instances in this cluster
ha.initial_hosts = neo4j-01.local:5001,neo4j-02.local:5001,neo4j-03.local:5001
# Alternatively, use IP addresses:
#ha.initial_hosts = 192.168.0.20:5001,192.168.0.21:5001,192.168.0.22:5001
# HA - High Availability
# SINGLE - Single mode, default.
dbms.mode=HA
# HTTP Connector
dbms.connector.http.enabled=true
dbms.connector.http.listen_address=:7474
B.2.2. Start the Neo4j Servers
Start the Neo4j servers as usual. Note that the startup order does not matter.
Example 48. Start the three HA servers
neo4j-01$ ./bin/neo4j start
neo4j-02$ ./bin/neo4j start
neo4j-03$ ./bin/neo4j start
Startup Time

When running in HA mode, the startup script returns immediately instead of
waiting for the server to become available. The database will be unavailable until all
members listed in ha.initial_hosts are online and communicating with each other.
In the example above this happens when you have started all three instances. To
keep track of the startup state you can follow the messages in neo4j.log — the path
is printed before the startup script returns.
Now, you should be able to access the three servers and check their HA status. Open the locations
below in a web browser and issue the following command in the editor after having set a password
for the database: :play sysinfo
• http://neo4j-01.local:7474/
• http://neo4j-02.local:7474/
• http://neo4j-03.local:7474/

You can replace database #3 with an 'arbiter' instance, see Arbiter instances.
That is it! You now have a Neo4j cluster of three instances running. You can start by making a change
on any instance and those changes will be propagated between them. For more cluster related
configuration options take a look at Setup and configuration.
B.3. Set up a local HA cluster
If you want to start a cluster similar to the one described above, but for development and testing
purposes, it is convenient to run all Neo4j instances on the same machine. This is easy to achieve,
although it requires some additional configuration as the defaults will conflict with each other.
Furthermore, the default dbms.memory.pagecache.size assumes that Neo4j has the machine to itself. If
we in this example assume that the machine has 4 gigabytes of memory, and that each JVM consumes
500 megabytes of memory, then we can allocate 500 megabytes of memory to the page cache of each
server.
B.3.1. Download and configure
1. Download Neo4j Enterprise Edition from the Neo4j download site (http://neo4j.com/download/), and
unpack into three separate directories on your test machine.
2. Configure the HA related settings for each installation as outlined below.
Example 49. Configuration of neo4j.conf for each of the three local HA servers
Neo4j instance #1 is located in ~/neo4j-01
conf/neo4j.conf
# Reduce the default page cache memory allocation
dbms.memory.pagecache.size=500m
# Port to listen to for incoming backup requests.
dbms.backup.address = 127.0.0.1:6366
# Unique server id for this Neo4j instance
# can not be negative id and must be unique
ha.server_id=1
# List of other known instances in this cluster
ha.initial_hosts = 127.0.0.1:5001,127.0.0.1:5002,127.0.0.1:5003
# IP and port for this instance to bind to for communicating cluster information
# with the other neo4j instances in the cluster.
ha.host.coordination = 127.0.0.1:5001
# IP and port for this instance to bind to for communicating data with the
# other neo4j instances in the cluster.
ha.host.data = 127.0.0.1:6363
# HA - High Availability
# SINGLE - Single mode, default.
dbms.mode=HA
# HTTP Connector
dbms.connector.http.enabled=true
dbms.connector.http.listen_address=:7474
# Bolt connector
dbms.connector.bolt.enabled=true
dbms.connector.bolt.tls_level=OPTIONAL
dbms.connector.bolt.listen_address=:7687
Neo4j instance #2 is located in ~/neo4j-02
conf/neo4j.conf
# Reduce the default page cache memory allocation
dbms.memory.pagecache.size=500m
# Port to listen to for incoming backup requests.
dbms.backup.address = 127.0.0.1:6367
# Unique server id for this Neo4j instance
# can not be negative id and must be unique
ha.server_id=2
# List of other known instances in this cluster
ha.initial_hosts = 127.0.0.1:5001,127.0.0.1:5002,127.0.0.1:5003
# IP and port for this instance to bind to for communicating cluster information
# with the other neo4j instances in the cluster.
ha.host.coordination = 127.0.0.1:5002
# IP and port for this instance to bind to for communicating data with the
# other neo4j instances in the cluster.
ha.host.data = 127.0.0.1:6364
# HA - High Availability
# SINGLE - Single mode, default.
dbms.mode=HA
# HTTP Connector
dbms.connector.http.enabled=true
dbms.connector.http.listen_address=:7475
# Bolt connector
dbms.connector.bolt.enabled=true
dbms.connector.bolt.tls_level=OPTIONAL
dbms.connector.bolt.listen_address=:7688
Neo4j instance #3 is located in ~/neo4j-03
conf/neo4j.conf
# Reduce the default page cache memory allocation
dbms.memory.pagecache.size=500m
# Port to listen to for incoming backup requests.
dbms.backup.address = 127.0.0.1:6368
# Unique server id for this Neo4j instance
# can not be negative id and must be unique
ha.server_id=3
# List of other known instances in this cluster
ha.initial_hosts = 127.0.0.1:5001,127.0.0.1:5002,127.0.0.1:5003
# IP and port for this instance to bind to for communicating cluster information
# with the other neo4j instances in the cluster.
ha.host.coordination = 127.0.0.1:5003
# IP and port for this instance to bind to for communicating data with the
# other neo4j instances in the cluster.
ha.host.data = 127.0.0.1:6365
# HA - High Availability
# SINGLE - Single mode, default.
dbms.mode=HA
# HTTP Connector
dbms.connector.http.enabled=true
dbms.connector.http.listen_address=:7476
# Bolt connector
dbms.connector.bolt.enabled=true
dbms.connector.bolt.tls_level=OPTIONAL
dbms.connector.bolt.listen_address=:7689
Start the Neo4j Servers
Start the Neo4j servers as usual. Note that the startup order does not matter.
Example 50. Start the three local HA servers
localhost:~/neo4j-01$ ./bin/neo4j start
localhost:~/neo4j-02$ ./bin/neo4j start
localhost:~/neo4j-03$ ./bin/neo4j start
Now, you should be able to access the three servers and check their HA status. Open the locations
below in a web browser and issue the following command in the editor after having set a password
for the database: :play sysinfo
• http://127.0.0.1:7474/
• http://127.0.0.1:7475/
• http://127.0.0.1:7476/
B.4. Use the Import tool
This tutorial provides detailed examples of using the Neo4j import tool.
This tutorial walks us through a series of examples to illustrate the capabilities of the Import tool.
When using CSV files for loading a database, each node must have a unique identifier, a node
identifier, in order to be able to create relationships between nodes in the same process. Relationships
are created by connecting the node identifiers. In the examples below, the node identifiers are stored
as properties on the nodes. Node identifiers may be of interest later for cross-reference to other
systems, traceability etc., but they are not mandatory. If you do not want the identifiers to persist after
a completed import, then do not specify a property name in the :ID field.
It is possible to import only nodes using the import tool. To do so, simply omit a relationships file
when calling neo4j-import. Any relationships between the imported nodes will have to be created later
by another method, since the import tool works for initial graph population only.
For this tutorial we will use a data set containing movies, actors and roles. If running the examples,
exchange path_to_target_directory with the path to the database file directory. In a default
installation, the path_to_target_directory is: <neo4j-home>/data/databases/graph.db. Note that if you
wish to run one example after another you have to remove the database files in between.
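For example, a node header that keeps the identifier as a movieId property starts with movieId:ID, whereas a header with a bare :ID field uses the identifier only during the import and does not persist it (two hypothetical header lines in the format used by the examples below):
movieId:ID,title,year:int,:LABEL
:ID,title,year:int,:LABEL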
B.4.1. Basic example
First we will look at the movies. Each movie has an id, which is used for referring to it from other data
sources. Moreover, each movie has a title and a year. Along with these properties we also add the
node labels Movie and Sequel.
movies.csv
movieId:ID,title,year:int,:LABEL
tt0133093,"The Matrix",1999,Movie
tt0234215,"The Matrix Reloaded",2003,Movie;Sequel
tt0242653,"The Matrix Revolutions",2003,Movie;Sequel
Next up are the actors. They have an id - in this case a shorthand of their name - and a name. All the
actors have the node label Actor.
actors.csv
personId:ID,name,:LABEL
keanu,"Keanu Reeves",Actor
laurence,"Laurence Fishburne",Actor
carrieanne,"Carrie-Anne Moss",Actor
Finally we have the roles that an actor plays in a movie, which will be represented by relationships in
the database. In order to create a relationship between nodes we use the ids defined in actors.csv
and movies.csv for the START_ID and END_ID fields. We also need to provide a relationship type (in this
case ACTED_IN) for the :TYPE field.
roles.csv
:START_ID,role,:END_ID,:TYPE
keanu,"Neo",tt0133093,ACTED_IN
keanu,"Neo",tt0234215,ACTED_IN
keanu,"Neo",tt0242653,ACTED_IN
laurence,"Morpheus",tt0133093,ACTED_IN
laurence,"Morpheus",tt0234215,ACTED_IN
laurence,"Morpheus",tt0242653,ACTED_IN
carrieanne,"Trinity",tt0133093,ACTED_IN
carrieanne,"Trinity",tt0234215,ACTED_IN
carrieanne,"Trinity",tt0242653,ACTED_IN
The call to neo4j-import would look like this:
neo4j_home$ ./bin/neo4j-import --into path_to_target_directory --nodes movies.csv --nodes actors.csv
--relationships roles.csv
Now start up a database from the target directory:
neo4j_home$ ./bin/neo4j start
B.4.2. Customizing configuration options
We can customize the configuration options that the import tool uses (see Options) if our data does
not fit the default format. The following CSV files are delimited by ;, use | as the array delimiter and
use ' for quotes.
movies2.csv
movieId:ID;title;year:int;:LABEL
tt0133093;'The Matrix';1999;Movie
tt0234215;'The Matrix Reloaded';2003;Movie|Sequel
tt0242653;'The Matrix Revolutions';2003;Movie|Sequel
actors2.csv
personId:ID;name;:LABEL
keanu;'Keanu Reeves';Actor
laurence;'Laurence Fishburne';Actor
carrieanne;'Carrie-Anne Moss';Actor
roles2.csv
:START_ID;role;:END_ID;:TYPE
keanu;'Neo';tt0133093;ACTED_IN
keanu;'Neo';tt0234215;ACTED_IN
keanu;'Neo';tt0242653;ACTED_IN
laurence;'Morpheus';tt0133093;ACTED_IN
laurence;'Morpheus';tt0234215;ACTED_IN
laurence;'Morpheus';tt0242653;ACTED_IN
carrieanne;'Trinity';tt0133093;ACTED_IN
carrieanne;'Trinity';tt0234215;ACTED_IN
carrieanne;'Trinity';tt0242653;ACTED_IN
The call to neo4j-import would look like this:
neo4j_home$ ./bin/neo4j-import --into path_to_target_directory --nodes movies2.csv --nodes actors2.csv
--relationships roles2.csv --delimiter ";" --array-delimiter "|" --quote "'"
B.4.3. Using separate header files
When dealing with very large CSV files it is more convenient to have the header in a separate file. This
makes it easier to edit the header as you avoid having to open a huge data file just to change it.
The import tool can also process single-file compressed archives, for example: --nodes nodes.csv.gz or
--relationships rels.zip.
We will use the same data as in the previous example but put the headers in separate files.
movies3-header.csv
movieId:ID,title,year:int,:LABEL
movies3.csv
tt0133093,"The Matrix",1999,Movie
tt0234215,"The Matrix Reloaded",2003,Movie;Sequel
tt0242653,"The Matrix Revolutions",2003,Movie;Sequel
actors3-header.csv
personId:ID,name,:LABEL
actors3.csv
keanu,"Keanu Reeves",Actor
laurence,"Laurence Fishburne",Actor
carrieanne,"Carrie-Anne Moss",Actor
roles3-header.csv
:START_ID,role,:END_ID,:TYPE
roles3.csv
keanu,"Neo",tt0133093,ACTED_IN
keanu,"Neo",tt0234215,ACTED_IN
keanu,"Neo",tt0242653,ACTED_IN
laurence,"Morpheus",tt0133093,ACTED_IN
laurence,"Morpheus",tt0234215,ACTED_IN
laurence,"Morpheus",tt0242653,ACTED_IN
carrieanne,"Trinity",tt0133093,ACTED_IN
carrieanne,"Trinity",tt0234215,ACTED_IN
carrieanne,"Trinity",tt0242653,ACTED_IN
The call to neo4j-import would look as follows. Note how the file groups are enclosed in quotation
marks in the command.
neo4j_home$ ./bin/neo4j-import --into path_to_target_directory --nodes "movies3-header.csv,movies3.csv"
--nodes "actors3-header.csv,actors3.csv" --relationships "roles3-header.csv,roles3.csv"
B.4.4. Multiple input files
In addition to using a separate header file you can also provide multiple nodes or relationships files.
This may be useful for example for processing the output from a Hadoop pipeline. Files within such an
input group can be specified with multiple match strings, delimited by ,, where each match string can
be either: the exact file name or a regular expression matching one or more files. Multiple matching
files will be sorted according to their characters and their natural number sort order for file names
containing numbers.
movies4-header.csv
movieId:ID,title,year:int,:LABEL
movies4-part1.csv
tt0133093,"The Matrix",1999,Movie
tt0234215,"The Matrix Reloaded",2003,Movie;Sequel
movies4-part2.csv
tt0242653,"The Matrix Revolutions",2003,Movie;Sequel
actors4-header.csv
personId:ID,name,:LABEL
actors4-part1.csv
keanu,"Keanu Reeves",Actor
laurence,"Laurence Fishburne",Actor
actors4-part2.csv
carrieanne,"Carrie-Anne Moss",Actor
roles4-header.csv
:START_ID,role,:END_ID,:TYPE
roles4-part1.csv
keanu,"Neo",tt0133093,ACTED_IN
keanu,"Neo",tt0234215,ACTED_IN
keanu,"Neo",tt0242653,ACTED_IN
laurence,"Morpheus",tt0133093,ACTED_IN
laurence,"Morpheus",tt0234215,ACTED_IN
roles4-part2.csv
laurence,"Morpheus",tt0242653,ACTED_IN
carrieanne,"Trinity",tt0133093,ACTED_IN
carrieanne,"Trinity",tt0234215,ACTED_IN
carrieanne,"Trinity",tt0242653,ACTED_IN
The call to neo4j-import would look like this:
neo4j_home$ ./bin/neo4j-import --into path_to_target_directory --nodes "movies4-header.csv,movies4-part1.csv,movies4-part2.csv" --nodes "actors4-header.csv,actors4-part1.csv,actors4-part2.csv"
--relationships "roles4-header.csv,roles4-part1.csv,roles4-part2.csv"
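Since a match string may also be a regular expression, the part files in each group could presumably be picked up with a single pattern instead of being listed individually, for example:
neo4j_home$ ./bin/neo4j-import --into path_to_target_directory --nodes "movies4-header.csv,movies4-part.*" --nodes "actors4-header.csv,actors4-part.*" --relationships "roles4-header.csv,roles4-part.*"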
B.4.5. Types and labels
Using the same label for every node
If you want to use the same node label(s) for every node in your nodes file you can do this by
specifying the appropriate value as an option to neo4j-import. There is then no need to specify the
:LABEL field in the node file if you pass it as a command line option. If you do then both the label
provided in the file and the one provided on the command line will be added to the node.
In this example we put the label Movie on every node specified in movies5a.csv, and we put the labels
Movie and Sequel on the nodes specified in sequels5a.csv.
movies5a.csv
movieId:ID,title,year:int
tt0133093,"The Matrix",1999
sequels5a.csv
movieId:ID,title,year:int
tt0234215,"The Matrix Reloaded",2003
tt0242653,"The Matrix Revolutions",2003
actors5a.csv
personId:ID,name
keanu,"Keanu Reeves"
laurence,"Laurence Fishburne"
carrieanne,"Carrie-Anne Moss"
roles5a.csv
:START_ID,role,:END_ID,:TYPE
keanu,"Neo",tt0133093,ACTED_IN
keanu,"Neo",tt0234215,ACTED_IN
keanu,"Neo",tt0242653,ACTED_IN
laurence,"Morpheus",tt0133093,ACTED_IN
laurence,"Morpheus",tt0234215,ACTED_IN
laurence,"Morpheus",tt0242653,ACTED_IN
carrieanne,"Trinity",tt0133093,ACTED_IN
carrieanne,"Trinity",tt0234215,ACTED_IN
carrieanne,"Trinity",tt0242653,ACTED_IN
The call to neo4j-import would look like this:
neo4j_home$ ./bin/neo4j-import --into path_to_target_directory --nodes:Movie movies5a.csv
--nodes:Movie:Sequel sequels5a.csv --nodes:Actor actors5a.csv --relationships roles5a.csv
Using the same relationship type for every relationship
If you want to use the same relationship type for every relationship in your relationships file this can
be done by specifying the appropriate value as an option to neo4j-import. If you provide a relationship
type both on the command line and in the relationships file, the one in the file will be applied. In this
example we put the relationship type ACTED_IN on every relationship specified in roles5b.csv:
movies5b.csv
movieId:ID,title,year:int,:LABEL
tt0133093,"The Matrix",1999,Movie
tt0234215,"The Matrix Reloaded",2003,Movie;Sequel
tt0242653,"The Matrix Revolutions",2003,Movie;Sequel
actors5b.csv
personId:ID,name,:LABEL
keanu,"Keanu Reeves",Actor
laurence,"Laurence Fishburne",Actor
carrieanne,"Carrie-Anne Moss",Actor
roles5b.csv
:START_ID,role,:END_ID
keanu,"Neo",tt0133093
keanu,"Neo",tt0234215
keanu,"Neo",tt0242653
laurence,"Morpheus",tt0133093
laurence,"Morpheus",tt0234215
laurence,"Morpheus",tt0242653
carrieanne,"Trinity",tt0133093
carrieanne,"Trinity",tt0234215
carrieanne,"Trinity",tt0242653
The call to neo4j-import would look like this:
neo4j_home$ ./bin/neo4j-import --into path_to_target_directory --nodes movies5b.csv --nodes actors5b.csv
--relationships:ACTED_IN roles5b.csv
B.4.6. Property types
The type for properties specified in nodes and relationships files is defined in the header row (see CSV
file header format).
The following example creates a small graph containing one actor and one movie connected by an
ACTED_IN relationship. There is a roles property on the relationship which contains an array of the
characters played by the actor in a movie.
movies6.csv
movieId:ID,title,year:int,:LABEL
tt0099892,"Joe Versus the Volcano",1990,Movie
actors6.csv
personId:ID,name,:LABEL
meg,"Meg Ryan",Actor
roles6.csv
:START_ID,roles:string[],:END_ID,:TYPE
meg,"DeDe;Angelica Graynamore;Patricia Graynamore",tt0099892,ACTED_IN
The call to neo4j-import would look like this:
neo4j_home$ ./bin/neo4j-import --into path_to_target_directory --nodes movies6.csv --nodes actors6.csv
--relationships roles6.csv
B.4.7. ID handling
Each node processed by neo4j-import must provide a unique id. This id is used to find the correct
nodes when creating relationships.
Working with sequential or auto incrementing identifiers
The import tool makes the assumption that identifiers are unique across node files. This may not be
the case for data sets which use sequential, auto incremented or otherwise colliding identifiers. Those
data sets can define id spaces where identifiers are unique within their respective id space.
For example if movies and people both use sequential identifiers then we would define Movie and
Actor id spaces.
movies7.csv
movieId:ID(Movie-ID),title,year:int,:LABEL
1,"The Matrix",1999,Movie
2,"The Matrix Reloaded",2003,Movie;Sequel
3,"The Matrix Revolutions",2003,Movie;Sequel
actors7.csv
personId:ID(Actor-ID),name,:LABEL
1,"Keanu Reeves",Actor
2,"Laurence Fishburne",Actor
3,"Carrie-Anne Moss",Actor
We also need to reference the appropriate id space in our relationships file so it knows which nodes to
connect together:
roles7.csv
:START_ID(Actor-ID),role,:END_ID(Movie-ID)
1,"Neo",1
1,"Neo",2
1,"Neo",3
2,"Morpheus",1
2,"Morpheus",2
2,"Morpheus",3
3,"Trinity",1
3,"Trinity",2
3,"Trinity",3
The call to neo4j-import would look like this:
neo4j_home$ ./bin/neo4j-import --into path_to_target_directory --nodes movies7.csv --nodes actors7.csv
--relationships:ACTED_IN roles7.csv
B.4.8. Bad input data
The import tool has a threshold for how many bad entities (nodes or relationships) to tolerate and skip
before failing the import. By default, 1000 bad entities are tolerated. A bad tolerance of 0 will, for
example, fail the import on the first bad entity. For more information, see the --bad-tolerance option.
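For instance, to make the import fail as soon as a single bad entity is encountered, the threshold can be set to zero. The invocation below is illustrative only and reuses the file names from the basic example:
neo4j_home$ ./bin/neo4j-import --into path_to_target_directory --nodes movies.csv --nodes actors.csv --relationships roles.csv --bad-tolerance 0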
There are different types of bad input, which we will look into.
Relationships referring to missing nodes
Relationships that refer to missing node ids, either for :START_ID or :END_ID, are considered bad
relationships. Whether or not such relationships are skipped is controlled with the
--skip-bad-relationships flag, which can have the value true or false, or no value (which means true).
Specifying false means that any bad relationship is considered an error and will fail the import. For
more information, see the --skip-bad-relationships option.
In the following example there is a missing emil node referenced in the roles file.
movies8a.csv
movieId:ID,title,year:int,:LABEL
tt0133093,"The Matrix",1999,Movie
tt0234215,"The Matrix Reloaded",2003,Movie;Sequel
tt0242653,"The Matrix Revolutions",2003,Movie;Sequel
actors8a.csv
personId:ID,name,:LABEL
keanu,"Keanu Reeves",Actor
laurence,"Laurence Fishburne",Actor
carrieanne,"Carrie-Anne Moss",Actor
roles8a.csv
:START_ID,role,:END_ID,:TYPE
keanu,"Neo",tt0133093,ACTED_IN
keanu,"Neo",tt0234215,ACTED_IN
keanu,"Neo",tt0242653,ACTED_IN
laurence,"Morpheus",tt0133093,ACTED_IN
laurence,"Morpheus",tt0234215,ACTED_IN
laurence,"Morpheus",tt0242653,ACTED_IN
carrieanne,"Trinity",tt0133093,ACTED_IN
carrieanne,"Trinity",tt0234215,ACTED_IN
carrieanne,"Trinity",tt0242653,ACTED_IN
emil,"Emil",tt0133093,ACTED_IN
The call to neo4j-import would look like this:
neo4j_home$ ./bin/neo4j-import --into path_to_target_directory --nodes movies8a.csv --nodes actors8a.csv
--relationships roles8a.csv
Since there was only one bad relationship, the import process will complete successfully and a not-imported.bad file will be created and populated with the bad relationships.
not-imported.bad
InputRelationship:
source: roles8a.csv:11
properties: [role, Emil]
startNode: emil
endNode: tt0133093
type: ACTED_IN
refering to missing node emil
Multiple nodes with same id within same id space
Nodes that specify an :ID which has already been specified within the id space are considered bad
nodes. Whether or not such nodes are skipped is controlled with the --skip-duplicate-nodes flag, which
can have the value true or false, or no value (which means true). Specifying false means that any
duplicate node is considered an error and will fail the import. For more information, see the
--skip-duplicate-nodes option.
In the following example there is a node id that is specified twice within the same id space.
actors8b.csv
personId:ID,name,:LABEL
keanu,"Keanu Reeves",Actor
laurence,"Laurence Fishburne",Actor
carrieanne,"Carrie-Anne Moss",Actor
laurence,"Laurence Harvey",Actor
The call to neo4j-import would look like this:
neo4j_home$ ./bin/neo4j-import --into path_to_target_directory --nodes actors8b.csv --skip-duplicate-nodes
Since there was only one bad node, the import process will complete successfully and a not-imported.bad file will be created and populated with the bad node.
not-imported.bad
Id 'laurence' is defined more than once in global id space, at least at actors8b.csv:3 and actors8b.csv:5
B.5. Scenarios for using role-based access control
In this section, we show different scenarios of using the Neo4j Security features.
In this section, we show two cases of how the Neo4j Security features can be combined to cater for
various real-world scenarios.
Both cases assume the existence of an administrator and a fictitious developer called Jane, who
requires access to the database.
B.5.1. Creating a user and managing roles
Step 1: Administrator creates a user
The administrator creates a user on the system with username 'jane' and password 'abracadabra',
requiring that as soon as Jane logs in for the first time, she is required to change her password
immediately:
CALL dbms.security.createUser('jane', 'abracadabra', true)
Step 2: Administrator assigns the publisher role to the user
The administrator assigns the publisher role to Jane allowing her to both read and write data:
CALL dbms.security.addRoleToUser('publisher', 'jane')
Step 3: User logs in and changes her password
As soon as Jane logs in, she is prompted to change her password.
She changes it to 'R0ckyR0ad88':
CALL dbms.security.changePassword('R0ckyR0ad88')
Step 4: User writes data
Jane executes a query which inserts some data:
CREATE (:Person {name: 'Sam', age: 19})
+-------------------+
| No data returned. |
+-------------------+
Nodes created: 1
Properties set: 2
Labels added: 1
Step 5: Administrator removes the publisher role from the user
The administrator removes the publisher role from Jane.
CALL dbms.security.removeRoleFromUser('publisher', 'jane')
Step 6: User attempts to read data
Jane tries to execute a read query:
MATCH (p:Person)
RETURN p.name
The query fails, as Jane does not have the role allowing her to read data (in fact, she has no assigned
roles):
Read operations are not allowed for user 'jane' with no roles.
Step 7: Administrator assigns the reader role to the user
The administrator assigns the reader role to Jane:
CALL dbms.security.addRoleToUser('reader', 'jane')
Step 8: User attempts to write data
Jane tries to execute a write query:
CREATE (:Person {name: 'Bob', age: 52})
The query fails, as Jane does not have the role allowing her to write data.
Write operations are not allowed for user 'jane' with roles ['reader'].
Step 9: User attempts to read data
Jane tries to execute a read query:
MATCH (p:Person)
RETURN p.name
The query succeeds as she is assigned the reader role:
+-------+
| name |
+-------+
| "Sam" |
+-------+
1 row
B.5.2. Suspending and reactivating a user
This scenario follows on from the one above.
Step 1: Administrator suspends the user
The administrator suspends Jane.
CALL dbms.security.suspendUser('jane')
Step 2: Suspended user tries to log in
Jane tries to log in to the system, and will fail to do so.
Step 3: Administrator activates suspended user
The administrator activates Jane.
CALL dbms.security.activateUser('jane')
Step 4: Activated user logs in
Jane is now able to log in successfully.
License
Creative Commons 3.0
You are free to
Share
copy and redistribute the material in any medium or format
Adapt
remix, transform, and build upon the material
for any purpose, even commercially.
The licensor cannot revoke these freedoms as long as you follow the license terms.
Under the following terms
Attribution
You must give appropriate credit, provide a link to the license, and indicate if changes were made.
You may do so in any reasonable manner, but not in any way that suggests the licensor endorses
you or your use.
ShareAlike
If you remix, transform, or build upon the material, you must distribute your contributions under
the same license as the original.
No additional restrictions
You may not apply legal terms or technological measures that legally restrict others from doing
anything the license permits.
Notices
You do not have to comply with the license for elements of the material in the public domain or where
your use is permitted by an applicable exception or limitation.
No warranties are given. The license may not give you all of the permissions necessary for your
intended use. For example, other rights such as publicity, privacy, or moral rights may limit how you
use the material.
See http://creativecommons.org/licenses/by-sa/3.0/ for further details. The full license text is available
at http://creativecommons.org/licenses/by-sa/3.0/legalcode.