The SiLK Reference Guide (SiLK-3.10.0) - CERT NetSA Security Suite

The SiLK Reference Guide
(SiLK-3.10.0)
CERT Coordination Center
© 2002–2014 Carnegie Mellon University
License available in Appendix A
The canonical location for this handbook is
http://tools.netsa.cert.org/silk/silk-reference-guide.pdf
December 18, 2014
Contents
Introduction
1 SiLK Analysis Tools and Utilities
    mapsid
    num2dot
    rwaddrcount
    rwappend
    rwbag
    rwbagbuild
    rwbagcat
    rwbagtool
    rwcat
    rwcombine
    rwcompare
    rwcount
    rwcut
    rwdedupe
    rwfglob
    rwfileinfo
    rwfilter
    rwgeoip2ccmap
    rwgroup
    rwidsquery
    rwip2cc
    rwipaexport
    rwipaimport
    rwipfix2silk
    rwmatch
    rwnetmask
    rwp2yaf2silk
    rwpcut
    rwpdedupe
    rwpdu2silk
    rwpmapbuild
    rwpmapcat
    rwpmaplookup
    rwpmatch
    rwptoflow
    rwrandomizeip
    rwrecgenerator
    rwresolve
    rwscan
    rwscanquery
    rwset
    rwsetbuild
    rwsetcat
    rwsetmember
    rwsettool
    rwsilk2ipfix
    rwsiteinfo
    rwsort
    rwsplit
    rwstats
    rwswapbytes
    rwtotal
    rwtuc
    rwuniq
    silk_config
3 SiLK Libraries and Plug-Ins
    addrtype
    ccfilter
    flowrate
    int-ext-fields
    ipafilter
    packlogic-generic.so
    packlogic-twoway.so
    pmapfilter
    PySiLK
    silk-plugin
    silkpython
5 SiLK File Formats
    sensor.conf
    silk.conf
7 SiLK Miscellaneous Information
    SiLK
8 SiLK Administrator’s Tools
    flowcap
    rwflowappend
    rwflowpack
    rwguess
    rwpackchecker
    rwpollexec
    rwreceiver
    rwsender
A License
Introduction
The SiLK Reference Guide contains the manual page for each analysis tool, utility, plug-in, file format, and
collection facility in the SiLK Collection and Analysis Suite.
This document is meant for reference only. The SiLK Analysis Handbook provides both a tutorial for learning
about the tools and examples of how they can be used in analyzing flow data. See the SiLK Installation
Handbook for instructions on installing SiLK at your site.
This reference guide is broken into sections like the traditional UNIX manual: end-user analysis tools and
utilities are described in Section 1; the libraries and plug-ins that augment the behavior of some tools are
presented in Section 3; Section 5 contains information about file formats; miscellaneous information is in
Section 7; and commands for the installer and administrator of SiLK appear in Section 8.
1 SiLK Analysis Tools and Utilities
This section provides the manual page for each analysis tool and utility that the users of SiLK may employ
in their day-to-day work.
mapsid(1)
mapsid
Map between sensor names and sensor numbers
SYNOPSIS
mapsid [--print-classes] [--print-descriptions]
[--site-config-file=FILENAME]
[{ <sensor-name> | <sensor-number> } ...]
mapsid --help
mapsid --version
DESCRIPTION
As of SiLK 3.0, mapsid is deprecated, and it will be removed in the SiLK 4.0 release. Use rwsiteinfo(1)
instead; the EXAMPLES section shows how to use rwsiteinfo to get output similar to that produced by
mapsid.
mapsid is a utility that maps sensor names to sensor numbers or vice versa depending on the input arguments. Sensors are defined in the silk.conf(5) file.
When no sensor arguments are given to mapsid, the mapping of all sensor numbers to names is printed.
When a numeric argument is given, the number to name mapping is printed for the specified argument.
When a name is given, its numeric id is printed. For convenience when typing in sensor names, case is
ignored.
OPTIONS
Option names may be abbreviated if the abbreviation is unique or is an exact match for an option. A
parameter to an option may be specified as --arg=param or --arg param, though the first form is required
for options that take optional parameters.
--print-classes
For each sensor, print the classes for which the sensor collects data. The classes are enclosed in square
brackets, [].
--print-descriptions
For each sensor, print the description of the sensor as defined in the silk.conf file (if any).
--site-config-file=FILENAME
Read the SiLK site configuration from the named file FILENAME. When this switch is not provided,
mapsid searches for the site configuration file in the locations specified in the FILES section.
--help
Print the available options and exit.
--version
Print the version number and information about how SiLK was configured, then exit the application.
EXAMPLES
The following examples demonstrate the use of mapsid. In addition, each example shows how to get similar
output using rwsiteinfo(1).
In the following examples, the dollar sign ($) represents the shell prompt. The text after the dollar sign
represents the command line. Lines have been wrapped for improved readability, and the backslash (\) is
used to indicate a wrapped line.
Name to number mapping
$ mapsid beta
BETA ->     1
$ rwsiteinfo --fields=sensor,id-sensor --sensors=BETA
Sensor|Sensor-ID|
  BETA|        1|
Unlike mapsid, matching of the sensor name is case-sensitive in rwsiteinfo.
Number to name mapping
$ mapsid 3
3 -> DELTA
$ rwsiteinfo --fields=id-sensor,sensor --sensors=3 --delimited=,
Sensor-ID,Sensor
3,DELTA
Print all mappings
$ mapsid
0 -> ALPHA
1 -> BETA
2 -> GAMMA
3 -> DELTA
4 -> EPSLN
5 -> ZETA
....
$ rwsiteinfo --fields=id-sensor,sensor --no-titles
0| ALPHA|
1| BETA|
2| GAMMA|
3| DELTA|
4| EPSLN|
5| ZETA|
...
Print the class
$ mapsid --print-classes 3 ZETA
3 -> DELTA [all]
ZETA ->     5 [all]
$ rwsiteinfo --fields=id-sensor,sensor,class:list --sensors=3,ZETA
Sensor-ID|Sensor|Class:list|
        3| DELTA|       all|
        5|  ZETA|       all|
Print the class and description
$ mapsid --print-classes --print-description 0 1
0 -> ALPHA [all] "Primary gateway"
1 -> BETA  [all] "Secondary gateway"
rwsiteinfo supports using an integer range when specifying sensors.
$ rwsiteinfo --fields=id-sensor,sensor,class:list,describe-sensor \
    --sensors=0-1
Sensor-ID|Sensor|Class:list|Sensor-Description|
        0| ALPHA|       all|   Primary gateway|
        1|  BETA|       all| Secondary gateway|
ENVIRONMENT
SILK_CONFIG_FILE
This environment variable is used as the value for the --site-config-file when that switch is not
provided.
SILK_DATA_ROOTDIR
This environment variable specifies the root directory of the data repository. As described in the FILES
section, mapsid may use this environment variable when searching for the SiLK site configuration file.
SILK_PATH
This environment variable gives the root of the install tree. When searching for configuration files,
mapsid may use this environment variable. See the FILES section for details.
FILES
${SILK_CONFIG_FILE}
${SILK_DATA_ROOTDIR}/silk.conf
/data/silk.conf
${SILK_PATH}/share/silk/silk.conf
${SILK_PATH}/share/silk.conf
/usr/local/share/silk/silk.conf
/usr/local/share/silk.conf
Possible locations for the SiLK site configuration file which are checked when the --site-config-file
switch is not provided.
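The lookup described in the FILES section can be sketched in Python as follows. This is only an illustration, not SiLK's implementation: it assumes the locations are tried in the order listed and that the first existing file wins, and the function names candidate_site_config_paths and find_site_config are hypothetical.

```python
import os


def candidate_site_config_paths(env):
    """List possible silk.conf locations in the order the FILES section gives them.

    `env` is a mapping of environment variables. The ordering is an assumption
    taken from the order of the FILES list.
    """
    paths = []
    if env.get("SILK_CONFIG_FILE"):
        paths.append(env["SILK_CONFIG_FILE"])
    if env.get("SILK_DATA_ROOTDIR"):
        paths.append(env["SILK_DATA_ROOTDIR"] + "/silk.conf")
    paths.append("/data/silk.conf")
    if env.get("SILK_PATH"):
        paths.append(env["SILK_PATH"] + "/share/silk/silk.conf")
        paths.append(env["SILK_PATH"] + "/share/silk.conf")
    paths.append("/usr/local/share/silk/silk.conf")
    paths.append("/usr/local/share/silk.conf")
    return paths


def find_site_config(env, exists=os.path.isfile):
    """Return the first candidate for which `exists` is true, or None."""
    for path in candidate_site_config_paths(env):
        if exists(path):
            return path
    return None
```

Calling find_site_config(os.environ) would report which file such a search selects on the local machine; the exists parameter is there only so the search can be exercised without touching the file system.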
SEE ALSO
rwsiteinfo(1), silk.conf(5), silk(7)
NOTES
As of SiLK 3.0, mapsid is deprecated; use rwsiteinfo(1) instead.
num2dot(1)
num2dot
Convert an integer IP to dotted-decimal notation
SYNOPSIS
num2dot [--ip-fields=FIELDS] [--delimiter=C]
num2dot --help
num2dot --version
DESCRIPTION
num2dot is a filter that speeds up sorting of IP numbers while still producing both a natural order (i.e.,
29.23.1.1 will appear before 192.168.1.1) and readable output (i.e., dotted decimal rather than an integer
representation of the IP number).
It is designed specifically to deal with the output of rwcut(1). Its job is to read stdin and convert specified
fields (default field 1) separated by a delimiter (default ’|’) from an integer number into a dotted decimal IP
address. Up to three IP fields can be specified via the --ip-fields=FIELDS option. The --delimiter option
can be used to specify an alternate delimiter.
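The conversion that num2dot performs on each input line can be illustrated with a short Python sketch. It is not the tool's source code: the helper names int_to_dotted and convert_line are hypothetical, column padding is not preserved, and non-numeric cells (such as a title row) are passed through unchanged as an assumption.

```python
def int_to_dotted(value):
    """Convert an IPv4 address given as an integer to dotted-decimal notation."""
    number = int(value)
    if not 0 <= number <= 0xFFFFFFFF:
        raise ValueError("not a 32-bit IPv4 value: %r" % (value,))
    # Extract the four octets from most significant to least significant.
    return ".".join(str((number >> shift) & 0xFF) for shift in (24, 16, 8, 0))


def convert_line(line, ip_fields=(1,), delimiter="|"):
    """Rewrite the 1-based columns named in ip_fields as dotted-decimal IPs."""
    columns = line.rstrip("\n").split(delimiter)
    for field in ip_fields:
        index = field - 1
        if 0 <= index < len(columns):
            text = columns[index].strip()
            if text.isdigit():  # leave titles and empty cells untouched
                columns[index] = int_to_dotted(text)
    return delimiter.join(columns)


if __name__ == "__main__":
    print(convert_line("488046849|3232235777|80|", ip_fields=(1, 2)))
```

Sorting the integer form first and converting afterwards is what yields the natural order mentioned above: 488046849 (29.23.1.1) sorts before 3232235777 (192.168.1.1) as a number, whereas the dotted strings would sort the other way as text.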
OPTIONS
Option names may be abbreviated if the abbreviation is unique or is an exact match for an option. A
parameter to an option may be specified as --arg=param or --arg param, though the first form is required
for options that take optional parameters.
--ip-fields=FIELDS
Column numbers of the input that should be treated as IP numbers. Column numbers start from 1. If
not specified, the default is 1.
--delimiter=C
The character that separates the columns of the input. Default is ’|’.
--help
Print the available options and exit.
--version
Print the version number and information about how SiLK was configured, then exit the application.
EXAMPLE
In the following example, the dollar sign ($) represents the shell prompt. The text after the dollar sign
represents the command line. Lines have been wrapped for improved readability, and the backslash (\) is
used to indicate a wrapped line.
Suppose in addition to the default fields of 1-12 produced by rwcut(1), you want to prefix each row with an
integer form of the destination IP and the start time to make processing by another tool (e.g., a spreadsheet)
easier. However, within the default rwcut output fields of 1-12, you want to see dotted-decimal IP addresses.
You could use the following command:
$ rwfilter ... --pass=stdout \
    | rwcut --fields=dip,stime,1-12 --ip-format=decimal \
        --timestamp-format=epoch \
    | num2dot --ip-field=3,4
In the rwcut invocation, you prepend the fields of interest (dip and stime) before the standard fields. The
first six columns produced by rwcut will be dIP, sTime, sIP, dIP, sPort, dPort. The --ip-format switch
causes the first, third, and fourth columns to be printed as integers, but you only want the first column to
have an integer representation. The pipe through num2dot will convert the third and fourth columns to
dotted-decimal IP numbers.
SEE ALSO
rwcut(1), silk(7)
BUGS
num2dot has no support for IPv6 addresses.
rwaddrcount(1)
rwaddrcount
Count activity by IP address
SYNOPSIS
rwaddrcount {--print-recs | --print-ips | --print-stat}
[--use-dest] [--min-bytes=BYTEMIN] [--max-bytes=BYTEMAX]
[--min-records=RECMIN] [--max-records=RECMAX]
[--min-packets=PACKMIN] [--max-packets=PACKMAX]
[--set-file=PATHNAME] [--sort-ips] [--timestamp-format=FORMAT]
[--ip-format=FORMAT] [--integer-ips] [--zero-pad-ips]
[--no-titles] [--no-columns] [--column-separator=CHAR]
[--no-final-delimiter] [{--delimited | --delimited=CHAR}]
[--print-filenames] [--copy-input=PATH] [--output-path=PATH]
[--pager=PAGER_PROG] [--site-config-file=FILENAME]
[{--legacy-timestamps | --legacy-timestamps=NUM}]
{[--xargs] | [--xargs=FILENAME] | [FILE [FILE ...]]}
rwaddrcount --help
rwaddrcount --version
DESCRIPTION
rwaddrcount reads SiLK Flow records, sums the byte-, packet-, and record-counts on those records by
individual source or destination IP address and maintains the time window during which that IP address
was active. At the end of the count operation, the results per IP address are displayed when the --print-recs
switch is given. rwaddrcount includes facilities for displaying only those IP addresses whose byte-, packet-, or flow-counts are between specified minima and maxima.
rwaddrcount reads SiLK Flow records from the files named on the command line or from the standard
input when no file names are specified and --xargs is not present. To read the standard input in addition to
the named files, use - or stdin as a file name. If an input file name ends in .gz, the file will be uncompressed
as it is read. When the --xargs switch is provided, rwaddrcount will read the names of the files to process
from the named text file, or from the standard input if no file name argument is provided to the switch. The
input to --xargs must contain one file name per line.
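The per-address binning described above can be pictured with the following Python sketch. It is a simplified model under stated assumptions, not rwaddrcount's implementation: the Flow tuple and the count_by_address function are hypothetical, and times are plain numbers rather than SiLK timestamps.

```python
from collections import namedtuple

# Hypothetical, minimal stand-in for a SiLK Flow record.
Flow = namedtuple("Flow", "sip dip bytes packets stime etime")


def count_by_address(flows, use_dest=False):
    """Sum bytes, packets, and records per IP and track the active time window."""
    bins = {}
    for flow in flows:
        key = flow.dip if use_dest else flow.sip
        if key not in bins:
            bins[key] = {"bytes": 0, "packets": 0, "records": 0,
                         "stime": flow.stime, "etime": flow.etime}
        entry = bins[key]
        entry["bytes"] += flow.bytes
        entry["packets"] += flow.packets
        entry["records"] += 1
        # The window runs from the earliest start time to the latest end time.
        entry["stime"] = min(entry["stime"], flow.stime)
        entry["etime"] = max(entry["etime"], flow.etime)
    return bins
```

Each value in the returned dictionary corresponds to one bin: the totals for an address plus the earliest start time and latest end time seen for it. Passing use_dest=True keys the bins by destination address, mirroring the effect of --use-dest.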
OPTIONS
Option names may be abbreviated if the abbreviation is unique or is an exact match for an option. A
parameter to an option may be specified as --arg=param or --arg param, though the first form is required
for options that take optional parameters.
For the application to operate, one of the three --print options must be chosen.
--print-recs
Print one row for each bin that meets the minima/maxima criteria. Each bin contains the IP address,
number of bytes, number of packets, number of flow records, earliest start time, and latest end time.
--print-ips
Print a single column containing the IP addresses for each bin that meets the minima/maxima criteria.
--print-stat
Print a one- or two-line summary (plus a title line) that summarizes the bins. The first line is a summary
across all bins, and it contains the number of unique IP addresses and the sums of the bytes, packets,
and flow records. The second line is printed only when one or more minima or maxima are specified.
This second line contains the same columns as the first, and its values are the sums across those bins
that meet the criteria.
--use-dest
Count by destination IP address in the filter record rather than source IP.
--min-bytes=BYTEMIN
Filtering criterion; for the final output (stats or printing), only include count records where the total
number of bytes exceeds BYTEMIN.
--min-packets=PACKMIN
Filtering criterion; for the final output (stats or printing), only include count records where the total
number of packets exceeds PACKMIN.
--min-records=RECMIN
Filtering criterion; for the final output (stats or printing), only include count records where the total
number of filter records contributing to that count record exceeds RECMIN.
--max-bytes=BYTEMAX
Filtering criterion; for the final output (stats or printing), only include count records where the total
number of bytes is less than BYTEMAX.
--max-packets=PACKMAX
Filtering criterion; for the final output (stats or printing), only include count records where the total
number of packets is less than PACKMAX.
--max-records=RECMAX
Filtering criterion; for the final output (stats or printing), only include count records to which at most
RECMAX filter records contributed.
--set-file=PATHNAME
Write the IPs into the rwset(1)-style binary IP-set file named PATHNAME. Use rwsetcat(1) to see
the contents of this file.
--timestamp-format=FORMAT
Specify how timestamps will be printed. When this switch is not specified, timestamps are printed in
the default format, and the timezone is UTC unless SiLK was compiled with local timezone support.
FORMAT is a comma-separated list of a format and/or a timezone. The format is one of:
default
Print the timestamps as YYYY/MM/DDThh:mm:ss
iso
Print the timestamps as YYYY-MM-DD hh:mm:ss
m/d/y
Print the timestamps as MM/DD/YYYY hh:mm:ss
epoch
Print the timestamps as the number of seconds since 00:00:00 UTC on 1970-01-01.
When a timezone is specified, it is used regardless of the default timezone support compiled into SiLK.
The timezone is one of:
utc
Use Coordinated Universal Time to print timestamps.
local
Use the TZ environment variable or the local timezone.
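To make the four timestamp formats concrete, the following Python sketch renders one instant in each of them. It is an illustration only: the function name format_timestamp is hypothetical, and the sketch always uses UTC, so it does not model the local timezone choice.

```python
from datetime import datetime, timezone

# strftime patterns matching the layouts listed for --timestamp-format.
_PATTERNS = {
    "default": "%Y/%m/%dT%H:%M:%S",
    "iso": "%Y-%m-%d %H:%M:%S",
    "m/d/y": "%m/%d/%Y %H:%M:%S",
}


def format_timestamp(epoch_seconds, fmt="default"):
    """Render seconds since 1970-01-01 00:00:00 UTC in one of the named formats."""
    if fmt == "epoch":
        return str(int(epoch_seconds))
    if fmt not in _PATTERNS:
        raise ValueError("unknown timestamp format: %r" % (fmt,))
    moment = datetime.fromtimestamp(epoch_seconds, tz=timezone.utc)
    return moment.strftime(_PATTERNS[fmt])


if __name__ == "__main__":
    for name in ("default", "iso", "m/d/y", "epoch"):
        print(name, format_timestamp(1042762741, name))
```

The epoch value 1042762741 corresponds to 00:19:01 UTC on 17 January 2003, so the default format prints it as 2003/01/17T00:19:01.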
--ip-format=FORMAT
For the --print-recs and --print-ips output formats, specify how IP addresses will be printed. When
this switch is not specified, IPs are printed in the canonical format. The FORMAT is one of:
canonical
Print IP addresses in their canonical form, 127.0.0.1.
zero-padded
Print IP addresses in their canonical form, but add zeros to the output so it fully fills the width
of the column. The address 127.0.0.1 is printed as 127.000.000.001.
decimal
Print IP addresses as integers in decimal format. The address 127.0.0.1 is printed as 2130706433.
hexadecimal
Print IP addresses as integers in hexadecimal format. The address 127.0.0.1 is printed as
7f000001.
force-ipv6
Print all IP addresses in the canonical form for IPv6 without using any IPv4 notation. Any
IPv4 address is mapped into the ::ffff:0:0/96 netblock. The address 127.0.0.1 is printed as
::ffff:7f00:1.
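The five IP formats can be reproduced for an IPv4 address with a few lines of Python. This sketch exists only to show how the representations relate to one another; format_ipv4 is a hypothetical helper, not part of SiLK or PySiLK.

```python
def format_ipv4(address, fmt="canonical"):
    """Render a dotted-decimal IPv4 address in one of the --ip-format styles."""
    octets = [int(part) for part in address.split(".")]
    if len(octets) != 4 or any(not 0 <= octet <= 255 for octet in octets):
        raise ValueError("not an IPv4 address: %r" % (address,))
    number = (octets[0] << 24) | (octets[1] << 16) | (octets[2] << 8) | octets[3]
    if fmt == "canonical":
        return ".".join(str(octet) for octet in octets)
    if fmt == "zero-padded":
        return ".".join("%03d" % octet for octet in octets)
    if fmt == "decimal":
        return str(number)
    if fmt == "hexadecimal":
        return "%x" % number
    if fmt == "force-ipv6":
        # Map into ::ffff:0:0/96 and print the low 32 bits as two hex groups.
        high, low = number >> 16, number & 0xFFFF
        return "::ffff:%x:%x" % (high, low)
    raise ValueError("unknown IP format: %r" % (fmt,))


if __name__ == "__main__":
    for name in ("canonical", "zero-padded", "decimal", "hexadecimal", "force-ipv6"):
        print(name, format_ipv4("127.0.0.1", name))
```

For 127.0.0.1 the sketch yields the same strings quoted in the option descriptions: 127.000.000.001, 2130706433, 7f000001, and ::ffff:7f00:1.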
--integer-ips
Print IP addresses as integers. This switch is equivalent to --ip-format=decimal, it is deprecated as
of SiLK 3.7.0, and it will be removed in the SiLK 4.0 release.
--zero-pad-ips
Print IP addresses as fully-expanded, zero-padded values in their canonical form. This switch is
equivalent to --ip-format=zero-padded, it is deprecated as of SiLK 3.7.0, and it will be removed in
the SiLK 4.0 release.
--sort-ips
For the --print-recs and --print-ips output formats, the results are presented sorted by IP address.
--no-titles
Turn off column titles. By default, titles are printed.
--no-columns
Disable fixed-width columnar output.
--column-separator=C
Use specified character between columns and after the final column. When this switch is not specified,
the default of ’|’ is used.
--no-final-delimiter
Do not print the column separator after the final column. Normally a delimiter is printed.
--delimited
--delimited=C
Run as if --no-columns --no-final-delimiter --column-sep=C had been specified. That is, disable
fixed-width columnar output; if character C is provided, it is used as the delimiter between columns
instead of the default ’|’.
--print-filenames
Print to the standard error the names of input files as they are opened.
--copy-input=PATH
Copy all binary input to the specified file or named pipe. PATH can be stdout to print flows to
the standard output as long as the --output-path switch has been used to redirect rwaddrcount’s
ASCII output.
--output-path=PATH
Determine where the output of rwaddrcount (ASCII text) is written. If this option is not given,
output is written to the standard output.
--pager=PAGER_PROG
When output is to a terminal, invoke the program PAGER_PROG to view the output one screenful
at a time. This switch overrides the SILK_PAGER environment variable, which in turn overrides the
PAGER variable. If the value of the pager is determined to be the empty string, no paging will be
performed and all output will be printed to the terminal.
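The precedence among the --pager switch and the two environment variables can be summarized by the Python sketch below. It is a reading of the text above, not SiLK code: choose_pager is a hypothetical name, and the check that output is going to a terminal is left out.

```python
def choose_pager(switch_value, env):
    """Pick the pager program, or None when no paging should happen.

    switch_value is the argument of --pager, or None when the switch is absent.
    env is a mapping of environment variables.
    """
    if switch_value is not None:
        pager = switch_value              # the switch overrides everything
    elif "SILK_PAGER" in env:
        pager = env["SILK_PAGER"]         # SILK_PAGER overrides PAGER
    else:
        pager = env.get("PAGER", "")
    # An empty string at whichever level was selected disables paging.
    return pager or None
```

Note that an empty SILK_PAGER disables paging even when PAGER is set, because the empty value is still the one selected.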
--site-config-file=FILENAME
Read the SiLK site configuration from the named file FILENAME. When this switch is not provided,
rwaddrcount searches for the site configuration file in the locations specified in the FILES section.
--legacy-timestamps
--legacy-timestamps=NUM
Specify the format for human readable timestamps, either the default (new) style,
YYYY/MM/DDThh:mm:ss, or the legacy style, MM/DD/YYYY hh:mm:ss. When this switch is
not present, the timestamps will be in the default format. When this switch is present and no
argument is given, timestamps are in the legacy format. When an argument is supplied, timestamps
will be in the new format if the argument begins with 0, and in the old format if the argument begins
with 1. Any other argument to the switch is an error.
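The rules for --legacy-timestamps reduce to a small decision function, sketched here in Python. The name timestamp_style and the returned labels are hypothetical and serve only to restate the paragraph above.

```python
def timestamp_style(switch_present, argument=None):
    """Return "default" or "legacy" according to the --legacy-timestamps rules."""
    if not switch_present:
        return "default"      # switch absent: new style
    if argument is None:
        return "legacy"       # switch present without an argument
    if argument.startswith("0"):
        return "default"
    if argument.startswith("1"):
        return "legacy"
    raise ValueError("invalid argument to --legacy-timestamps: %r" % (argument,))
```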
--xargs
--xargs=FILENAME
Causes rwaddrcount to read file names from FILENAME or from the standard input if FILENAME
is not provided. The input should have one file name per line. rwaddrcount will open each file in
turn and read records from it, as if the files had been listed on the command line.
--help
Print the available options and exit.
--version
Print the version number and information about how SiLK was configured, then exit the application.
Deprecated Switches
The following switches are deprecated. They will be removed in SiLK 4.0.
--byte-min=BYTEMIN
Deprecated alias for --min-bytes.
--packet-min=PACKMIN
Deprecated alias for --min-packets.
--rec-min=RECMIN
Deprecated alias for --min-records.
--byte-max=BYTEMAX
Deprecated alias for --max-bytes.
--packet-max=PACKMAX
Deprecated alias for --max-packets.
--rec-max=RECMAX
Deprecated alias for --max-records.
EXAMPLES
In the following examples, the dollar sign ($) represents the shell prompt. The text after the dollar sign
represents the command line. Lines have been wrapped for improved readability, and the backslash (\) is
used to indicate a wrapped line.
To print out a set of IPs with exactly one TCP record during the time period, use:
$ rwfilter --start-date=2003/09/01:00 --end-date=2003/09/01:12 \
    --proto=6 --pass=stdout \
    | rwaddrcount --max-records=1 --print-ips
In general, to print out record information, use rwaddrcount with --print-recs:
$ rwfilter --start-date=2003/01/17:00 --end-date=2003/01/17:23 \
    --proto=6 --pass=stdout \
    | rwaddrcount --print-rec | head -3
10.10.10.1|  65792| 147| 21| 2003/01/17T00:19:01| 2003/01/17T02:00:13|
10.10.10.2| 110744|  89|  7| 2003/01/17T01:21:42| 2003/01/17T01:39:21|
10.10.10.3|    864|  18|  6| 2003/01/17T00:20:33| 2003/01/17T01:25:38|
ENVIRONMENT
SILK_PAGER
When set to a non-empty string, rwaddrcount automatically invokes this program to display its output a
screen at a time. If set to an empty string, rwaddrcount does not automatically page its output.
PAGER
When set and SILK_PAGER is not set, rwaddrcount automatically invokes this program to display its output
a screen at a time.
SILK_CLOBBER
The SiLK tools normally refuse to overwrite existing files. Setting SILK_CLOBBER to a non-empty
value removes this restriction.
SILK_CONFIG_FILE
This environment variable is used as the value for the --site-config-file when that switch is not
provided.
SILK_DATA_ROOTDIR
This environment variable specifies the root directory of the data repository. As described in the FILES
section, rwaddrcount may use this environment variable when searching for the SiLK site configuration file.
SILK_PATH
This environment variable gives the root of the install tree. When searching for configuration files,
rwaddrcount may use this environment variable. See the FILES section for details.
TZ
When a SiLK installation is built to use the local timezone (to determine if this is the case, check
the Timezone support value in the output from rwaddrcount --version), the value of the TZ environment variable determines the timezone in which rwaddrcount displays timestamps. If the TZ
environment variable is not set, the default timezone is used. Setting TZ to 0 or the empty string
causes timestamps to be displayed in UTC. The value of the TZ environment variable is ignored when
the SiLK installation uses utc. For system information on the TZ variable, see tzset(3).
FILES
${SILK_CONFIG_FILE}
${SILK_DATA_ROOTDIR}/silk.conf
/data/silk.conf
${SILK_PATH}/share/silk/silk.conf
${SILK_PATH}/share/silk.conf
/usr/local/share/silk/silk.conf
/usr/local/share/silk.conf
Possible locations for the SiLK site configuration file which are checked when the --site-config-file
switch is not provided.
SEE ALSO
rwset(1), rwsetcat(1), rwstats(1), rwtotal(1), rwuniq(1), silk(7)
NOTES
rwaddrcount only supports IPv4 addresses, and it will not be modified to support IPv6 addresses. To
produce output similar to rwaddrcount for IPv6 addresses, use rwuniq(1):
rwuniq --fields=sip --values=bytes,packets,records,stime,etime
When used in an IPv6 environment, rwaddrcount converts IPv6 flow records that contain addresses in
the ::ffff:0:0/96 prefix to IPv4 and processes them. IPv6 records having addresses outside of that prefix are
ignored.
rwaddrcount uses a fairly large hash table to store data, but it is likely that as the amount of data expands,
the application will take more time to process data.
Similar binning of records is produced by rwstats(1), rwtotal(1), and rwuniq(1).
To generate a list of IP addresses without the volume information, use rwset(1).
rwappend(1)
rwappend
Append SiLK Flow file(s) to an existing SiLK Flow file
SYNOPSIS
rwappend [--create=[TEMPLATE_FILE]] [--print-statistics]
[--site-config-file=FILENAME]
TARGET_FILE SOURCE_FILE [SOURCE_FILE...]
rwappend --help
rwappend --version
DESCRIPTION
rwappend reads SiLK Flow records from the specified SOURCE_FILEs and appends them to the TARGET_FILE. If stdin is used as the name of one of the SOURCE_FILEs, SiLK Flow records will be read from
the standard input.
When the TARGET_FILE does not exist and the --create switch is not provided, rwappend will exit
with an error. When --create is specified and TARGET_FILE does not exist, rwappend will create the
TARGET_FILE using the same format, version, and byte-order as the specified TEMPLATE_FILE. If no
TEMPLATE_FILE is given, the TARGET_FILE is created in the default format and version (the same
format that rwcat(1) would produce).
The TARGET_FILE must be an actual file; it cannot be a named pipe or the standard output. In addition,
the header of TARGET_FILE must not be compressed; that is, you cannot append to a file whose entire
contents have been compressed with gzip (those files normally end in the .gz extension).
OPTIONS
Option names may be abbreviated if the abbreviation is unique or is an exact match for an option. A
parameter to an option may be specified as --arg=param or --arg param, though the first form is required
for options that take optional parameters.
--create
--create=TEMPLATE_FILE
Create the TARGET_FILE if it does not exist. The file will have the same format, version, and
byte-order as the TEMPLATE_FILE if it is provided; otherwise the defaults are used. The
TEMPLATE_FILE will NOT be appended to TARGET_FILE unless it also appears as the name of a
SOURCE_FILE.
--print-statistics
Print to the standard error the number of records read from each SOURCE_FILE and the total number
of records appended to the TARGET_FILE.
--site-config-file=FILENAME
Read the SiLK site configuration from the named file FILENAME. When this switch is not provided,
rwappend searches for the site configuration file in the locations specified in the FILES section.
--help
Print the available options and exit.
--version
Print the version number and information about how SiLK was configured, then exit the application.
EXAMPLES
In the following examples, the dollar sign ($) represents the shell prompt. The text after the dollar sign
represents the command line. Lines have been wrapped for improved readability, and the back slash (\) is
used to indicate a wrapped line.
Standard usage where the file to append to, results.rw, exists:
$ rwappend results.rw sample5.rw sample6.rw
To append files sample*.rw to results.rw, or to create results.rw using the same format as the first file
argument (note that sample1.rw must be repeated):
$ rwappend results.rw --create=sample1.rw \
        sample1.rw sample2.rw
If results.rw does not exist, the following two commands are equivalent:
$ rwappend --create results.rw sample1.rw sample2.rw
$ rwcat sample1.rw sample2.rw > results.rw
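When rwappend is driven from a script, the command line can be assembled programmatically. The following Python sketch is illustrative only: the helper rwappend_argv is hypothetical and merely builds an argument list from the switches documented above. It attaches the template with an equals sign because --create takes an optional parameter.

```python
def rwappend_argv(target, sources, create=False, template=None,
                  print_statistics=False):
    """Build an argument list for rwappend (hypothetical helper)."""
    if not sources:
        raise ValueError("at least one SOURCE_FILE is required")
    argv = ["rwappend"]
    if template is not None:
        # An optional parameter must use the --arg=param form.
        argv.append(f"--create={template}")
    elif create:
        argv.append("--create")
    if print_statistics:
        argv.append("--print-statistics")
    argv.append(target)
    # The template is appended only if the caller also lists it as a source.
    argv.extend(sources)
    return argv

print(rwappend_argv("results.rw", ["sample1.rw", "sample2.rw"],
                    template="sample1.rw"))
```

On a host where SiLK is installed, the resulting list could be passed to subprocess.run(argv, check=True).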
ENVIRONMENT
SILK_CONFIG_FILE
This environment variable is used as the value for the --site-config-file switch when that switch is
not provided.
SILK_DATA_ROOTDIR
This environment variable specifies the root directory of the data repository. As described in the FILES
section, rwappend may use this environment variable when searching for the SiLK site configuration
file.
SILK_PATH
This environment variable gives the root of the install tree. When searching for configuration files,
rwappend may use this environment variable. See the FILES section for details.
FILES
${SILK_CONFIG_FILE}
${SILK_DATA_ROOTDIR}/silk.conf
/data/silk.conf
${SILK_PATH}/share/silk/silk.conf
${SILK_PATH}/share/silk.conf
/usr/local/share/silk/silk.conf
/usr/local/share/silk.conf
Possible locations for the SiLK site configuration file which are checked when the --site-config-file
switch is not provided.
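The search over these locations can be pictured with a short Python sketch. It assumes the locations are tried in the order listed, which this section does not state explicitly, and it assumes the environment variables are spelled SILK_CONFIG_FILE, SILK_DATA_ROOTDIR, and SILK_PATH. The function names are hypothetical.

```python
import os
import posixpath

def candidate_site_configs(environ):
    """List the candidate silk.conf paths, in the order shown in FILES."""
    paths = []
    if environ.get("SILK_CONFIG_FILE"):
        paths.append(environ["SILK_CONFIG_FILE"])
    if environ.get("SILK_DATA_ROOTDIR"):
        paths.append(posixpath.join(environ["SILK_DATA_ROOTDIR"], "silk.conf"))
    paths.append("/data/silk.conf")
    if environ.get("SILK_PATH"):
        root = environ["SILK_PATH"]
        paths.append(posixpath.join(root, "share/silk/silk.conf"))
        paths.append(posixpath.join(root, "share/silk.conf"))
    paths.append("/usr/local/share/silk/silk.conf")
    paths.append("/usr/local/share/silk.conf")
    return paths

def find_site_config(environ=os.environ):
    """Return the first candidate that exists as a file, or None."""
    for path in candidate_site_configs(environ):
        if os.path.isfile(path):
            return path
    return None

print(candidate_site_configs({"SILK_PATH": "/opt/silk"}))
```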
SEE ALSO
rwcat(1), silk(7)
BUGS
When a SOURCE_FILE contains IPv6 flow records and the TARGET_FILE only supports IPv4 records,
rwappend converts IPv6 records that contain addresses in the ::ffff:0:0/96 prefix to IPv4 and writes them
to the TARGET_FILE. rwappend silently ignores IPv6 records having addresses outside of that prefix.
rwappend makes some attempts to avoid appending a file to itself (which would eventually exhaust the
disk space) by comparing the names of files it is given; it should be smarter about this.
rwbag
Build a binary Bag from SiLK Flow records.
SYNOPSIS
rwbag [--sip-flows=OUTPUTFILE] [--dip-flows=OUTPUTFILE]
[--sport-flows=OUTPUTFILE] [--dport-flows=OUTPUTFILE]
[--proto-flows=OUTPUTFILE] [--sensor-flows=OUTPUTFILE]
[--input-flows=OUTPUTFILE] [--output-flows=OUTPUTFILE]
[--nhip-flows=OUTPUTFILE]
[--sip-packets=OUTPUTFILE] [--dip-packets=OUTPUTFILE]
[--sport-packets=OUTPUTFILE] [--dport-packets=OUTPUTFILE]
[--proto-packets=OUTPUTFILE] [--sensor-packets=OUTPUTFILE]
[--input-packets=OUTPUTFILE] [--output-packets=OUTPUTFILE]
[--nhip-packets=OUTPUTFILE]
[--sip-bytes=OUTPUTFILE] [--dip-bytes=OUTPUTFILE]
[--sport-bytes=OUTPUTFILE] [--dport-bytes=OUTPUTFILE]
[--proto-bytes=OUTPUTFILE] [--sensor-bytes=OUTPUTFILE]
[--input-bytes=OUTPUTFILE] [--output-bytes=OUTPUTFILE]
[--nhip-bytes=OUTPUTFILE]
[--note-add=TEXT] [--note-file-add=FILE]
[--print-filenames] [--copy-input=PATH]
[--compression-method=COMP_METHOD]
[--ipv6-policy={ignore,asv4,mix,force,only}]
[--site-config-file=FILENAME]
{[--xargs] | [--xargs=FILENAME] | [FILE [FILE ...]]}
rwbag --help
rwbag --version
DESCRIPTION
rwbag reads SiLK Flow records and builds a Bag. Source IP address, destination IP address, next hop IP
address, source port, destination port, protocol, input interface index, output interface index, or sensor ID
may be used as the unique key by which to count volumes. Flows, packets, or bytes may be used as the
counter.
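Conceptually, a Bag maps each unique key to a running counter. The Python sketch below illustrates that key and counter model with made-up flow records; the dictionary field names and the helper build_bag are hypothetical and do not reflect the SiLK record format.

```python
from collections import Counter

# Hypothetical flow records; the field names are illustrative only.
flows = [
    {"sip": "10.0.0.1", "dip": "10.0.0.9", "packets": 3, "bytes": 180},
    {"sip": "10.0.0.1", "dip": "10.0.0.8", "packets": 1, "bytes": 40},
    {"sip": "10.0.0.2", "dip": "10.0.0.9", "packets": 5, "bytes": 900},
]

def build_bag(records, key, counter):
    """Sum the chosen counter ('flows', 'packets', or 'bytes') per key."""
    bag = Counter()
    for rec in records:
        # A 'flows' counter adds one per record; the others add the field value.
        bag[rec[key]] += 1 if counter == "flows" else rec[counter]
    return bag

sip_flows = build_bag(flows, "sip", "flows")      # like --sip-flows
sip_bytes = build_bag(flows, "sip", "bytes")      # like --sip-bytes
dip_packets = build_bag(flows, "dip", "packets")  # like --dip-packets
print(dict(sip_flows))
```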
rwbag reads SiLK Flow records from the files named on the command line or from the standard input when
no file names are specified and --xargs is not present. To read the standard input in addition to the named
files, use - or stdin as a file name. If an input file name ends in .gz, the file will be uncompressed as it
is read. When the --xargs switch is provided, rwbag will read the names of the files to process from the
named text file, or from the standard input if no file name argument is provided to the switch. The input
to --xargs must contain one file name per line.
If adding a value to a key would cause the value to overflow the maximum value that Bags support, the
key’s value will be set to the maximum and processing will continue. In addition, if this is the first value to
overflow in this Bag, a warning will be printed to the standard error.
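The overflow rule amounts to a saturating addition with a one-time warning. The following Python sketch shows the idea under one assumption: the maximum counter value is not given in this section, so 2**64 - 1 is used purely as a stand-in. The class name SaturatingBag is hypothetical.

```python
import sys

# Assumption: stand-in for the maximum value a Bag counter can hold.
COUNTER_MAX = 2**64 - 1

class SaturatingBag:
    """Toy bag whose counters clamp at counter_max instead of wrapping."""

    def __init__(self, counter_max=COUNTER_MAX):
        self.counts = {}
        self.counter_max = counter_max
        self.overflowed = False

    def add(self, key, value):
        total = self.counts.get(key, 0) + value
        if total > self.counter_max:
            total = self.counter_max  # clamp and keep processing
            if not self.overflowed:
                # Warn only for the first overflow seen in this bag.
                print("warning: counter overflow, value clamped",
                      file=sys.stderr)
                self.overflowed = True
        self.counts[key] = total

bag = SaturatingBag(counter_max=10)  # tiny maximum, for demonstration
bag.add("a", 7)
bag.add("a", 7)
print(bag.counts["a"])  # 10
```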
If rwbag runs out of memory, it will exit immediately. The output Bag files will remain behind, each with
a size of 0 bytes.
Use rwbagcat(1) to see the contents of a bag. To create a bag from textual input or from an IPset, use
rwbagbuild(1). rwbagtool(1) allows you to manipulate binary bag files.
OPTIONS
Option names may be abbreviated if the abbreviation is unique or is an exact match for an option. A
parameter to an option may be specified as --arg=param or --arg param, though the first form is required
for options that take optional parameters.
At least one of the following output flags must be defined. For each, OUTPUTFILE is the name of a nonexistent file, a named pipe, or the keyword stdout to write the binary Bag to the standard output. Only
one switch may use the standard output as its output stream.
--sip-flows=OUTPUTFILE
Count number of flows by unique source IP.
--sip-packets=OUTPUTFILE
Count number of packets by unique source IP.
--sip-bytes=OUTPUTFILE
Count number of bytes by unique source IP.
--dip-flows=OUTPUTFILE
Count number of flows by unique destination IP.
--dip-packets=OUTPUTFILE
Count number of packets by unique destination IP.
--dip-bytes=OUTPUTFILE
Count number of bytes by unique destination IP.
--sport-flows=OUTPUTFILE
Count number of flows by unique source port.
--sport-packets=OUTPUTFILE
Count number of packets by unique source port.
--sport-bytes=OUTPUTFILE
Count number of bytes by unique source port.
--dport-flows=OUTPUTFILE
Count number of flows by unique destination port.
--dport-packets=OUTPUTFILE
Count number of packets by unique destination port.
--dport-bytes=OUTPUTFILE
Count number of bytes by unique destination port.
--proto-flows=OUTPUTFILE
Count number of flows by unique protocol.
--proto-packets=OUTPUTFILE
Count number of packets by unique protocol.
--proto-bytes=OUTPUTFILE
Count number of bytes by unique protocol.
--sensor-flows=OUTPUTFILE
Count number of flows by unique sensor ID.
--sensor-packets=OUTPUTFILE
Count number of packets by unique sensor ID.
--sensor-bytes=OUTPUTFILE
Count number of bytes by unique sensor ID.
--input-flows=OUTPUTFILE
Count number of flows by unique input interface index.
--input-packets=OUTPUTFILE
Count number of packets by unique input interface index.
--input-bytes=OUTPUTFILE
Count number of bytes by unique input interface index.
--output-flows=OUTPUTFILE
Count number of flows by unique output interface index.
--output-packets=OUTPUTFILE
Count number of packets by unique output interface index.
--output-bytes=OUTPUTFILE
Count number of bytes by unique output interface index.
--nhip-flows=OUTPUTFILE
Count number of flows by unique next hop IP.
--nhip-packets=OUTPUTFILE
Count number of packets by unique next hop IP.
--nhip-bytes=OUTPUTFILE
Count number of bytes by unique next hop IP.
--note-add=TEXT
Add the specified TEXT to the header of every output file as an annotation. This switch may be
repeated to add multiple annotations to a file. To view the annotations, use the rwfileinfo(1) tool.
--note-file-add=FILENAME
Open FILENAME and add the contents of that file to the header of every output file as an annotation.
This switch may be repeated to add multiple annotations. Currently the application makes no effort
to ensure that FILENAME contains text; be careful that you do not attempt to add a SiLK data file
as an annotation.
--print-filenames
Prints to the standard error the names of input files as they are opened.
--copy-input=PATH
Copy all binary input to the specified file or named pipe. PATH can be stdout to print flows to
the standard output as long as the --output-path switch has been used to redirect rwbag’s ASCII
output.
--ipv6-policy=POLICY
Determine how IPv4 and IPv6 flows are handled when SiLK has been compiled with IPv6 support.
When the switch is not provided, the SILK_IPV6_POLICY environment variable is checked for a policy.
If it is also unset or contains an invalid policy, the POLICY is mix. When SiLK has not been compiled
with IPv6 support, IPv6 flows are always ignored, regardless of the value passed to this switch or in
the SILK_IPV6_POLICY variable. The supported values for POLICY are:
ignore
Ignore any flow record marked as IPv6, regardless of the IP addresses it contains. Only IP
addresses contained in IPv4 flow records will be added to the bag(s).
asv4
Convert IPv6 flow records that contain addresses in the ::ffff:0:0/96 prefix to IPv4 and ignore all
other IPv6 flow records.
mix
Process the input as a mixture of IPv4 and IPv6 flow records. When creating a bag whose key is
an IP address and the input contains IPv6 addresses outside of the ::ffff:0:0/96 prefix, this policy
is equivalent to force; otherwise it is equivalent to asv4.
force
Convert IPv4 flow records to IPv6, mapping the IPv4 addresses into the ::ffff:0:0/96 prefix.
only
Process only flow records that are marked as IPv6. Only IP addresses contained in IPv6 flow
records will be added to the bag(s).
Regardless of the IPv6 policy, when all IPv6 addresses in the bag are in the ::ffff:0:0/96 prefix, rwbag
treats them as IPv4 addresses and writes an IPv4 bag. When any other IPv6 addresses are present in
the bag, the IPv4 addresses in the bag are mapped into the ::ffff:0:0/96 prefix and rwbag writes an
IPv6 bag.
--compression-method=COMP_METHOD
Specify how to compress the output. When this switch is not given, output to the standard output or to
named pipes is not compressed, and output to files is compressed using the default chosen when SiLK
was compiled. The valid values for COMP_METHOD are determined by which external libraries were
found when SiLK was compiled. To see the available compression methods and the default method,
use the --help or --version switch. SiLK can support the following COMP_METHOD values when
the required libraries are available.
none
Do not compress the output using an external library.
zlib
Use the zlib(3) library for compressing the output, and always compress the output regardless
of the destination. Using zlib produces the smallest output files at the cost of speed.
lzo1x
Use the lzo1x algorithm from the LZO real time compression library for compression, and always
compress the output regardless of the destination. This compression provides good compression
with less memory and CPU overhead.
best
Use lzo1x if available, otherwise use zlib. Only compress the output when writing to a file.
--site-config-file=FILENAME
Read the SiLK site configuration from the named file FILENAME. When this switch is not provided,
rwbag searches for the site configuration file in the locations specified in the FILES section.
--xargs
--xargs=FILENAME
Causes rwbag to read file names from FILENAME or from the standard input if FILENAME is not
provided. The input should have one file name per line. rwbag will open each file in turn and read
records from it, as if the files had been listed on the command line.
--help
Print the available options and exit.
--version
Print the version number and information about how SiLK was configured, then exit the application.
EXAMPLES
In the following examples, the dollar sign ($) represents the shell prompt. The text after the dollar sign
represents the command line. Lines have been wrapped for improved readability, and the back slash (\) is
used to indicate a wrapped line.
To build both source IP and destination IP Bags of flows:
$ rwfilter ... --pass=stdout \
        | rwbag --sip-flow=sf.bag --dip-flow=df.bag
To build a Bag containing the number of bytes seen for each /16 prefix length of source addresses, use the
rwnetmask(1) tool prior to feeding the input to rwbag:
$ rwfilter ... --pass=stdout \
        | rwnetmask --4sip-prefix=16 \
        | rwbag --sip-bytes=sf16.bag
(To print the IP addresses of an existing Bag into /16 prefixes, use the --network-structure switch of
rwbagcat(1).)
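The effect of masking source addresses to a /16 before counting can be sketched in Python. This is an illustration of the idea only, with made-up (address, bytes) pairs; the helper mask_ipv4 is hypothetical and is not part of SiLK.

```python
import ipaddress
from collections import Counter

def mask_ipv4(addr, prefix_len):
    """Zero the host bits of addr, keeping the first prefix_len bits."""
    net = ipaddress.ip_network(f"{addr}/{prefix_len}", strict=False)
    return str(net.network_address)

# Made-up (source address, byte count) pairs.
records = [("10.1.2.4", 100), ("10.1.200.7", 50), ("10.2.0.1", 25)]

bag = Counter()
for sip, nbytes in records:
    # Mask first, then count: addresses in one /16 share a single key.
    bag[mask_ipv4(sip, 16)] += nbytes

print(dict(bag))  # {'10.1.0.0': 150, '10.2.0.0': 25}
```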
ENVIRONMENT
SILK_CLOBBER
The SiLK tools normally refuse to overwrite existing files. Setting SILK_CLOBBER to a non-empty
value removes this restriction.
SILK_CONFIG_FILE
This environment variable is used as the value for the --site-config-file switch when that switch is
not provided.
SILK_DATA_ROOTDIR
This environment variable specifies the root directory of the data repository. As described in the FILES
section, rwbag may use this environment variable when searching for the SiLK site configuration file.
SILK_PATH
This environment variable gives the root of the install tree. When searching for configuration files,
rwbag may use this environment variable. See the FILES section for details.
FILES
${SILK_CONFIG_FILE}
${SILK_DATA_ROOTDIR}/silk.conf
/data/silk.conf
${SILK_PATH}/share/silk/silk.conf
${SILK_PATH}/share/silk.conf
/usr/local/share/silk/silk.conf
/usr/local/share/silk.conf
Possible locations for the SiLK site configuration file which are checked when the --site-config-file
switch is not provided.
SEE ALSO
rwbagbuild(1), rwbagcat(1), rwbagtool(1), rwfileinfo(1), rwfilter(1), rwnetmask(1), silk(7),
zlib(3)
rwbagbuild
Create a binary Bag from non-flow data.
SYNOPSIS
rwbagbuild { --set-input=SETFILE | --bag-input=TEXTFILE }
[--delimiter=C] [--default-count=DEFAULTCOUNT]
[--key-type=FIELD_TYPE] [--counter-type=FIELD_TYPE]
[--note-add=TEXT] [--note-file-add=FILE]
[--compression-method=COMP_METHOD] [--output-path=OUTPUTFILE]
rwbagbuild --help
rwbagbuild --version
DESCRIPTION
rwbagbuild builds a binary Bag file from an IPset file or from textual input.
When creating a Bag from an IPset, the value associated with each IP address is the value given by the
--default-count switch, or 1 if the switch isn’t provided.
The textual input read from the argument to the --bag-input switch is processed a line at a time. Comments
begin with a ’#’-character and continue to the end of the line; they are stripped from each line. Any line
that is blank or contains only whitespace is ignored. All other lines must contain a valid key or key-count
pair; whitespace around the key and count is ignored.
If the delimiter character (specified by the --delimiter switch and having pipe (’|’) as its default) is not
present, the line must contain only an IP address or an integer key. If the delimiter is present, the line must
contain an IP address or integer key before the delimiter and an integer count after the delimiter. These
lines may have a second delimiter after the integer count; the second delimiter and any text to the right of
it are ignored.
When the --default-count switch is specified, its value is used as the count for each key, and the count
value parsed from each line, if any, is ignored. Otherwise, the parsed count is used, or 1 is used as the count
if no delimiter was present.
For each key-count pair, the key is inserted into the Bag with its count or, if the key is already present
in the Bag, its total count is incremented by the count from this line. When using the --default-count
switch, the count for a key that appears in the input N times is the product of N and DEFAULTCOUNT.
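The line-processing rules above can be summarized in a small Python sketch. It is a simplified model, not rwbagbuild's parser: it accepts only integer keys and single IP addresses (no CIDR blocks or wildcards), it validates the count even when a default count will override it (an assumption), and the function names are hypothetical.

```python
import ipaddress
from collections import Counter

def parse_key(text):
    """Parse an unsigned integer or a single IP address (simplified)."""
    text = text.strip()
    if text.isdigit():
        return int(text)
    ip = ipaddress.ip_address(text)
    # IPv4 keys share the integer key space: 10.1.2.4 is 167838212.
    return int(ip) if ip.version == 4 else ip

def build_bag_from_text(lines, delimiter="|", default_count=None):
    bag = Counter()
    for raw in lines:
        line = raw.split("#", 1)[0].strip()  # strip comment, then whitespace
        if not line:
            continue                         # blank or whitespace-only line
        if delimiter in line:
            fields = line.split(delimiter)
            key_text = fields[0]
            count = int(fields[1])           # ValueError if the count is missing
            # Text after a second delimiter (fields[2:]) is ignored.
        else:
            key_text, count = line, 1        # no delimiter: count defaults to 1
        if default_count is not None:
            count = default_count            # overrides any parsed count
        bag[parse_key(key_text)] += count    # repeated keys are summed
    return bag

lines = ["# comment", "", "10.1.2.4|5", "167838212|2|ignored", "  6 | 138 ", "17"]
print(dict(build_bag_from_text(lines)))
```

In this sample, the first two data lines name the same key, so their counts are summed; with default_count=3 that key would instead receive 2 x 3 = 6.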
The IP address or integer key must be expressed in one of the following formats. rwbagbuild complains if
the key field contains a mixture of IPv6 addresses and integer values.
• Dotted decimal---all 4 octets are required:
10.1.2.4
• An unsigned 32-bit integer:
167838212
• An IPv6 address in canonical form (when SiLK has been compiled with IPv6 support):
2001:db8:a:1::2:4
::ffff:10.1.2.4
• Any of the above with a CIDR designation---for dotted decimal all four octets are still required:
10.1.2.4/31
167838212/31
2001:db8:a:1::2:4/127
::ffff:10.1.2.4/31
• SiLK IP wildcard notation. A SiLK IP Wildcard can represent multiple IPv4 or IPv6 addresses. An
IP Wildcard contains an IP in its canonical form, except each part of the IP (where part is an octet
for IPv4 or a hexadectet for IPv6) may be a single value, a range, a comma separated list of values
and ranges, or the letter x to signify all values for that part of the IP (that is, 0-255 for IPv4). You
may not specify a CIDR suffix when using the IP Wildcard notation.
10.x.1-2.4,5
2001:db8:a:x::1-2:4,5
If an IP address or count cannot be parsed, or if a line contains a delimiter character but no count, rwbagbuild prints an error and exits.
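To make the SiLK IP Wildcard notation described above concrete, the following Python sketch expands an IPv4 wildcard into the addresses it represents. It handles IPv4 only and performs no range validation; the function names are hypothetical and the code is not taken from SiLK.

```python
from itertools import product

def expand_part(part, max_value=255):
    """Expand one octet: a value, a range, a comma list, or 'x'."""
    if part == "x":
        return list(range(max_value + 1))
    values = []
    for item in part.split(","):
        if "-" in item:
            low, high = item.split("-")
            values.extend(range(int(low), int(high) + 1))
        else:
            values.append(int(item))
    return values

def expand_ipv4_wildcard(wildcard):
    parts = wildcard.split(".")
    if len(parts) != 4:
        raise ValueError("all four octets are required")
    octet_values = [expand_part(p) for p in parts]
    # Every combination of the per-octet values is one address.
    return [".".join(map(str, combo)) for combo in product(*octet_values)]

addrs = expand_ipv4_wildcard("10.x.1-2.4,5")
print(len(addrs))  # 1 * 256 * 2 * 2 = 1024
```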
OPTIONS
Option names may be abbreviated if the abbreviation is unique or is an exact match for an option. A
parameter to an option may be specified as --arg=param or --arg param, though the first form is required
for options that take optional parameters.
The following two switches control the type of input; one and only one must be provided:
--set-input=SETFILE
Create a Bag from an IPset. SETFILE is a filename, a named pipe, or the keyword stdin or - to
read the IPset from the standard input. Counts have a volume of 1 when the --default-count switch
is not specified. (IPsets are typically created by rwset(1) or rwsetbuild(1).)
--bag-input=TEXTFILE
Create a Bag from a delimited text file. TEXTFILE is a filename, a named pipe, or the keyword stdin
or - to read the text from the standard input. See the DESCRIPTION section for the syntax of the
TEXTFILE.
--delimiter=C
The delimiter to expect between each key-count pair of the TEXTFILE read by the --bag-input
switch. The delimiter is ignored if the --set-input switch is specified. Since ’#’ is used to denote
comments and newline is used to denote records, neither is a valid delimiter character.
--default-count=DEFAULTCOUNT
Override the counts of all values in the input text or IPset with the value of DEFAULTCOUNT.
DEFAULTCOUNT must be a positive integer.
--key-type=FIELD_TYPE
Write an entry into the header of the Bag file that specifies the key contains FIELD_TYPE values.
When this switch is not specified, the key type of the Bag is set to custom. The FIELD_TYPE is case
insensitive. The supported FIELD_TYPEs are:
sIPv4
source IP address, IPv4 only
dIPv4
destination IP address, IPv4 only
sPort
source port
dPort
destination port
protocol
IP protocol
packets
packets, see also sum-packets
bytes
bytes, see also sum-bytes
flags
bitwise OR of TCP flags
sTime
starting time of the flow record, seconds resolution
duration
duration of the flow record, seconds resolution
eTime
ending time of the flow record, seconds resolution
sensor
sensor ID
input
SNMP input
output
SNMP output
nhIPv4
next hop IP address, IPv4 only
initialFlags
TCP flags on first packet in the flow
sessionFlags
bitwise OR of TCP flags on all packets in the flow except the first
attributes
flow attributes set by the flow generator
application
guess as to the content of the flow, as set by the flow generator
class
class of the sensor
type
type of the sensor
icmpTypeCode
an encoded version of the ICMP type and code, where the type is in the upper byte and the code
is in the lower byte
sIPv6
source IP, IPv6
dIPv6
destination IP, IPv6
nhIPv6
next hop IP, IPv6
records
count of flows
sum-packets
sum of packet counts
sum-bytes
sum of byte counts
sum-duration
sum of duration values
any-IPv4
a generic IPv4 address
any-IPv6
a generic IPv6 address
any-port
a generic port
any-snmp
a generic SNMP value
any-time
a generic time value, in seconds resolution
custom
a number
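As an illustration of the icmpTypeCode encoding in the list above, the following Python sketch packs the ICMP type into the upper byte and the code into the lower byte. It assumes the encoded value occupies two bytes, which the wording implies but does not state outright; the function names are hypothetical.

```python
def encode_icmp_type_code(icmp_type, icmp_code):
    """Pack type (upper byte) and code (lower byte) into one integer."""
    return (icmp_type << 8) | icmp_code

def decode_icmp_type_code(value):
    """Split an encoded value back into (type, code)."""
    return value >> 8, value & 0xFF

print(encode_icmp_type_code(8, 0))  # echo request: 2048
print(decode_icmp_type_code(769))   # (3, 1)
```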
--counter-type=FIELD_TYPE
Write an entry into the header of the Bag file that specifies the counter contains FIELD_TYPE values.
When this switch is not specified, the counter type of the Bag is set to custom. The supported
FIELD_TYPEs are the same as those for the key.
--note-add=TEXT
Add the specified TEXT to the header of the output file as an annotation. This switch may be repeated
to add multiple annotations to a file. To view the annotations, use the rwfileinfo(1) tool.
--note-file-add=FILENAME
Open FILENAME and add the contents of that file to the header of the output file as an annotation.
This switch may be repeated to add multiple annotations. Currently the application makes no effort
to ensure that FILENAME contains text; be careful that you do not attempt to add a SiLK data file
as an annotation.
--compression-method=COMP_METHOD
Specify how to compress the output. When this switch is not given, output to the standard output or to
named pipes is not compressed, and output to files is compressed using the default chosen when SiLK
was compiled. The valid values for COMP_METHOD are determined by which external libraries were
found when SiLK was compiled. To see the available compression methods and the default method,
use the --help or --version switch. SiLK can support the following COMP_METHOD values when
the required libraries are available.
none
Do not compress the output using an external library.
zlib
Use the zlib(3) library for compressing the output, and always compress the output regardless
of the destination. Using zlib produces the smallest output files at the cost of speed.
lzo1x
Use the lzo1x algorithm from the LZO real time compression library for compression, and always
compress the output regardless of the destination. This compression provides good compression
with less memory and CPU overhead.
best
Use lzo1x if available, otherwise use zlib. Only compress the output when writing to a file.
--output-path=OUTPUTFILE
Redirect output to OUTPUTFILE. OUTPUTFILE is a filename, a named pipe, or the keyword stdout
or - to write the bag to the standard output.
--help
Print the available options and exit.
--version
Print the version number and information about how SiLK was configured, then exit the application.
EXAMPLES
In the following examples, the dollar sign ($) represents the shell prompt. The text after the dollar sign
represents the command line. Lines have been wrapped for improved readability, and the back slash (\) is
used to indicate a wrapped line.
Create a bag with IP addresses as keys from a text file
Assume the file mybag.txt contains the following lines, where each line contains an IP address, a comma as
a delimiter, a count, and ends with a newline.
192.168.0.1,5
192.168.0.2,500
192.168.0.3,3
192.168.0.4,14
192.168.0.5,5
To build a bag with it:
$ rwbagbuild --bag-input=mybag.txt --delimiter=, > mybag.bag
Use rwbagcat(1) to view its contents:
$ rwbagcat mybag.bag
    192.168.0.1|                   5|
    192.168.0.2|                 500|
    192.168.0.3|                   3|
    192.168.0.4|                  14|
    192.168.0.5|                   5|
Create a bag with protocols as keys from a text file
To create a Bag of protocol data from the text file myproto.txt:
         1|         4|
         6|       138|
        17|       131|
use
$ rwbagbuild --key-type=proto --bag-input=myproto.txt > myproto.bag
$ rwbagcat myproto.bag
         1|                   4|
         6|                 138|
        17|                 131|
When the --key-type switch is specified, rwbagcat knows the keys should be printed as integers, and
rwfileinfo(1) shows the type of the key:
$ rwfileinfo --fields=bag myproto.bag
myproto.bag:
  bag          key: protocol @ 4 octets; counter: custom @ 8 octets
Without the --key-type switch, rwbagbuild assumes the integers in myproto.txt represent IP addresses:
$ rwbagbuild --bag-input=myproto.txt | rwbagcat
        0.0.0.1|                   4|
        0.0.0.6|                 138|
       0.0.0.17|                 131|
Although the --integer-keys switch on rwbagcat forces it to print keys as integers, it is generally better
to use the --key-type switch when creating the bag.
$ rwbagbuild --bag-input=myproto.txt | rwbagcat --integer-keys
         1|                   4|
         6|                 138|
        17|                 131|
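The reason the integer keys 1, 6, and 17 print as 0.0.0.1, 0.0.0.6, and 0.0.0.17 is that a 32-bit integer and an IPv4 address are two renderings of the same value. A minimal Python sketch of that correspondence follows; the helper name key_as_ip is hypothetical.

```python
import ipaddress

def key_as_ip(key):
    """Render a 32-bit integer key as a dotted-decimal IPv4 address."""
    return str(ipaddress.IPv4Address(key))

for key in (1, 6, 17):
    print(key, "->", key_as_ip(key))

# The reverse direction: 10.1.2.4 is the integer 167838212.
print(int(ipaddress.IPv4Address("10.1.2.4")))
```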
Create a bag and override the existing counter
To ignore the counts that exist in myproto.txt and set the counts for each protocol to 1, use the
--default-count switch, which overrides the existing value:
$ rwbagbuild --key-type=protocol --bag-input=myproto.txt \
        --default-count=1 --output-path=myproto1.bag
$ rwbagcat myproto1.bag
         1|                   1|
         6|                   1|
        17|                   1|
Create a bag with IP addresses as keys from an IPset file
Given the IP set myset.set, create a bag where every entry in the bag has a count of 3:
$ rwbagbuild --set-input=myset.set --default-count=3 \
        --out=mybag2.bag
Create a bag from multiple input files
Suppose we have three IPset files, A.set, B.set, and C.set:
$ rwsetcat A.set
10.0.0.1
10.0.0.2
$ rwsetcat B.set
10.0.0.2
10.0.0.3
$ rwsetcat C.set
10.0.0.1
10.0.0.2
10.0.0.4
We want to create a bag file from these IPset files where the count for each IP address is the number of files
that IP appears in. rwbagbuild accepts a single file as an argument, so we cannot do the following:
$ rwbagbuild --set-input=A.set --set-input=B.set ...    # WRONG!
(Even if we could repeat the --set-input switch, specifying it multiple times would be annoying if we had
300 files instead of only 3.)
The IPset files are (mathematical) sets, so if we join them together first with rwsettool(1) and then run
rwbagbuild, each IP address gets a count of 1:
$ rwsettool --union A.set B.set C.set \
        | rwbagbuild --set-input=- \
        | rwbagcat
       10.0.0.1|                   1|
       10.0.0.2|                   1|
       10.0.0.3|                   1|
       10.0.0.4|                   1|
When rwbagbuild is processing textual input, it sums the counters for keys that appear in the input
multiple times. We can use rwsetcat(1) to convert each IPset file to text and feed that as single textual
stream to rwbagbuild. Use the --cidr-blocks switch on rwsetcat to reduce the amount of input that
rwbagbuild must process. This is probably the best approach to the problem:
$ rwsetcat --cidr-block *.set | rwbagbuild --bag-input=- > total1.bag
$ rwbagcat total1.bag
       10.0.0.1|                   2|
       10.0.0.2|                   3|
       10.0.0.3|                   1|
       10.0.0.4|                   1|
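The difference between taking the union first and summing per-file membership can be reproduced with ordinary Python sets. This sketch only models the arithmetic using the contents of A.set, B.set, and C.set shown earlier; it does not read IPset files.

```python
from collections import Counter

# Contents of the three IPset files from the example.
sets = {
    "A.set": {"10.0.0.1", "10.0.0.2"},
    "B.set": {"10.0.0.2", "10.0.0.3"},
    "C.set": {"10.0.0.1", "10.0.0.2", "10.0.0.4"},
}

# Union first: a set holds each address once, so every count is 1.
union_bag = Counter({ip: 1 for ip in set().union(*sets.values())})

# Sum per file: each file contributes 1 for every address it contains.
membership = Counter()
for ips in sets.values():
    membership.update(ips)

for ip in sorted(membership):
    print(ip, union_bag[ip], membership[ip])
```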
A less efficient solution is to convert each IPset to a bag and then use rwbagtool(1) to add the bags
together:
$ for i in *.set ; do
    rwbagbuild --set-input=$i --output-path=/tmp/$i.bag ;
done
$ rwbagtool --add /tmp/*.set.bag > total2.bag
$ rm /tmp/*.set.bag
There is no need to create a bag file for each IPset; we can get by with only two bag files, the final bag
file, total3.bag, and a temporary file, tmp.bag. We initialize total3.bag to an empty bag. As we loop over
each IPset, rwbagbuild converts the IPset to a bag on its standard output, rwbagtool creates tmp.bag by
adding its standard input to total3.bag, and we rename tmp.bag to total3.bag:
$ rwbagbuild --bag-input=/dev/null --output-path=total3.bag
$ for i in *.set ; do
rwbagbuild --set-input=$i \
| rwbagtool --output-path=tmp.bag --add total3.bag stdin ;
/bin/mv tmp.bag total3.bag ;
done
$ rwbagcat total3.bag
       10.0.0.1|                   2|
       10.0.0.2|                   3|
       10.0.0.3|                   1|
       10.0.0.4|                   1|
ENVIRONMENT
SILK_CLOBBER
The SiLK tools normally refuse to overwrite existing files. Setting SILK_CLOBBER to a non-empty
value removes this restriction.
SEE ALSO
rwbag(1), rwbagcat(1), rwbagtool(1), rwfileinfo(1), rwset(1), rwsetbuild(1), rwsetcat(1), rwsettool(1), silk(7), zlib(3)
BUGS
The --default-count switch is poorly named.
rwbagcat
Output a binary Bag as text.
SYNOPSIS
rwbagcat [ --network-structure[=STRUCTURE] | --bin-ips[=SCALE] ]
[--print-statistics[=OUTFILE]]
[--minkey=VALUE] [--maxkey=VALUE] [--mask-set=PATH]
[--mincounter=VALUE] [--maxcounter=VALUE] [--zero-counts]
[--output-path=OUTPUTFILE]
[--key-format=FORMAT] [--integer-keys] [--zero-pad-ips]
[--no-columns] [--column-separator=C]
[--no-final-delimiter] [{--delimited | --delimited=C}]
[--pager=PAGER_PROG] [BAGFILE...]
rwbagcat --help
rwbagcat --version
DESCRIPTION
rwbagcat reads a binary Bag as created by rwbag(1) or rwbagbuild(1), converts it to text, and outputs
it to the standard output or the specified file. It can also print various statistics and summary information
about the Bag.
rwbagcat reads the BAGFILEs specified on the command line; if no BAGFILE arguments are given,
rwbagcat attempts to read the Bag from the standard input. BAGFILE may also explicitly be the keyword
stdin or a hyphen (-) to allow rwbagcat to combine files and piped input. If any input does not contain
a Bag, rwbagcat prints an error to the standard error and exits abnormally.
When multiple BAGFILEs are specified, each is handled individually; to process the combination of the
BAGFILEs, invoke rwbagcat on the output from rwbagtool(1).
OPTIONS
Option names may be abbreviated if the abbreviation is unique or is an exact match for an option. A
parameter to an option may be specified as --arg=param or --arg param, though the first form is required
for options that take optional parameters.
--network-structure
--network-structure=STRUCTURE
For each numeric value in STRUCTURE, group the IPs in the Bag into a netblock of that size and print
the number of hosts, the sum of the counters, and, optionally, print the number of smaller, occupied
netblocks that each larger netblock contains. When STRUCTURE begins with v6:, the IPs in the
Bag are treated as IPv6 addresses, and any IPv4 addresses are mapped into the ::ffff:0:0/96 netblock.
Otherwise, the IPs are treated as IPv4 addresses, and any IPv6 address outside the ::ffff:0:0/96 netblock
is ignored. Aside from the initial v6: (or v4:, for consistency), STRUCTURE has one of the following
forms:
1. NETBLOCK LIST /SUMMARY LIST. Group IPs into the sizes specified in either NETBLOCK LIST or SUMMARY LIST. rwbagcat prints a row for each occupied netblock specified
in NETBLOCK LIST, where the row lists the base IP of the netblock, the sum of the counters for
that netblock, the number of hosts, and the number of smaller, occupied netblocks having a size
that appears in either NETBLOCK LIST or SUMMARY LIST. (The values in SUMMARY LIST
are only summarized; they are not printed.)
2. NETBLOCK LIST /. Similar to the first form, except all occupied netblocks are printed, and
there are no netblocks that are only summarized.
3. NETBLOCK LIST S. When the character S appears anywhere in the NETBLOCK LIST, rwbagcat provides a default value for the SUMMARY LIST. That default is 8,16,24,27 for IPv4,
and 48,64 for IPv6.
4. NETBLOCK LIST. When neither S nor / appear in STRUCTURE, the output does not include
the number of smaller, occupied netblocks.
5. Empty. When STRUCTURE is empty or only contains v6: or v4:, the NETBLOCK LIST prints
a single row for the total network (the /0 netblock) giving the number of hosts, the sum of the
counters, and the number of smaller, occupied netblocks using the same default list specified in
form 3.
NETBLOCK_LIST and SUMMARY_LIST contain a comma-separated list of numbers between 0 (the
total network) and the size for an individual host (32 for IPv4 or 128 for IPv6). The characters T and H
may be used as aliases for 0 and the host netblock, respectively. In addition, when parsing the lists as
IPv4 netblocks, the characters A, B, C, and X are supported as aliases for 8, 16, 24, and 27, respectively.
A comma is not required between adjacent letters. The --network-structure switch disables printing
of the IPs in the Bag file; specify the H argument to the switch to print each individual IP address and
its counter.
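The grouping that --network-structure performs can be pictured with a short Python sketch. This is not rwbagcat's implementation: the function name summarize_netblocks and the use of a dictionary to stand in for a Bag are assumptions made for illustration. The data is the mybag.bag content shown in the EXAMPLES section.

```python
import ipaddress
from collections import defaultdict

def summarize_netblocks(bag, prefix_lengths):
    """Group IPv4 keys into netblocks and sum their counters.

    bag maps dotted-quad strings to integer counters (a stand-in for a Bag).
    Returns {prefix_length: {netblock: (sum_of_counters, host_count)}}.
    """
    result = {}
    for plen in prefix_lengths:
        sums = defaultdict(int)
        hosts = defaultdict(int)
        for ip, counter in bag.items():
            # strict=False masks the host bits, giving the enclosing netblock.
            net = str(ipaddress.ip_network(f"{ip}/{plen}", strict=False))
            sums[net] += counter
            hosts[net] += 1
        result[plen] = {net: (sums[net], hosts[net]) for net in sums}
    return result

# Contents of mybag.bag as listed in the EXAMPLES section.
mybag = {
    "172.23.1.1": 5, "172.23.1.2": 231, "172.23.1.3": 9, "172.23.1.4": 19,
    "192.168.0.100": 1, "192.168.0.101": 1, "192.168.0.160": 15,
    "192.168.20.161": 1, "192.168.20.162": 5, "192.168.20.163": 5,
}

# Roughly what the A (/8) and C (/24) aliases ask for.
summary = summarize_netblocks(mybag, [8, 24])
for plen in (24, 8):
    for net, (total, hosts) in sorted(summary[plen].items()):
        print(f"{net:<18}|{total:>10}| {hosts} hosts")
```

The sums and host counts for the /24 and /8 rows agree with the --network-structure=ACS example; the sketch does not attempt the count of smaller occupied netblocks.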
--bin-ips
--bin-ips=SCALE
Invert the bag and count the total number of unique IP addresses for a given value of the volume bin.
For example, turn a Bag {sip:flow} into {flow:count(sip)}. SCALE is a string containing the value
linear, binary, or decimal.
• The default behavior is linear: Each distinct counter gets its own bin. Any counter in the input
Bag file that is larger than the maximum possible key will be attributed to the maximum key; to
prevent this, specify --maxcounter=4294967295.
• binary creates a bag of {log2(flow):count(sip)}. Bin n contains counts in the range
[ 2^n, 2^(n+1) ).
• decimal creates one hundred bins for each counter in the range [1,100), and one hundred bins for
each counter in the range [100,1000), each counter in the range [1000,10000), etc. Counters are
logarithmically distributed among the bins.
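As a rough illustration of the linear and binary scales, the following Python sketch inverts a list of counters. The function names bin_linear and bin_binary are hypothetical, the counters are those of the mybag.bag example in the EXAMPLES section, and the decimal scale is left out because its exact bin boundaries are not spelled out here.

```python
from collections import Counter

def bin_linear(counters):
    """One bin per distinct counter value: {counter: number_of_keys}."""
    return dict(Counter(c for c in counters if c > 0))

def bin_binary(counters):
    """Bin n holds counters in [2^n, 2^(n+1)): {n: number_of_keys}."""
    # For c >= 1, c.bit_length() - 1 equals floor(log2(c)).
    return dict(Counter(c.bit_length() - 1 for c in counters if c > 0))

# Counters of the ten keys in the mybag.bag example.
counters = [5, 231, 9, 19, 1, 1, 15, 1, 5, 5]

linear = bin_linear(counters)
binary = bin_binary(counters)
for value in sorted(linear):
    print(f"{value:>4}|{linear[value]:>3}|")
for n in sorted(binary):
    print(f"2^{n} to 2^{n + 1}-1|{binary[n]:>3}|")
```

Both results line up with the --bin-ips and --bin-ips=binary examples in the EXAMPLES section.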
--print-statistics
--print-statistics=OUTFILE
Print a breakdown of the network hosts seen and general statistics about the keys and
counters:
• count of unique keys
• sum of all the counters
• minimum key
• maximum key
• minimum counter
• maximum counter
• mean of counters
• variance of counters
• standard deviation of counters
• skew of counters
• kurtosis of counters
• count of nodes allocated
• total bytes allocated for nodes
• count of leaves allocated
• total bytes allocated for leaves
• density of the data
OUTFILE is a filename, named pipe, the values stdout or - to print to the standard output, or the
value stderr to print to the standard error. Defaults to printing to the standard output.
--minkey=VALUE
Output records whose key value is at least VALUE. VALUE may be an IP address or an integer in the
range 0 to 4294967295 inclusive. The default is to print all records with a non-zero counter.
--maxkey=VALUE
Output records whose key value is not more than VALUE. VALUE may be an IP address or an integer
in the range 0 to 4294967295 inclusive. The default is to print all records with a non-zero counter.
--mask-set=PATH
Output records whose key appears in the binary IPset read from the file PATH. (To build an IPset, use
rwset(1) or rwsetbuild(1).) When used with --minkey and/or --maxkey, output records whose
key is in the IPset and is also within the specified range.
--mincounter=VALUE
Output records whose counter value is at least VALUE. VALUE is an integer in the range 1 to
18446744073709551615. The default is to print all records with a non-zero counter; use --zero-counts
to show records whose counter is 0.
--maxcounter=VALUE
Output records whose counter value is not more than VALUE. VALUE is an integer in the range 1 to
18446744073709551615, with the default being the maximum counter value.
--zero-counts
Print keys whose counter is zero. Normally, keys with a counter of zero are suppressed since all keys
have a default counter of zero. In order to use this flag, either --mask-set or both --minkey and
--maxkey must be specified. When this switch is specified, any counter limit explicitly set by the
--maxcounter switch will still be applied.
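The reason --zero-counts needs --mask-set or both key limits is that the candidate keys must come from somewhere other than the Bag itself. The Python sketch below illustrates that idea only; the function name rows_with_zero_counts, the integer keys, and the toy Bag contents are assumptions, and the sketch leaves --mincounter out entirely because its interplay with --zero-counts is not described here.

```python
def rows_with_zero_counts(bag, mask_set=None, minkey=None, maxkey=None,
                          maxcounter=2**64 - 1):
    """List (key, counter) rows, including keys whose counter is zero."""
    if mask_set is None and (minkey is None or maxkey is None):
        raise ValueError("need mask_set, or both minkey and maxkey")
    # Candidate keys come from the IPset when given, else from the key range.
    if mask_set is not None:
        candidates = sorted(mask_set)
    else:
        candidates = range(minkey, maxkey + 1)
    rows = []
    for key in candidates:
        if minkey is not None and key < minkey:
            continue
        if maxkey is not None and key > maxkey:
            continue
        counter = bag.get(key, 0)   # absent keys have a counter of zero
        if counter <= maxcounter:   # an explicit maximum still applies
            rows.append((key, counter))
    return rows

# Hypothetical Bag with integer keys, used only for this illustration.
toy_bag = {3: 10, 4: 7, 6: 14, 7: 23, 8: 2}
print(rows_with_zero_counts(toy_bag, minkey=3, maxkey=6))
print(rows_with_zero_counts(toy_bag, mask_set={2, 4, 6, 8}, maxcounter=9))
```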
--output-path=OUTPUTFILE
Redirect output of the --network-structure or --bin-ips options to OUTPUTFILE. OUTPUTFILE
is a filename, named pipe, or the values stdout or - to print to the standard output.
--key-format=FORMAT
Specify the format to use when printing the keys. When this switch is not specified, the keys of a Bag
whose keys are known not to be IP addresses are printed as decimal numbers, and the keys for all other Bags are
printed as IP addresses in the canonical format. The FORMAT is one of:
canonical
Print keys as IP addresses in the canonical format: dotted quad for IPv4 (127.0.0.1) and hexadectet for IPv6 (2001:db8::1). Note that IPv6 addresses in ::ffff:0:0/96 and some IPv6 addresses
in ::/96 will be printed as a mixture of IPv6 and IPv4.
zero-padded
Print keys as IP addresses in their canonical form, but add zeros to the output so it fully fills the
width of the column. The addresses 127.0.0.1 and 2001:db8::1 are printed as 127.000.000.001
and 2001:0db8:0000:0000:0000:0000:0000:0001, respectively.
decimal
Print keys as integers in decimal format. The addresses 127.0.0.1 and 2001:db8::1 are printed
as 2130706433 and 42540766411282592856903984951653826561, respectively.
hexadecimal
Print keys as integers in hexadecimal format. The addresses 127.0.0.1 and 2001:db8::1 are
printed as 7f000001 and 20010db8000000000000000000000001, respectively.
force-ipv6
Print all keys as IP addresses in the canonical form for IPv6 without using any IPv4 notation. Any
integer key or IPv4 address is mapped into the ::ffff:0:0/96 netblock. The addresses 127.0.0.1
and 2001:db8::1 are printed as ::ffff:7f00:1 and 2001:db8::1, respectively.
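The five formats can be reproduced for the two sample addresses with Python's standard ipaddress module. This is a sketch for checking one's understanding, not how rwbagcat formats keys; format_key is a hypothetical helper, and its canonical branch is only exercised on the two addresses used above.

```python
import ipaddress

def format_key(addr, fmt):
    """Render an IP address in the style of one --key-format value."""
    ip = ipaddress.ip_address(addr)
    if fmt == "canonical":
        return str(ip)
    if fmt == "zero-padded":
        if ip.version == 4:
            return ".".join(f"{int(octet):03d}" for octet in str(ip).split("."))
        return ip.exploded          # all eight groups, four hex digits each
    if fmt == "decimal":
        return str(int(ip))
    if fmt == "hexadecimal":
        return f"{int(ip):x}"
    if fmt == "force-ipv6":
        v4 = ip if ip.version == 4 else ip.ipv4_mapped
        if v4 is not None:
            n = int(v4)
            # Map into ::ffff:0:0/96 and avoid dotted-quad notation.
            return f"::ffff:{n >> 16:x}:{n & 0xFFFF:x}"
        return str(ip)
    raise ValueError(f"unknown format: {fmt}")

for fmt in ("canonical", "zero-padded", "decimal", "hexadecimal", "force-ipv6"):
    print(fmt, format_key("127.0.0.1", fmt), format_key("2001:db8::1", fmt))
```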
--integer-keys
This switch is equivalent to --key-format=decimal. It is deprecated as of SiLK 3.7.0 and will be
removed in the SiLK 4.0 release.
--zero-pad-ips
This switch is equivalent to --key-format=zero-padded. It is deprecated as of SiLK 3.7.0 and
will be removed in the SiLK 4.0 release.
--no-columns
Disable fixed-width columnar output.
--column-separator=C
Use specified character between columns and after the final column. When this switch is not specified,
the default of ’|’ is used.
--no-final-delimiter
Do not print the column separator after the final column. Normally a delimiter is printed. When the
network summary is requested (--network-structure=S), the separator is always printed before the
summary column and never after that column.
--delimited
--delimited=C
Run as if --no-columns --no-final-delimiter --column-sep=C had been specified. That is, disable
fixed-width columnar output; if character C is provided, it is used as the delimiter between columns
instead of the default ’|’.
--pager=PAGER_PROG
When output is to a terminal, invoke the program PAGER_PROG to view the output one screenful
at a time. This switch overrides the SILK_PAGER environment variable, which in turn overrides the
PAGER variable. If the value of the pager is determined to be the empty string, no paging will be
performed and all output will be printed to the terminal.
--help
Print the available options and exit.
--version
Print the version number and information about how SiLK was configured, then exit the application.
EXAMPLES
In the following examples, the dollar sign ($) represents the shell prompt. The text after the dollar sign
represents the command line.
To print the bag:
$ rwbagcat mybag.bag
     172.23.1.1|         5|
     172.23.1.2|       231|
     172.23.1.3|         9|
     172.23.1.4|        19|
  192.168.0.100|         1|
  192.168.0.101|         1|
  192.168.0.160|        15|
 192.168.20.161|         1|
 192.168.20.162|         5|
 192.168.20.163|         5|
To print it with full network:
$ rwbagcat --network-structure=TABCHX mybag.bag
172.23.1.1        |         5|
172.23.1.2        |       231|
172.23.1.3        |         9|
172.23.1.4        |        19|
172.23.1.0/27     |       264|
172.23.1.0/24     |       264|
172.23.0.0/16     |       264|
172.0.0.0/8       |       264|
192.168.0.100     |         1|
192.168.0.101     |         1|
192.168.0.96/27   |         2|
192.168.0.160     |        15|
192.168.0.160/27  |        15|
192.168.0.0/24    |        17|
192.168.20.161    |         1|
192.168.20.162    |         5|
192.168.20.163    |         5|
192.168.20.160/27 |        11|
192.168.20.0/24   |        11|
192.168.0.0/16    |        28|
192.0.0.0/8       |        28|
TOTAL             |       292|
Or an abbreviated network structure by class A and C only, including summary information:
$ rwbagcat --network-structure=ACS mybag.bag
172.23.1.0/24     |       264| 4 hosts in 1 /27
172.0.0.0/8       |       264| 4 hosts in 1 /16, 1 /24, and 1 /27
192.168.0.0/24    |        17| 3 hosts in 2 /27s
192.168.20.0/24   |        11| 3 hosts in 1 /27
192.0.0.0/8       |        28| 6 hosts in 1 /16, 2 /24s, and 3 /27s
To bin by number of unique IP addresses by volume:
$ rwbagcat --bin-ips mybag.bag
   1|   3|
   5|   3|
   9|   1|
  15|   1|
  19|   1|
 231|   1|
This means there were 3 source hosts in the bag that had a single flow; 3 hosts that had 5 flows; and one
host each that had 9, 15, 19, and 231 flows.
For a log2 breakdown of the counts:
$ rwbagcat --bin-ips=binary mybag.bag
2^0 to 2^1-1|   3|
2^2 to 2^3-1|   3|
2^3 to 2^4-1|   2|
2^4 to 2^5-1|   1|
2^7 to 2^8-1|   1|
Statistics:
$ rwbagcat --stats mybag.bag
Statistics
              keys:  10
   sum of counters:  292
       minimum key:  172.23.1.1
       maximum key:  192.168.20.163
     minimum count:  1
     maximum count:  231
              mean:  29.2
          variance:  5064
standard deviation:  71.16
              skew:  2.246
          kurtosis:  8.1
$ rwbagcat --tree-stats mybag.bag
nodes allocated: 5 (10240 bytes)
leaves allocated: 4 (1024 bytes)
keys inserted: 10 (10 unique)
counter density: 7.81%
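The mean, variance, and standard deviation in the statistics example can be checked by hand with Python's statistics module. The sketch below uses the ten counters from mybag.bag; it is only a cross-check of the printed numbers, and it does not attempt skew or kurtosis because the exact formulas rwbagcat uses for them are not given here.

```python
import statistics

# Counters of the ten keys in the mybag.bag example.
counters = [5, 231, 9, 19, 1, 1, 15, 1, 5, 5]

mean = statistics.mean(counters)
# statistics.variance divides by n - 1 (the sample variance).
variance = statistics.variance(counters)
stdev = statistics.stdev(counters)

print(f"keys: {len(counters)}")
print(f"sum of counters: {sum(counters)}")
print(f"mean: {mean:.1f}")
print(f"variance: {variance:.0f}")
print(f"standard deviation: {stdev:.2f}")
```

Dividing by n - 1 reproduces the 5064 and 71.16 shown above, whereas dividing by n would give about 4558, so the printed variance appears to be the sample variance.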
ENVIRONMENT
SILK_CLOBBER
The SiLK tools normally refuse to overwrite existing files. Setting SILK_CLOBBER to a non-empty
value removes this restriction.
SILK_PAGER
When set to a non-empty string, rwbagcat automatically invokes this program to display its output
a screen at a time. If set to an empty string, rwbagcat does not automatically page its output.
PAGER
When set and SILK_PAGER is not set, rwbagcat automatically invokes this program to display its
output a screen at a time.
SEE ALSO
rwbag(1), rwbagbuild(1), rwbagtool(1), rwset(1), rwsetbuild(1), silk(7)
rwbagtool
Perform high-level operations on binary Bag files
SYNOPSIS
rwbagtool { --add | --subtract | --minimize | --maximize
| --divide | --scalar-multiply=VALUE
| --compare={lt | le | eq | ge | gt} }
[--intersect=SETFILE | --complement-intersect=SETFILE]
[--mincounter=VALUE] [--maxcounter=VALUE]
[--minkey=VALUE] [--maxkey=VALUE]
[--invert] [--coverset] [--output-path=OUTPUTFILE]
[--note-strip] [--note-add=TEXT] [--note-file-add=FILE]
[--compression-method=COMP_METHOD]
[BAGFILE[ BAGFILE...]]
rwbagtool --help
rwbagtool --version
DESCRIPTION
rwbagtool performs various operations on Bags. It can add Bags together, subtract a subset of data from
a Bag, perform key intersection of a Bag with an IP set, extract the key list of a Bag as an IP set, or filter
Bag records based on their counter value.
BAGFILE is the name of a file or a named pipe, or the names stdin or - to have rwbagtool read from
the standard input. If no Bag file names are given on the command line, rwbagtool attempts to read a
Bag from the standard input. If BAGFILE does not contain a Bag, rwbagtool prints an error to stderr
and exits abnormally.
OPTIONS
Option names may be abbreviated if the abbreviation is unique or is an exact match for an option. A
parameter to an option may be specified as --arg=param or --arg param, though the first form is required
for options that take optional parameters.
Operation switches
The first set of options are mutually exclusive; only one may be specified. If none are specified, the counters
in the Bag files are summed.
--add
Sum the counters for each key for all Bag files given on the command line. If a key does not exist, it
has a counter of zero. If no other operation is specified, the add operation is the default.
--subtract
Subtract from the first Bag file all subsequent Bag files. If a key does not appear in the first Bag file,
rwbagtool assumes it has a value of 0. If any counter subtraction results in a negative number, the
key will not appear in the resulting Bag file.
--minimize
Cause the output to contain the minimum counter seen for each key. Keys that do not appear in all
input Bags will not appear in the output.
--maximize
Cause the output to contain the maximum counter seen for each key. The output will contain each
key that appears in any input Bag.
--divide
Divide the first Bag file by the second Bag file. It is an error if more than two Bag files are specified.
Every key in the first Bag file must appear in the second file; the second Bag may have keys that
do not appear in the first, and those keys will not appear in the output. Since Bags do not support
floating point numbers, the result of the division is rounded to the nearest integer (values ending in .5
are rounded up). If the result of the division is less than 0.5, the key will not appear in the output.
--scalar-multiply=VALUE
Multiply each counter in the Bag file by the scalar VALUE, where VALUE is an integer in the range
1 to 18446744073709551615. This switch accepts a single Bag as input.
--compare=OPERATION
Compare the key/counter pairs in exactly two Bag files. It is an error if more than two Bag files are
specified. The keys in the output Bag will only be those whose counter in the first Bag is OPERATION
the counter in the second Bag. The counters for all keys in the output will be 1. Any key that does
not appear in both input Bag files will not appear in the result. The possible OPERATION values
are the strings:
lt
GetCounter(Bag1, key) < GetCounter(Bag2, key)
le
GetCounter(Bag1, key) <= GetCounter(Bag2, key)
eq
GetCounter(Bag1, key) == GetCounter(Bag2, key)
ge
GetCounter(Bag1, key) >= GetCounter(Bag2, key)
gt
GetCounter(Bag1, key) > GetCounter(Bag2, key)
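The operation switches can be modeled on plain dictionaries to make their key-handling rules concrete. The Python sketch below is an interpretation of the descriptions above, not rwbagtool's code: every function name is hypothetical, dropping keys whose difference is zero is an assumption chosen to match the output of the subtraction example, and the Bag contents are those listed in the EXAMPLES section.

```python
import operator

def bag_add(*bags):
    out = {}
    for bag in bags:
        for key, counter in bag.items():
            out[key] = out.get(key, 0) + counter
    return out

def bag_subtract(first, *rest):
    out = dict(first)
    for bag in rest:
        for key, counter in bag.items():
            out[key] = out.get(key, 0) - counter
    # Assumption: keys left at zero or below are omitted from the result.
    return {k: c for k, c in out.items() if c > 0}

def bag_minimize(*bags):
    common = set(bags[0]).intersection(*bags[1:])   # keys present in all
    return {k: min(bag[k] for bag in bags) for k in common}

def bag_maximize(*bags):
    keys = set().union(*bags)                       # keys present in any
    return {k: max(bag.get(k, 0) for bag in bags) for k in keys}

def bag_divide(numerator, divisor):
    out = {}
    for key, counter in numerator.items():
        if key not in divisor:
            raise KeyError(f"key {key} not in divisor bag")
        # Round to nearest with halves rounded up, using integers only.
        quotient = (2 * counter + divisor[key]) // (2 * divisor[key])
        if quotient > 0:
            out[key] = quotient
    return out

COMPARISONS = {"lt": operator.lt, "le": operator.le, "eq": operator.eq,
               "ge": operator.ge, "gt": operator.gt}

def bag_compare(operation, bag1, bag2):
    test = COMPARISONS[operation]
    return {k: 1 for k in bag1 if k in bag2 and test(bag1[k], bag2[k])}

bag1 = {3: 10, 4: 7, 6: 14, 7: 23, 8: 2}
bag2 = {1: 1, 4: 2, 7: 32, 8: 2}
bag3 = {2: 8, 4: 10, 6: 14, 7: 12, 9: 8}
bag4 = {1: 1, 4: 3, 6: 4, 7: 4, 8: 6}

print(sorted(bag_add(bag1, bag2).items()))
print(sorted(bag_divide(bag2, bag4).items()))
print(sorted(bag_compare("le", bag1, bag2)))
```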
Masking/Limiting switches
The result of the above operation is an intermediate Bag file. The following switches are applied next to
remove entries from the intermediate Bag:
--intersect=SETFILE
Mask the keys in the intermediate Bag using the set in SETFILE. SETFILE is the name of a file or a
named pipe containing an IPset, or the name stdin or - to have rwbagtool read the IPset from the
standard input. If SETFILE does not contain an IPset, rwbagtool prints an error to stderr and exits
abnormally. Only key/counter pairs where the key matches an entry in SETFILE are written to the
output. (IPsets are typically created by rwset(1) or rwsetbuild(1).)
--complement-intersect=SETFILE
As --intersect, but only writes key/counter pairs for keys which do not match an entry in SETFILE.
--mincounter=VALUE
Cause the output to contain only those records whose counter value is VALUE or higher. The allowable
range is 1 to the maximum counter value; the default is 1.
--maxcounter=VALUE
Cause the output to contain only those records whose counter value is VALUE or lower. The allowable
range is 1 to the maximum counter value; the default is the maximum counter value.
--minkey=VALUE
Cause the output to contain only those records whose key value is VALUE or higher. Default is 0 (or
0.0.0.0). Accepts input as an integer or as an IP address in dotted decimal notation.
--maxkey=VALUE
Cause the output to contain only those records whose key value is VALUE or lower. Default is
4294967295 (or 255.255.255.255). Accepts input as an integer or as an IP address in dotted decimal
notation.
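The masking and limiting switches act as filters on the intermediate Bag. The following Python sketch expresses them as one pass over a dictionary; limit_bag and its parameter names are hypothetical, IPsets are represented as ordinary Python sets of integer keys, and the data is taken from the EXAMPLES section.

```python
def limit_bag(bag, intersect=None, complement_intersect=None,
              mincounter=1, maxcounter=2**64 - 1,
              minkey=0, maxkey=2**32 - 1):
    """Keep only the key/counter pairs that pass every active limit."""
    out = {}
    for key, counter in bag.items():
        if intersect is not None and key not in intersect:
            continue
        if complement_intersect is not None and key in complement_intersect:
            continue
        if not minkey <= key <= maxkey:
            continue
        if not mincounter <= counter <= maxcounter:
            continue
        out[key] = counter
    return out

bag1 = {3: 10, 4: 7, 6: 14, 7: 23, 8: 2}
mask = {2, 4, 6, 8}

print(sorted(limit_bag(bag1, intersect=mask).items()))
print(sorted(limit_bag(bag1, complement_intersect=mask).items()))
print(sorted(limit_bag(bag1, minkey=3, maxkey=6).items()))
```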
Output switches
The following switches control the output.
--invert
Generate a new Bag whose keys are the counters in the intermediate Bag and whose counter is the
number of times the counter was seen. For example, this turns the Bag {sip:flow} into the Bag
{flow:count(sip)}. Any counter in the intermediate Bag that is larger than the maximum possible key
will be attributed to the maximum key; to prevent this, specify --maxcounter=4294967295.
--coverset
Instead of creating a Bag file as the output, write an IPset which contains the keys contained in the
intermediate Bag.
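A small Python sketch may help with --invert and --coverset, in particular the clamping of large counters to the maximum key. The names bag_invert, bag_coverset, and MAX_KEY are hypothetical; MAX_KEY is set to 4294967295 to match the key range quoted above.

```python
from collections import Counter

MAX_KEY = 4294967295  # largest key value mentioned in the text

def bag_invert(bag):
    """Map each counter value to the number of keys that carry it."""
    # Counters above MAX_KEY are attributed to MAX_KEY.
    return dict(Counter(min(counter, MAX_KEY) for counter in bag.values()))

def bag_coverset(bag):
    """Return the sorted keys of the Bag, standing in for an IPset."""
    return sorted(bag)

bag2 = {1: 1, 4: 2, 7: 32, 8: 2}
print(sorted(bag_invert(bag2).items()))
print(bag_coverset(bag2))

# Two oversized counters collapse into the single key MAX_KEY.
print(bag_invert({1: 2**40, 2: 2**33}))
```

Limiting the counters first, as with --maxcounter=4294967295, removes the oversized entries before inversion so none are clamped.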
--output-path=OUTPUTFILE
Redirect output to OUTPUTFILE. OUTPUTFILE is the name of a file or a named pipe, or the name
stdout or - to write the result to the standard output.
--note-strip
Do not copy the notes (annotations) from the input files to the output file. Normally notes from the
input files are copied to the output.
--note-add=TEXT
Add the specified TEXT to the header of the output file as an annotation. This switch may be repeated
to add multiple annotations to a file. To view the annotations, use the rwfileinfo(1) tool.
--note-file-add=FILENAME
Open FILENAME and add the contents of that file to the header of the output file as an annotation.
This switch may be repeated to add multiple annotations. Currently the application makes no effort
to ensure that FILENAME contains text; be careful that you do not attempt to add a SiLK data file
as an annotation.
--compression-method=COMP_METHOD
Specify how to compress the output. When this switch is not given, output to the standard output or to
named pipes is not compressed, and output to files is compressed using the default chosen when SiLK
was compiled. The valid values for COMP_METHOD are determined by which external libraries were
found when SiLK was compiled. To see the available compression methods and the default method,
use the --help or --version switch. SiLK can support the following COMP_METHOD values when
the required libraries are available.
none
Do not compress the output using an external library.
zlib
Use the zlib(3) library for compressing the output, and always compress the output regardless
of the destination. Using zlib produces the smallest output files at the cost of speed.
lzo1x
Use the lzo1x algorithm from the LZO real time compression library for compression, and always
compress the output regardless of the destination. This compression provides good compression
with less memory and CPU overhead.
best
Use lzo1x if available, otherwise use zlib. Only compress the output when writing to a file.
--help
Print the available options and exit.
--version
Print the version number and information about how SiLK was configured, then exit the application.
EXAMPLES
In the following examples, the dollar sign ($) represents the shell prompt. The text after the dollar sign
represents the command line. Lines have been wrapped for improved readability, and the back slash (\) is
used to indicate a wrapped line.
The examples assume the following contents for the files:
Bag1.bag     Bag2.bag     Bag3.bag     Bag4.bag     Mask.set
 3|  10|      1|   1|      2|   8|      1|   1|            2
 4|   7|      4|   2|      4|  10|      4|   3|            4
 6|  14|      7|  32|      6|  14|      6|   4|            6
 7|  23|      8|   2|      7|  12|      7|   4|            8
 8|   2|                   9|   8|      8|   6|
Adding Bag files
$ rwbagtool --add Bag1.bag Bag2.bag > Bag-sum.bag
$ rwbagcat --integer-keys Bag-sum.bag
 1|   1|
 3|  10|
 4|   9|
 6|  14|
 7|  55|
 8|   4|
$ rwbagtool --add Bag1.bag Bag2.bag Bag3.bag > Bag-sum2.bag
$ rwbagcat --integer-keys Bag-sum2.bag
 1|   1|
 2|   8|
 3|  10|
 4|  19|
 6|  28|
 7|  67|
 8|   4|
 9|   8|
Subtracting Bag Files
$ rwbagtool --sub Bag1.bag Bag2.bag > Bag-diff.bag
$ rwbagcat --integer-keys Bag-diff.bag
 3|  10|
 4|   5|
 6|  14|
$ rwbagtool --sub Bag2.bag Bag1.bag > Bag-diff2.bag
$ rwbagcat --integer-keys Bag-diff2.bag
 1|   1|
 7|   9|
Getting the Minimum Value
$ rwbagtool --minimize Bag1.bag Bag2.bag Bag3.bag > Bag-min.bag
$ rwbagcat --integer-keys Bag-min.bag
 4|   2|
 7|  12|
Getting the Maximum Value
$ rwbagtool --maximize Bag1.bag Bag2.bag Bag3.bag > Bag-max.bag
$ rwbagcat --integer-keys Bag-max.bag
 1|   1|
 2|   8|
 3|  10|
 4|  10|
 6|  14|
 7|  32|
 8|   2|
 9|   8|
Dividing Bag Files
$ rwbagtool --divide Bag2.bag Bag4.bag > Bag-div1.bag
$ rwbagcat --integer-keys Bag-div1.bag
 1|   1|
 4|   1|
 7|   8|
However, when the order is reversed:
$ rwbagtool --divide Bag4.bag Bag2.bag > Bag-div2.bag
rwbagtool: Error dividing bags; key 6 not in divisor bag
To work around this issue, use the --coverset switch to create a copy of Bag4.bag that contains only the
keys in Bag2.bag:
$ rwbagtool --coverset Bag2.bag > Bag2-keys.set
$ rwbagtool --intersect=Bag2-keys.set Bag4.bag > Bag4-small.bag
$ rwbagtool --divide Bag4-small.bag Bag2.bag > Bag-div2.bag
$ rwbagcat --integer-keys Bag-div2.bag
 1|   1|
 4|   2|
 8|   3|
Or, in a single piped command without writing the IPset to disk:
$ rwbagtool --coverset Bag2.bag                 \
    | rwbagtool --intersect=- Bag4.bag          \
    | rwbagtool --divide - Bag2.bag             \
    | rwbagcat --integer-keys
 1|   1|
 4|   2|
 8|   3|
Scalar Multiplication
$ rwbagtool --scalar-multiply=7 Bag1.bag > Bag-multiply.bag
$ rwbagcat --integer-keys Bag-multiply.bag
3| 70|
4| 49|
6| 98|
7| 161|
8| 14|
Comparing Bag Files
$ rwbagtool --compare=lt Bag1.bag Bag2.bag > Bag-lt.bag
$ rwbagcat --integer-keys Bag-lt.bag
 7|   1|
$ rwbagtool --compare=le Bag1.bag Bag2.bag > Bag-le.bag
$ rwbagcat --integer-keys Bag-le.bag
 7|   1|
 8|   1|
$ rwbagtool --compare=eq Bag1.bag Bag2.bag > Bag-eq.bag
$ rwbagcat --integer-keys Bag-eq.bag
 8|   1|
$ rwbagtool --compare=ge Bag1.bag Bag2.bag > Bag-ge.bag
$ rwbagcat --integer-keys Bag-ge.bag
 4|   1|
 8|   1|
$ rwbagtool --compare=gt Bag1.bag Bag2.bag > Bag-gt.bag
$ rwbagcat --integer-keys Bag-gt.bag
 4|   1|
Making a Cover Set
$ rwbagtool --coverset Bag1.bag Bag2.bag Bag3.bag > Cover.set
$ rwsetcat --integer-ips Cover.set
1
2
3
4
6
7
8
9
Inverting a Bag
$ rwbagtool --invert Bag1.bag > Bag-inv1.bag
$ rwbagcat --integer-keys Bag-inv1.bag
 2|   1|
 7|   1|
10|   1|
14|   1|
23|   1|
$ rwbagtool --invert Bag2.bag > Bag-inv2.bag
$ rwbagcat --integer-keys Bag-inv2.bag
 1|   1|
 2|   2|
32|   1|
$ rwbagtool --invert Bag3.bag > Bag-inv3.bag
$ rwbagcat --integer-keys Bag-inv3.bag
 8|   2|
10|   1|
12|   1|
14|   1|
Masking Bag Files
$ rwbagtool --intersect=Mask.set Bag1.bag > Bag-mask.bag
$ rwbagcat --integer-keys Bag-mask.bag
 4|   7|
 6|  14|
 8|   2|
$ rwbagtool --complement-intersect=Mask.set Bag1.bag > Bag-mask2.bag
$ rwbagcat --integer-keys Bag-mask2.bag
3| 10|
7| 23|
Restricting the Output
$ rwbagtool --add --maxkey=5 Bag1.bag Bag2.bag > Bag-res1.bag
$ rwbagcat --integer-keys Bag-res1.bag
 1|   1|
 3|  10|
 4|   9|
$ rwbagtool --minkey=3 --maxkey=6 Bag1.bag > Bag-res2.bag
$ rwbagcat --integer-keys Bag-res2.bag
 3|  10|
 4|   7|
 6|  14|
$ rwbagtool --mincounter=20 Bag1.bag Bag2.bag > Bag-res3.bag
$ rwbagcat --integer-keys Bag-res3.bag
7| 55|
$ rwbagtool --sub --maxcounter=9 Bag1.bag Bag2.bag > Bag-res4.bag
$ rwbagcat --integer-keys Bag-res4.bag
 4|   5|
ENVIRONMENT
SILK_CLOBBER
The SiLK tools normally refuse to overwrite existing files. Setting SILK_CLOBBER to a non-empty
value removes this restriction.
SEE ALSO
rwbag(1), rwbagbuild(1), rwbagcat(1), rwfileinfo(1), rwset(1), rwsetbuild(1), rwsetcat(1),
silk(7), zlib(3)
rwcat
Concatenate SiLK Flow files into single stream
SYNOPSIS
rwcat [--output-path=FILE] [--note-add=TEXT] [--note-file-add=FILE]
[--print-filenames] [--byte-order={big | little | native}]
[--ipv4-output] [--compression-method=COMP_METHOD]
[--site-config-file=FILENAME]
{[--xargs] | [--xargs=FILENAME] | [FILE [FILE...]]}
rwcat --help
rwcat --version
DESCRIPTION
rwcat reads SiLK Flow records and writes the records in the standard binary SiLK format to the specified
output-path; rwcat will write the records to the standard output when stdout is not the terminal and
--output-path is not provided.
rwcat reads SiLK Flow records from the files named on the command line or from the standard input when
no file names are specified and --xargs is not present. To read the standard input in addition to the named
files, use - or stdin as a file name. If an input file name ends in .gz, the file will be uncompressed as it
is read. When the --xargs switch is provided, rwcat will read the names of the files to process from the
named text file, or from the standard input if no file name argument is provided to the switch. The input
to --xargs must contain one file name per line.
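When the list of input files is produced by a program rather than typed by hand, the --xargs input can be generated and piped to rwcat. The Python sketch below shows one way this might look; the helper names xargs_input, rwcat_argv, and combine are hypothetical, and combine is defined but not called because it needs a working SiLK installation.

```python
import subprocess

def xargs_input(paths):
    """Return the text --xargs expects: one file name per line."""
    return "".join(f"{path}\n" for path in paths)

def rwcat_argv(output_path):
    """Build an rwcat command line that reads file names from stdin."""
    return ["rwcat", "--xargs", f"--output-path={output_path}"]

def combine(paths, output_path):
    """Run rwcat on the given files; requires SiLK to be installed."""
    subprocess.run(rwcat_argv(output_path), input=xargs_input(paths),
                   text=True, check=True)

print(rwcat_argv("combined.rw"))
print(xargs_input(["run1.rw", "run2.rw"]), end="")
```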
rwcat does not copy the invocation history and annotations (notes) from the header(s) of the source file(s)
to the destination file. The --note-add or --note-file-add switch may be used to add a new annotation to
the destination file.
OPTIONS
Option names may be abbreviated if the abbreviation is unique or is an exact match for an option. A
parameter to an option may be specified as --arg=param or --arg param, though the first form is required
for options that take optional parameters.
--output-path=FILE
Write the SiLK Flow records to FILE, which must not exist. If the switch is not provided or if FILE
is stdout, flows are written to the standard output. If the name ends in .gz, the output will be
compressed using gzip(1).
--note-add=TEXT
Add the specified TEXT to the header of the output file as an annotation. This switch may be repeated
to add multiple annotations to a file. To view the annotations, use the rwfileinfo(1) tool.
--note-file-add=FILENAME
Open FILENAME and add the contents of that file to the header of the output file as an annotation.
This switch may be repeated to add multiple annotations. Currently the application makes no effort
to ensure that FILENAME contains text; be careful that you do not attempt to add a SiLK data file
as an annotation.
--byte-order=ENDIAN
Set the byte order for the output SiLK Flow records. The argument is one of the following:
native
Use the byte order of the machine where rwcat is running. This is the default.
big
Use network byte order (big endian) for the output.
little
Write the output in little endian format.
--ipv4-output
Force the output to contain only IPv4 flow records. When this switch is specified, IPv6 flow records
that contain addresses in the ::ffff:0:0/96 prefix are converted to IPv4 and written to the output, and
all other IPv6 records are ignored. When SiLK has not been compiled with IPv6 support, rwcat acts
as if this switch were always in effect.
--compression-method=COMP_METHOD
Specify how to compress the output. When this switch is not given, output to the standard output or to
named pipes is not compressed, and output to files is compressed using the default chosen when SiLK
was compiled. The valid values for COMP_METHOD are determined by which external libraries were
found when SiLK was compiled. To see the available compression methods and the default method,
use the --help or --version switch. SiLK can support the following COMP_METHOD values when
the required libraries are available.
none
Do not compress the output using an external library.
zlib
Use the zlib(3) library for compressing the output, and always compress the output regardless
of the destination. Using zlib produces the smallest output files at the cost of speed.
lzo1x
Use the lzo1x algorithm from the LZO real time compression library for compression, and always
compress the output regardless of the destination. This compression provides good compression
with less memory and CPU overhead.
best
Use lzo1x if available, otherwise use zlib. Only compress the output when writing to a file.
--print-filenames
Print the names of input files and the number of records each file contains as the files are read.
--site-config-file=FILENAME
Read the SiLK site configuration from the named file FILENAME. When this switch is not provided,
rwcat searches for the site configuration file in the locations specified in the FILES section.
--xargs
--xargs=FILENAME
Causes rwcat to read file names from FILENAME or from the standard input if FILENAME is not
provided. The input should have one file name per line. rwcat will open each file in turn and read
records from it, as if the files had been listed on the command line.
--help
Print the available options and exit.
--version
Print the version number and information about how SiLK was configured, then exit the application.
EXAMPLES
In the following examples, the dollar sign ($) represents the shell prompt. The text after the dollar sign
represents the command line. Lines have been wrapped for improved readability, and the back slash (\) is
used to indicate a wrapped line.
To combine the results of several rwfilter(1) runs, stored in the files run1.rw, run2.rw, ... runN.rw,
into the file combined.rw, you can use:
$ rwcat --output=combined.rw *.rw
If the shell complains about too many arguments, you can use the UNIX find(1) utility and pipe its
output to rwcat:
$ find . -name '*.rw' -print                    \
    | rwcat --xargs --output=combined.rw
ENVIRONMENT
SILK_CLOBBER
The SiLK tools normally refuse to overwrite existing files. Setting SILK_CLOBBER to a non-empty
value removes this restriction.
SILK_CONFIG_FILE
This environment variable is used as the value for the --site-config-file when that switch is not
provided.
SILK_DATA_ROOTDIR
This environment variable specifies the root directory of the data repository. As described in the FILES
section, rwcat may use this environment variable when searching for the SiLK site configuration file.
SILK_PATH
This environment variable gives the root of the install tree. When searching for configuration files,
rwcat may use this environment variable. See the FILES section for details.
FILES
${SILK_CONFIG_FILE}
${SILK_DATA_ROOTDIR}/silk.conf
/data/silk.conf
${SILK_PATH}/share/silk/silk.conf
${SILK_PATH}/share/silk.conf
/usr/local/share/silk/silk.conf
/usr/local/share/silk.conf
Possible locations for the SiLK site configuration file which are checked when the --site-config-file
switch is not provided.
SEE ALSO
rwfilter(1), rwfileinfo(1), silk(7), gzip(1), find(1), zlib(3)
BUGS
Although rwcat will read from the standard input, this feature should be used with caution. rwcat will
treat the standard input as a single file, as it has no way to know when one file ends and the next begins.
The following will not work:
$ cat run1.rw run2.rw | rwcat --output=combined.rw      # WRONG!
The header of run2.rw will be treated as data of run1.rw, resulting in corrupt output.
rwcombine
Combine flows denoting a long-lived session into a single flow
SYNOPSIS
rwcombine [--actions=ACTIONS] [--ignore-fields=FIELDS]
[--max-idle-time=NUM]
[{--print-statistics | --print-statistics=FILENAME}]
[--temp-directory=DIR_PATH] [--buffer-size=SIZE]
[--note-add=TEXT] [--note-file-add=FILE]
[--compression-method=COMP_METHOD] [--print-filenames]
[--output-path=PATH] [--site-config-file=FILENAME]
{[--xargs] | [--xargs=FILENAME] | [FILE [FILE ...]]}
rwcombine --help
rwcombine --help-fields
rwcombine --version
DESCRIPTION
rwcombine reads SiLK Flow records from one or more input sources, searches for flow records where the
attributes field denotes records that were prematurely created or were continuations of prematurely created
flows, and attempts to combine those records into a single record. All the unmodified SiLK records and the
combined records are written to the file specified by the --output-path switch or to the standard output
when the --output-path switch is not provided and the standard output is not connected to a terminal.
Some flow exporters, such as yaf(1), provide fields that describe characteristics about the flow record, and
these characteristics are stored in the attributes field of SiLK Flow records. The two flags that rwcombine
considers are:
T
The flow generator prematurely created a record for a long-lived session due to the connection’s lifetime
reaching the active timeout of the flow generator. (Also, when yaf is run with the --silk switch, it
prematurely creates a flow and marks it with T if the byte count of the flow cannot be stored in a
32-bit value.)
C
The flow generator created this flow as a continuation of a long-running connection, where the previous
flow for this connection met a timeout. (yaf only sets this flag when it is invoked with the --silk
switch.)
A very long-running session may be represented by multiple flow records, where the first record is marked
with the T flag, the final record is marked with the C flag, and intermediate records are marked with both C
(this record continues an earlier flow) and T (this record also met the active time-out). rwcombine attempts
to combine these multiple flow records into a single record.
The input to rwcombine does not need to be sorted. As part of its processing, rwcombine may re-order
the records before writing them.
rwcombine reads SiLK Flow records from the files named on the command line or from the standard input
when no file names are specified and --xargs is not present. To read the standard input in addition to the
named files, use - or stdin as a file name. If an input file name ends in .gz, the file will be uncompressed
as it is read. When the --xargs switch is provided, rwcombine will read the names of the files to process
from the named text file, or from the standard input if no file name argument is provided to the switch. The
input to --xargs must contain one file name per line.
Algorithm
The algorithm rwcombine uses to combine records is:
1. rwcombine reads SiLK flow records, examines the attributes field on each record, and immediately
writes to the destination stream all records where neither the time-out flag (T) nor the continuation flag
(C) is set. Records where one or both of those flags are set are stored until all input records have
been read.
2. rwcombine groups the stored records into bins where the following fields for each record in each bin
are identical: sIP, dIP, sPort, dPort, protocol, sensor, in, out, nhIP, application, class, and type.
3. For each bin, the records are sorted by time (sTime and elapsed).
4. Within a bin, rwcombine combines two records into a single record when the attributes field of the
first record has the T (time-out) flag set and the second record has the C (continuation) flag set. When
combining records, the bytes and packets fields are summed, the initialFlags field of the first record
is used, the sessionFlags field becomes the bit-wise OR of both sessionFlags fields and the second
record’s initialFlags field, and the eTime is set to that of the second flow.
5. If the second record’s T flag was set, rwcombine checks to see if the third record’s C flag is set. If it
is, the third record becomes part of the new record.
6. The previous step repeats for the records in the bin until the bin contains a single record, the most
recently added record did not have the T flag set, or the next record in the bin does not have the C flag
set.
7. After examining a bin, rwcombine writes the record(s) the bin contains to the destination stream.
8. Steps 3 through 7 are repeated for each bin.
The --ignore-fields switch allows the user to remove fields from the set that rwcombine uses when grouping
records in Step 2.
When combining two records into one (Step 4), rwcombine completely disregards the difference between
the first record’s end-time and the second record’s start-time (the idle time). To tell rwcombine not to
combine those records when the difference is greater than a limit, specify that value as the argument to the
--max-idle-time switch.
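The steps above can be sketched in a few lines of Python. This is an illustration only, not the rwcombine implementation: records are modelled as plain dictionaries (a hypothetical layout), times are in seconds, and the attributes field is a set of flag letters. The guide does not say how the attributes of a combined record are updated; the sketch assumes the combined record keeps the second record's T flag and drops the first record's T flag, which matches the empty attributes shown for fully combined records in the EXAMPLES section.

```python
from collections import defaultdict

# Default grouping fields from Step 2.
KEY_FIELDS = ("sIP", "dIP", "sPort", "dPort", "protocol", "sensor",
              "in", "out", "nhIP", "application", "class", "type")


def combine(records, ignore_fields=(), max_idle_time=None):
    """Sketch of Steps 1-8; each record is a dict (hypothetical layout).

    A record holds the KEY_FIELDS plus sTime, eTime, bytes, packets,
    initialFlags, sessionFlags (ints) and attributes (a set of letters).
    """
    key_fields = [f for f in KEY_FIELDS if f not in ignore_fields]
    output, bins = [], defaultdict(list)

    # Step 1: write records with neither T nor C; hold the others.
    for rec in records:
        if rec["attributes"] & {"T", "C"}:
            # Step 2: bin the held records by the grouping fields.
            bins[tuple(rec[f] for f in key_fields)].append(dict(rec))
        else:
            output.append(rec)

    for held in bins.values():
        # Step 3: order each bin by start time, then by duration.
        held.sort(key=lambda r: (r["sTime"], r["eTime"] - r["sTime"]))
        merged = [held[0]]
        for nxt in held[1:]:
            cur = merged[-1]
            idle = nxt["sTime"] - cur["eTime"]
            joinable = ("T" in cur["attributes"] and "C" in nxt["attributes"]
                        and (max_idle_time is None or idle <= max_idle_time))
            if not joinable:
                merged.append(nxt)
                continue
            # Steps 4-6: fold the continuation into the current record.
            cur["bytes"] += nxt["bytes"]
            cur["packets"] += nxt["packets"]
            cur["sessionFlags"] |= nxt["sessionFlags"] | nxt["initialFlags"]
            cur["eTime"] = nxt["eTime"]
            # Assumption: only the continuation's T flag carries forward.
            cur["attributes"] = ((cur["attributes"] - {"T"})
                                 | (nxt["attributes"] & {"T"}))
        output.extend(merged)  # Step 7
    return output
```

Passing ignore_fields=("sensor",) in this sketch mirrors --ignore-fields=sensor, and max_idle_time plays the role of the --max-idle-time argument.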
To see information on the number of flows combined and the minimum and maximum idle times, specify the
--print-statistics switch.
During its processing, rwcombine will try to allocate a large (near 2GB) in-memory array to hold the
records. (You may use the --buffer-size switch to change this maximum buffer size.) If more records
are read than will fit into memory, the in-core records are temporarily stored on disk as described by the
--temp-directory switch. When all records have been read, the on-disk files are merged to produce the
output.
By default, the temporary files are stored in the /tmp directory. Because the sizes of the temporary files may
be large, it is strongly recommended that /tmp not be used as the temporary directory, and rwcombine will
print a warning when /tmp is used. To modify the temporary directory used by rwcombine, provide the
--temp-directory switch, set the SILK_TMPDIR environment variable, or set the TMPDIR environment
variable.
OPTIONS
Option names may be abbreviated if the abbreviation is unique or is an exact match for an option. A
parameter to an option may be specified as --arg=param or --arg param, though the first form is required
for options that take optional parameters.
--actions=ACTIONS
Select the type of action(s) that rwcombine should take to combine the input records. The default
action is all, and the following actions are supported:
all
Perform all the actions described below.
timeout
Combine into a single flow record those records where the timeout flags in the attributes field
indicate that the flow exporter has divided a long-lived session into multiple flow records.
This switch is provided for future expansion of rwcombine, since at present rwcombine supports a
single action. When writing a script that uses rwcombine, specify --actions=timeout for compatibility
with future versions of rwcombine.
--ignore-fields=FIELDS
Ignore the fields listed in FIELDS when determining if two flow records should be grouped into the
same bin; that is, treat FIELDS as being identical across all flows. By default, rwcombine puts
records into a bin when the records have identical values for the following fields: sIP, dIP, sPort, dPort,
protocol, sensor, in, out, nhIP, application, class, and type.
FIELDS is a comma-separated list of field-names, field-integers, and ranges of field-integers; a range
is specified by separating the start and end of the range with a hyphen (-). Field-names are
case-insensitive. Example:
--ignore-fields=sensor,12-15
The supported fields are:
sIP,1
source IP address
dIP,2
destination IP address
sPort,3
source port for TCP and UDP, or equivalent
dPort,4
destination port for TCP and UDP, or equivalent
protocol,5
IP protocol
sensor,12
name or ID of sensor at the collection point
in,13
router SNMP input interface or vlanId if packing tools were configured to capture it (see sensor.conf(5))
out,14
router SNMP output interface or postVlanId
nhIP,15
router next hop IP
class,20,type,21
class and type of sensor at the collection point (represented internally by a single value)
application,29
guess as to the content of the flow. Some software that generates flow records from packet data,
such as yaf(1), will inspect the contents of the packets that make up a flow and use traffic
signatures to label the content of the flow. SiLK calls this label the application; yaf refers to it as
the appLabel. The application is the port number that is traditionally used for that type of traffic
(see the /etc/services file on most UNIX systems). For example, traffic that the flow generator
recognizes as FTP will have a value of 21, even if that traffic is being routed through the standard
HTTP/web port (80).
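As an illustration of the FIELDS syntax, the following Python sketch expands a string such as sensor,12-15 into field numbers using the name and number pairs listed above. The function name parse_fields is hypothetical, and the sketch makes two assumptions that the guide does not confirm: names must be given in full, and any number outside the supported list is rejected.

```python
# Name-to-number pairs taken from the list of supported fields.
FIELD_IDS = {"sip": 1, "dip": 2, "sport": 3, "dport": 4, "protocol": 5,
             "sensor": 12, "in": 13, "out": 14, "nhip": 15,
             "class": 20, "type": 21, "application": 29}


def parse_fields(spec):
    """Return the sorted field numbers named by a FIELDS string."""
    chosen = set()
    for token in spec.split(","):
        token = token.strip()
        if token.lower() in FIELD_IDS:
            # Field names are case-insensitive.
            chosen.add(FIELD_IDS[token.lower()])
        elif "-" in token:
            # A range of field-integers, such as 12-15.
            low, high = (int(part) for part in token.split("-", 1))
            if low > high:
                raise ValueError(f"bad range: {token}")
            chosen.update(range(low, high + 1))
        else:
            chosen.add(int(token))
    # Assumption: numbers outside the supported list are an error.
    unknown = chosen - set(FIELD_IDS.values())
    if unknown:
        raise ValueError(f"unsupported field numbers: {sorted(unknown)}")
    return sorted(chosen)
```

With this sketch, parse_fields("sensor,12-15") and parse_fields("SENSOR,in,out,nhIP") both select fields 12 through 15.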
--max-idle-time=NUM
Do not combine flow records when the second flow record begins more than NUM seconds after
the end time of the first flow record. NUM may be fractional. If not specified, the maximum idle time
is treated as infinite.
--print-statistics
--print-statistics=FILENAME
Print to the standard error or to the specified FILENAME the number of flow records read and written,
the number of flows that did not require combining, the number of flows combined, the number that
could not be combined, and minimum and maximum idle time between combined flow records.
--temp-directory=DIR_PATH
Specify the name of the directory in which to store data files temporarily when more records have
been read than will fit into RAM. This switch overrides the directory specified in the SILK_TMPDIR
environment variable, which overrides the directory specified in the TMPDIR variable, which overrides
the default, /tmp.
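The precedence described above can be written as a short lookup. This is a sketch rather than the tool's own code; the function name and the environ parameter are hypothetical, and treating an empty value as unset is an assumption.

```python
import os


def temp_directory(switch_value=None, environ=None):
    """Pick the temporary directory using the documented precedence."""
    env = os.environ if environ is None else environ
    candidates = (switch_value, env.get("SILK_TMPDIR"), env.get("TMPDIR"))
    for candidate in candidates:
        # Assumption: an empty string counts as "not set".
        if candidate:
            return candidate
    return "/tmp"
```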
--buffer-size=SIZE
Set the maximum size of the buffer to use for holding the records, in bytes. A larger buffer means
fewer temporary files need to be created, reducing the I/O wait times. The default maximum for this
buffer is near 2GB. The SIZE may be given as an ordinary integer, or as a real number followed by
a suffix K, M or G, which represents the numerical value multiplied by 1,024 (kilo), 1,048,576 (mega),
and 1,073,741,824 (giga), respectively. For example, 1.5K represents 1,536 bytes, or one and one-half
kilobytes. (This value does not represent the absolute maximum amount of RAM that rwcombine
will allocate, since additional buffers will be allocated for reading the input and writing the output.)
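To make the suffix arithmetic concrete, here is a minimal Python sketch of the SIZE conversion. The name parse_size is hypothetical, and accepting only upper-case suffixes and truncating fractional bytes are assumptions; the guide does not state how rwcombine handles either case.

```python
import re

# Multipliers given in the description of --buffer-size.
MULTIPLIERS = {"K": 1024, "M": 1024 ** 2, "G": 1024 ** 3}


def parse_size(text):
    """Convert a SIZE argument such as 1.5K into a number of bytes."""
    match = re.fullmatch(r"(\d+(?:\.\d*)?|\.\d+)([KMG]?)", text.strip())
    if match is None:
        raise ValueError(f"invalid size: {text!r}")
    number, suffix = match.groups()
    if not suffix:
        # Without a suffix the value must be an ordinary integer.
        if "." in number:
            raise ValueError(f"fractional size needs a suffix: {text!r}")
        return int(number)
    # Assumption: any fractional byte is truncated.
    return int(float(number) * MULTIPLIERS[suffix])
```

For the example in the text, parse_size("1.5K") gives 1536.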
--output-path=PATH
Write the SiLK Flow records to the specified file or named pipe. When the standard output is not a
terminal and this switch is not provided or its argument is - or stdout, the records are written to the
standard output.
--note-add=TEXT
Add the specified TEXT to the header of the output file as an annotation. This switch may be repeated
to add multiple annotations to a file. To view the annotations, use the rwfileinfo(1) tool.
--note-file-add=FILENAME
Open FILENAME and add the contents of that file to the header of the output file as an annotation.
This switch may be repeated to add multiple annotations. Currently the application makes no effort
to ensure that FILENAME contains text; be careful that you do not attempt to add a SiLK data file
as an annotation.
--compression-method=COMP_METHOD
Specify how to compress the output. When this switch is not given, output to the standard output or to
named pipes is not compressed, and output to files is compressed using the default chosen when SiLK
was compiled. The valid values for COMP_METHOD are determined by which external libraries were
found when SiLK was compiled. To see the available compression methods and the default method,
use the --help or --version switch. SiLK can support the following COMP_METHOD values when
the required libraries are available.
none
Do not compress the output using an external library.
zlib
Use the zlib(3) library for compressing the output, and always compress the output regardless
of the destination. Using zlib produces the smallest output files at the cost of speed.
lzo1x
Use the lzo1x algorithm from the LZO real time compression library for compression, and always
compress the output regardless of the destination. This compression provides good compression
with less memory and CPU overhead.
best
Use lzo1x if available, otherwise use zlib. Only compress the output when writing to a file.
--print-filenames
Print to the standard error the names of input files as they are opened.
--site-config-file=FILENAME
Read the SiLK site configuration from the named file FILENAME. When this switch is not provided,
rwcombine searches for the site configuration file in the locations specified in the FILES section.
--xargs
--xargs=FILENAME
Causes rwcombine to read file names from FILENAME or from the standard input if FILENAME
is not provided. The input should have one file name per line. rwcombine will open each file in turn
and read records from it, as if the files had been listed on the command line.
--help
Print the available options and exit.
--help-fields
Print the description and alias(es) of each field and exit.
--version
Print the version number and information about how SiLK was configured, then exit the application.
EXAMPLES
In the following examples, the dollar sign ($) represents the shell prompt. The text after the dollar sign
represents the command line. Lines have been wrapped for improved readability, and the back slash (\) is
used to indicate a wrapped line.
The output from rwcut(1) shows the flow exporter split this long-lived ssh session into multiple flow records:
$ rwfilter --saddr=192.168.126.252 --dport=22 --pass=- data.rw \
| rwcut --fields=flags,attributes,stime,etime
   flags|attribut|                  sTime|                  eTime|
 S PA   |T       |2009/02/13T00:29:59.563|2009/02/13T00:59:39.668|
   PA   |TC      |2009/02/13T00:59:39.668|2009/02/13T01:29:19.478|
   PA   |TC      |2009/02/13T01:29:19.478|2009/02/13T01:58:48.890|
   PA   |TC      |2009/02/13T01:58:48.891|2009/02/13T02:28:43.599|
F  PA   | C      |2009/02/13T02:28:43.600|2009/02/13T02:32:58.272|
Here is the other half of that conversation:
$ rwfilter --daddr=192.168.126.252 --sport=22 --pass=- data.rw \
| rwcut --fields=flags,attributes,stime,etime
   flags|attribut|                  sTime|                  eTime|
 S PA   |T       |2009/02/13T00:30:00.060|2009/02/13T00:59:39.667|
   PA   |TC      |2009/02/13T00:59:39.670|2009/02/13T01:29:19.478|
   PA   |TC      |2009/02/13T01:29:19.481|2009/02/13T01:58:48.890|
   PA   |TC      |2009/02/13T01:58:48.893|2009/02/13T02:28:43.599|
F  PA   | C      |2009/02/13T02:28:43.600|2009/02/13T02:32:58.271|
Use rwuniq(1) to compute the byte and packet counts for that ssh session:
$ rwfilter --any-addr=192.168.126.252 --aport=22 --pass=- data.rw \
| rwuniq --fields=sip,dip,sport,dport --values=records,byte,packets
            sIP|            dIP|sPort|dPort|Records|  Bytes|Packets|
  10.11.156.107|192.168.126.252|   22|28975|      5|4677240|   3881|
192.168.126.252|  10.11.156.107|28975|   22|      5| 281939|   3891|
Invoke rwcombine on these records and store the result in the file combined.rw :
$ rwfilter --any-addr=192.168.126.252 --aport=22 --pass=- data.rw \
| rwcombine --print-statistics --output-path=combined.rw
FLOW RECORD COUNTS:
Read:                             10
Initially Complete:                0 *
Sorted & Examined:         =      10
Missing end:                       0 *
Missing start & end:               0 *
Missing start:                     0 *
Prior to combining:        =      10
Eliminated:                        8
Made complete:             =       2 *
Written:                           2 (sum of *)
IDLE TIMES:
Minimum:              0:00:00:00.000
Penultimate:          0:00:00:00.000
Maximum:              0:00:00:00.003
View the resulting records:
$ rwcut --fields=sip,dip,sport,dport,bytes,packets,flags combined.rw
            sIP|            dIP|sPort|dPort|  bytes|packets|   flags|
  10.11.156.107|192.168.126.252|   22|28975|4677240|   3881|FS PA   |
192.168.126.252|  10.11.156.107|28975|   22| 281939|   3891|FS PA   |
$ rwcut --fields=sip,attributes,stime,etime combined.rw
            sIP|attribut|                  sTime|                  eTime|
  10.11.156.107|        |2009/02/13T00:30:00.060|2009/02/13T02:32:58.271|
192.168.126.252|        |2009/02/13T00:29:59.563|2009/02/13T02:32:58.272|
ENVIRONMENT
SILK_TMPDIR
When set and --temp-directory is not specified, rwcombine writes the temporary files it creates to
this directory. SILK_TMPDIR overrides the value of TMPDIR.
TMPDIR
When set and SILK_TMPDIR is not set, rwcombine writes the temporary files it creates to this
directory.
SILK_CLOBBER
The SiLK tools normally refuse to overwrite existing files. Setting SILK_CLOBBER to a non-empty
value removes this restriction.
SILK_CONFIG_FILE
This environment variable is used as the value for the --site-config-file when that switch is not
provided.
SILK_DATA_ROOTDIR
This environment variable specifies the root directory of the data repository. As described in the FILES
section, rwcombine may use this environment variable when searching for the SiLK site configuration
file.
SILK_PATH
This environment variable gives the root of the install tree. When searching for configuration files,
rwcombine may use this environment variable. See the FILES section for details.
SILK_TEMPFILE_DEBUG
When set to 1, rwcombine prints debugging messages to the standard error as it creates, re-opens,
and removes temporary files.
FILES
${SILK_CONFIG_FILE}
${SILK_DATA_ROOTDIR}/silk.conf
/data/silk.conf
${SILK_PATH}/share/silk/silk.conf
${SILK_PATH}/share/silk.conf
/usr/local/share/silk/silk.conf
/usr/local/share/silk.conf
Possible locations for the SiLK site configuration file which are checked when the --site-config-file
switch is not provided.
${SILK_TMPDIR}/
${TMPDIR}/
/tmp/
Directory in which to create temporary files.
SEE ALSO
rwfilter(1), rwcut(1), rwuniq(1), rwfileinfo(1), sensor.conf(5), silk(7), yaf(1), zlib(3)
NOTES
The first release of rwcombine occurred in SiLK 3.9.0.
rwcompare(1)
rwcompare
Compare the records in two SiLK Flow files
SYNOPSIS
rwcompare [--quiet] FILE1 FILE2
rwcompare --help
rwcompare --version
DESCRIPTION
rwcompare opens the two files named on the command line and compares the SiLK Flow records they contain.
If the records are identical, rwcompare exits with status 0. If any of the records differ, rwcompare prints
a message and exits with status 1. If there is an issue reading either file, an error is printed and the exit
status is 2. Use the --quiet switch to suppress all output (error messages included). You may use - or stdin
for one of the file names, in which case rwcompare reads from the standard input.
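A script can branch on the three exit statuses described above. The following Python sketch assumes rwcompare is on the PATH; the function name compare and its runner parameter are hypothetical (runner exists only so the command can be replaced when rwcompare is not installed).

```python
import subprocess

# Exit statuses documented for rwcompare.
MEANING = {0: "identical", 1: "different", 2: "error"}


def compare(file1, file2, runner=subprocess.run):
    """Run rwcompare quietly and translate its exit status."""
    result = runner(["rwcompare", "--quiet", file1, file2])
    return MEANING.get(result.returncode, "unknown")
```

Calling compare("data.rw", "data.rw") would be expected to return "identical", in line with the first example in the EXAMPLES section.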
OPTIONS
Option names may be abbreviated if the abbreviation is unique or is an exact match for an option. A
parameter to an option may be specified as --arg=param or --arg param, though the first form is required
for options that take optional parameters.
--quiet
Do not print a message if the files differ, and do not print an error message if a file cannot be opened
or read.
--help
Print the available options and exit.
--version
Print the version number and information about how SiLK was configured, then exit the application.
EXAMPLES
In the following examples, the dollar sign ($) represents the shell prompt. Some input lines are split over
multiple lines in order to improve readability, and a backslash (\) is used to indicate such lines. The examples
assume the existence of the file data.rw that contains SiLK Flow records. The exit status of the most recent
command is available in the shell variable $?.
Compare a file with itself:
$ rwcompare data.rw data.rw
$ echo $?
0
Compare a file with itself, where one instance of the file is read from the standard input:
$ rwcat data.rw | rwcompare - data.rw
$ echo $?
0
Use rwsort(1) to modify one instance of the file and compare the results:
$ rwsort --fields=proto data.rw | rwcompare - data.rw
- data.rw differ: record 1
$ echo $?
1
Run the command again and use the --quiet switch:
$ rwsort --fields=proto data.rw | rwcompare --quiet - data.rw
$ echo $?
1
Compare the file with input containing two copies of the file:
$ rwcat data.rw data.rw | rwcompare data.rw -
data.rw - differ: EOF data.rw
$ echo $?
1
Compare the file with /dev/null :
$ rwcompare --quiet /dev/null data.rw
$ echo $?
2
rwcompare checks whether two files have the same records in the same order. To compare two arbitrary
files, use rwsort(1) to reorder the records. Make certain to provide enough fields to the rwsort command
so that the records are in the same order.
$ rwsort --fields=1-10,12-15,20-29 data.rw > /tmp/sorted-data.rw
$ rwsort --fields=1-10,12-15,20-29 other-data.rw \
| rwcompare /tmp/sorted-data.rw -
/tmp/sorted-data.rw - differ: record 103363
SEE ALSO
rwfileinfo(1), rwcat(1), rwsort(1), silk(7)
rwcount(1)
rwcount
Print traffic summary across time
SYNOPSIS
rwcount [--bin-size=SIZE] [--load-scheme=LOADSCHEME]
[--start-time=START_TIME] [--end-time=END_TIME]
[--skip-zeroes] [--bin-slots] [--epoch-slots]
[--timestamp-format=FORMAT] [--no-titles]
[--no-columns] [--column-separator=CHAR]
[--no-final-delimiter] [{--delimited | --delimited=CHAR}]
[--print-filenames] [--copy-input=PATH] [--output-path=PATH]
[--pager=PAGER_PROG] [--site-config-file=FILENAME]
[{--legacy-timestamps | --legacy-timestamps={1,0}}]
{[--xargs] | [--xargs=FILENAME] | [FILE [FILE ...]]}
rwcount --help
rwcount --version
DESCRIPTION
rwcount summarizes SiLK flow records across time. It counts the records in the input stream, and groups
their byte and packet totals into time bins. rwcount produces textual output with one row for each bin.
rwcount reads SiLK Flow records from the files named on the command line or from the standard input
when no file names are specified and --xargs is not present. To read the standard input in addition to the
named files, use - or stdin as a file name. If an input file name ends in .gz, the file will be uncompressed
as it is read. When the --xargs switch is provided, rwcount will read the names of the files to process from
the named text file, or from the standard input if no file name argument is provided to the switch. The
input to --xargs must contain one file name per line.
rwcount splits each flow record into bins whose size is determined by the argument to the --bin-size switch.
When that switch is not provided, rwcount uses 30-second bins by default.
By default, the first row of data rwcount prints is the bin containing the starting time of the earliest record
that appears in the input. rwcount then prints a row for every bin until it reaches the bin containing the
most recent ending time. Rows whose counts are zero are printed unless the --skip-zeroes switch is specified.
The --start-time and --end-time switches tell rwcount to use a specific time for the first row and the
final row. The --start-time switch always sets the time stamp on the first bin to the specified time. With
the --end-time switch, rwcount computes a maximum end-time by setting any unspecified hour, minute,
second, and millisecond field to its maximum value, and the final bin is that which contains the maximum
end-time.
When --start-time and --end-time are both specified, rwcount reserves the memory for the bins before
it begins processing the records. If the memory cannot be allocated, rwcount exits. If this happens, try
reducing the time span or increasing the bin-size.
Load Scheme
A router or other flow generator summarizes the traffic it sees into records. In addition to the five-tuple
(source port and address, destination port and address, and protocol), the record has its start time, end
time, total byte count, and total packet count. There is no way to know how the bytes and packets were
distributed during the duration of the record: their distribution could be front-loaded, back-loaded, uniform,
et cetera.
When the start and end times of an individual flow record put that record into a single bin, rwcount can
simply add that record’s volume (byte and packet counts) to the bin.
When the duration of a flow record causes it to span multiple bins, rwcount must be told how to allocate
the volume among the bins. The --load-scheme switch determines this, and it supports the following
allocation schemes:
time-proportional
Divides the total volume of the flow by the duration of the flow, and multiplies the quotient by the
time spent in the bin. Thus, the volume the flow contributes to a bin is proportional to the time the
flow spent in the bin. This models a flow where the volume/second ratio is uniform.
bin-uniform
Divides the volume of the flow by the number of bins the flow spans, and adds the quotient to each of
the bins. In this scheme, the volume/bin ratio is uniform.
start-spike
Adds the total volume for the flow into the bin containing the start time of the flow. This models a
flow that is front-loaded to the point where the entire volume is a single spike occurring in the initial
millisecond of the flow.
middle-spike
Determines the time at the midpoint of the flow, and adds the entire volume for the flow into the bin
containing that time.
end-spike
Adds the total volume for the flow into the bin containing the end time of the flow. This models a flow
that is back-loaded to the point where the entire volume is a single spike occurring in the final millisecond
of the flow.
maximum-volume
Adds the entire volume for the flow into every bin that contains any part of the flow. In theory, the
distribution of the bytes in the record could be a spike that occurs at any point during the flow’s
duration. This scheme allows one to determine, in aggregate, the maximum possible volume that could
have occurred during this bin. In this scheme, the Records column gives the number of records that
were active during the bin.
minimum-volume
Acts as though the volume for the flow occurred in some other bin. It is possible that a record that
spans multiple bins did not contribute any volume to the current bin. This scheme allows one to
determine, in aggregate, the minimum possible volume that may have occurred during this bin. The
Records column in this scheme, as in the maximum-volume scheme, gives the number of flow records
that were active during the bin.
Be aware that the "spike" load-schemes allocate the entire flow to a single bin. This can create the impression
that there is more traffic occurring during a particular time window than the physical network supports.
The maximum-volume and minimum-volume schemes are used to compute the maximum and minimum volumes
that could have been transferred during any one bin. maximum-volume intentionally over-counts the
flow volume and minimum-volume intentionally under-counts.
To see the effect of the various load-schemes, suppose rwcount is using 60-second bins and the input contains
two records. The first record begins at 12:03:50, ends at 12:06:20, and contains 9,000 bytes (60 bytes/second
for 150 seconds). This record may contribute to bins at 12:03, 12:04, 12:05, and 12:06. The second record
begins at 12:04:05 and lasts 15 seconds; this record always contributes its 200 bytes to the 12:04
bin. The --load-scheme option splits the byte-counts of the records as follows:
BIN       time-proportional  bin-uniform  start-spike  middle-spike  end-spike  maximum-volume  minimum-volume
12:03:00                600         2250         9000             0          0            9000               0
12:04:00               3800         2450          200           200        200            9200             200
12:05:00               3600         2250            0          9000          0            9000               0
12:06:00               1200         2250            0             0       9000            9000               0
For the record that spans multiple bins: the time-proportional scheme assumes 60 bytes/second, the
bin-uniform scheme divides the volume evenly among the four bins, the middle-spike scheme assumes
all the volume occurs at 12:05:05, the maximum-volume scheme adds the volume to every bin, and the
minimum-volume scheme ignores the record.
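The allocation rules can also be expressed as a short Python sketch. It is an illustration under stated assumptions, not rwcount's code: flows are (start, end, bytes) tuples with times in seconds, the function name allocate is hypothetical, and a flow whose end time falls exactly on a bin boundary is assumed to touch the following bin. The usage lines below apply it to an illustrative 9,000-byte flow running from 12:03:50 to 12:06:20 (60 bytes/second for 150 seconds) and a 200-byte flow from 12:04:05 to 12:04:20.

```python
from collections import defaultdict


def allocate(flows, scheme, bin_size=60):
    """Return {bin start time: bytes} for (start, end, bytes) flows."""
    totals = defaultdict(float)
    for start, end, volume in flows:
        first, last = int(start // bin_size), int(end // bin_size)
        spanned = range(first, last + 1)
        if first == last:
            # A flow inside one bin adds its whole volume to that bin.
            shares = {first: volume}
        elif scheme == "time-proportional":
            shares = {}
            for b in spanned:
                overlap = (min(end, (b + 1) * bin_size)
                           - max(start, b * bin_size))
                shares[b] = volume * overlap / (end - start)
        elif scheme == "bin-uniform":
            shares = {b: volume / len(spanned) for b in spanned}
        elif scheme == "start-spike":
            shares = {first: volume}
        elif scheme == "middle-spike":
            shares = {int(((start + end) / 2) // bin_size): volume}
        elif scheme == "end-spike":
            shares = {last: volume}
        elif scheme == "maximum-volume":
            shares = {b: volume for b in spanned}
        elif scheme == "minimum-volume":
            shares = {b: 0 for b in spanned}
        else:
            raise ValueError(f"unknown load scheme: {scheme}")
        for b, share in shares.items():
            totals[b * bin_size] += share
    return dict(totals)


# Times are seconds since midnight: 12:03:50 is 43430, 12:06:20 is 43580.
flows = [(43430, 43580, 9000), (43445, 43460, 200)]
by_time = allocate(flows, "time-proportional")
```

Here by_time maps the bin starting at 12:03:00 (43380) to 600 bytes, 12:04:00 to 3800, 12:05:00 to 3600, and 12:06:00 to 1200.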
OPTIONS
Option names may be abbreviated if the abbreviation is unique or is an exact match for an option. A
parameter to an option may be specified as --arg=param or --arg param, though the first form is required
for options that take optional parameters.
--bin-size=SIZE
Denote the size of each time bin, in seconds; defaults to 30 seconds. rwcount supports millisecond
size bins; SIZE may be a floating point value equal to or greater than 0.001.
--load-scheme=LOADSCHEME
Specify how a flow record that spans multiple bins allocates its bytes and packets among the bins.
The default scheme is time-proportional, which assumes the volume/second ratio of the flow record
is constant. See the Load Scheme section for additional information on the load-scheme choices. The
LOADSCHEME may be one of the following names or numbers; names may be abbreviated to the
shortest prefix that is unique.
time-proportional,4
Allocate the volume in proportion to the amount of time the flow spent in the bin.
bin-uniform,0
Allocate the volume evenly across the bins that contain any part of the flow’s duration.
start-spike,1
Allocate the entire volume to the bin containing the start time of the flow.
middle-spike,3
Allocate the entire volume to the bin containing the time at the midpoint of the flow.
end-spike,2
Allocate the entire volume to the bin containing the end time of the flow.
maximum-volume,5
Allocate the entire volume to all of the bins containing any part of the flow.
minimum-volume,6
Allocate the flow’s volume to a bin only if the flow is completely contained within the bin; otherwise
ignore the flow.
--start-time=START_TIME
Set the time of the first bin to START_TIME. When this switch is not given, the first bin is one
that holds the starting time of the earliest record. The START_TIME may be specified in a format
of yyyy/mm/dd[:HH[:MM[:SS[.sss]]]] (or T may be used in place of : to separate the day and
hour). The time must be specified to at least day precision, and unspecified hour, minute, second,
and millisecond values are set to zero. Whether the date strings represent times in UTC or the local
timezone depends on how SiLK was compiled, which can be determined from the Timezone support
setting in the output from rwcount --version. Alternatively, the time may be specified as seconds
since the UNIX epoch, and an unspecified milliseconds value is set to 0.
--end-time=END_TIME
Set the time of the final bin to END_TIME. When this switch is not given, the final bin is one that holds
the ending time of the latest record. The format of END_TIME is the same as that for START_TIME.
Unspecified hour, minute, second, and millisecond values are set to 23, 59, 59, and 999 respectively.
When END_TIME is specified as seconds since the UNIX epoch, an unspecified milliseconds value is
set to 999. When both --start-time and --end-time are used, the END_TIME is adjusted so that
the final bin represents a complete interval.
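The different defaults for unspecified fields in --start-time and --end-time can be illustrated with a small Python sketch. The function name parse_time and its is_end parameter are hypothetical. The sketch covers only the yyyy/mm/dd[:HH[:MM[:SS[.sss]]]] form, ignores the epoch-seconds form and the timezone question, and does not model the adjustment of the final bin to a complete interval.

```python
import re
from datetime import datetime

# yyyy/mm/dd[:HH[:MM[:SS[.sss]]]], with T allowed between day and hour.
TIME_PATTERN = re.compile(
    r"(\d{4})/(\d{1,2})/(\d{1,2})"
    r"(?:[:T](\d{1,2})(?::(\d{1,2})(?::(\d{1,2})(?:\.(\d{1,3}))?)?)?)?")


def parse_time(text, is_end=False):
    """Parse a time string, filling unspecified fields as described."""
    match = TIME_PATTERN.fullmatch(text)
    if match is None:
        raise ValueError(f"invalid time: {text!r}")
    groups = match.groups()
    year, month, day = (int(g) for g in groups[:3])
    # Start times fill with zeros; end times fill with maximum values.
    defaults = (23, 59, 59, 999) if is_end else (0, 0, 0, 0)
    hour, minute, second = (
        int(g) if g is not None else d
        for g, d in zip(groups[3:6], defaults))
    if groups[6] is not None:
        millis = int(groups[6].ljust(3, "0"))  # ".5" means 500 ms
    else:
        millis = defaults[3]
    return datetime(year, month, day, hour, minute, second, millis * 1000)
```

For example, parse_time("2009/02/13T12", is_end=True) yields 12:59:59.999 on that day, while the same string parsed as a start time yields 12:00:00.000.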
--skip-zeroes
Disable printing of bins with no traffic. By default, all bins are printed.
--bin-slots
Use the internal bin index as the label for each bin in the output; the default is to label each bin with
the time in a human-readable format.
--epoch-slots
Use the UNIX epoch time (number of seconds since midnight UTC on 1970-01-01) as the label for each
bin in the output; the default is to label each bin with the time in a human-readable format. This
switch is equivalent to --timestamp-format=epoch.
--timestamp-format=FORMAT
Specify how timestamps will be printed. When this switch is not specified, timestamps are printed in
the default format, and the timezone is UTC unless SiLK was compiled with local timezone support.
FORMAT is a comma-separated list of a format and/or a timezone. The format is one of:
default
Print the timestamps as YYYY/MM/DDThh:mm:ss.
iso
Print the timestamps as YYYY-MM-DD hh:mm:ss.
m/d/y
Print the timestamps as MM/DD/YYYY hh:mm:ss.
epoch
Print the timestamps as the number of seconds since 00:00:00 UTC on 1970-01-01.
When a timezone is specified, it is used regardless of the default timezone support compiled into SiLK.
The timezone is one of:
utc
Use Coordinated Universal Time to print timestamps.
local
Use the TZ environment variable or the local timezone.
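For readers who post-process rwcount output, the four formats correspond to the following Python strftime patterns. This is a sketch of the formats as described above, using UTC only; the function name format_timestamp is hypothetical and the local timezone choice is not modelled.

```python
from datetime import datetime, timezone

# strftime equivalents of the documented timestamp formats.
FORMATS = {
    "default": "%Y/%m/%dT%H:%M:%S",
    "iso": "%Y-%m-%d %H:%M:%S",
    "m/d/y": "%m/%d/%Y %H:%M:%S",
}


def format_timestamp(epoch_seconds, fmt="default"):
    """Render seconds since the UNIX epoch in one of the formats (UTC)."""
    if fmt == "epoch":
        return str(int(epoch_seconds))
    moment = datetime.fromtimestamp(epoch_seconds, tz=timezone.utc)
    return moment.strftime(FORMATS[fmt])
```

For instance, format_timestamp(1234567890, "iso") returns "2009-02-13 23:31:30".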
--no-titles
Turn off column titles. By default, titles are printed.
--no-columns
Disable fixed-width columnar output.
--column-separator=C
Use specified character between columns and after the final column. When this switch is not specified,
the default of ’|’ is used.
--no-final-delimiter
Do not print the column separator after the final column. Normally a delimiter is printed.
--delimited
--delimited=C
Run as if --no-columns --no-final-delimiter --column-sep=C had been specified. That is, disable
fixed-width columnar output; if character C is provided, it is used as the delimiter between columns
instead of the default ’|’.
--print-filenames
Print to the standard error the names of input files as they are opened.
--copy-input=PATH
Copy all binary input to the specified file or named pipe. PATH can be stdout to print flows to the
standard output as long as the --output-path switch has been used to redirect rwcount’s ASCII
output.
--output-path=PATH
Determine where the output of rwcount (ASCII text) is written. If this option is not given, output
is written to the standard output.
--pager=PAGER_PROG
When output is to a terminal, invoke the program PAGER_PROG to view the output one screen full
at a time. This switch overrides the SILK_PAGER environment variable, which in turn overrides the
PAGER variable. If the value of the pager is determined to be the empty string, no paging will be
performed and all output will be printed to the terminal.
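The precedence described for the pager can be written as a small Python sketch. This is one reading of the text above, not SiLK source code; choose_pager is a hypothetical name, and the environment is passed in as a dictionary.

```python
def choose_pager(switch_value=None, env=None):
    """Return the pager program to run, or None when paging is disabled.

    Precedence: --pager switch, then SILK_PAGER, then PAGER.
    An empty string at the winning level disables paging.
    """
    env = {} if env is None else env
    if switch_value is not None:
        value = switch_value
    elif "SILK_PAGER" in env:
        value = env["SILK_PAGER"]
    else:
        value = env.get("PAGER", "")
    return value or None


print(choose_pager(None, {"SILK_PAGER": "less", "PAGER": "more"}))
```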
--site-config-file=FILENAME
Read the SiLK site configuration from the named file FILENAME. When this switch is not provided,
rwcount searches for the site configuration file in the locations specified in the FILES section.
--legacy-timestamps
--legacy-timestamps=NUM
When NUM is not specified or is 1, this switch is equivalent to --timestamp-format=m/d/y.
Otherwise, the switch has no effect. This switch is deprecated as of SiLK 3.0.0, and it will be removed
in the SiLK 4.0 release.
--xargs
--xargs=FILENAME
Cause rwcount to read file names from FILENAME or from the standard input if FILENAME is not
provided. The input should have one file name per line. rwcount will open each file in turn and read
records from it, as if the files had been listed on the command line.
--help
Print the available options and exit.
--version
Print the version number and information about how SiLK was configured, then exit the application.
--start-epoch=START_TIME
Alias for the --start-time switch. This switch is deprecated as of SiLK 3.8.0.
--end-epoch=END_TIME
Alias for the --end-time switch. This switch is deprecated as of SiLK 3.8.0.
EXAMPLES
In the following examples, the dollar sign ($) represents the shell prompt. The text after the dollar sign
represents the command line. Lines have been wrapped for improved readability, and the back slash (\) is
used to indicate a wrapped line.
To count all web traffic on Feb 12, 2009, into 1 hour bins:
$ rwfilter --pass=stdout --start-date=2009/02/12:00 \
      --end-date=2009/02/12:23 --proto=6 --aport=80 \
  | rwcount --bin-size=3600
               Date|   Records|         Bytes|    Packets|
2009/02/12T00:00:00|   1490.49|  578270918.16|  463951.55|
2009/02/12T01:00:00|   1459.33|  596455716.52|  457487.80|
2009/02/12T02:00:00|   1529.06|  562602842.44|  451456.41|
2009/02/12T03:00:00|   1503.89|  562683116.38|  455554.81|
2009/02/12T04:00:00|   1561.89|  590554569.78|  489273.81|
....
To bin the records according to their start times, use the --load-scheme switch:
$ rwfilter ... --pass=stdout \
  | rwcount --bin-size=3600 --load-scheme=1
               Date|   Records|         Bytes|    Packets|
2009/02/12T00:00:00|   1494.00|  580350969.00|  464952.00|
2009/02/12T01:00:00|   1462.00|  596145212.00|  457871.00|
2009/02/12T02:00:00|   1526.00|  561629416.00|  451088.00|
2009/02/12T03:00:00|   1502.00|  563500618.00|  455262.00|
2009/02/12T04:00:00|   1562.00|  589265818.00|  489279.00|
...
To bin the records by their end times:
$ rwfilter ... --pass=stdout \
  | rwcount --bin-size=3600 --load-scheme=2
               Date|   Records|         Bytes|    Packets|
2009/02/12T00:00:00|   1488.00|  577132372.00|  463393.00|
2009/02/12T01:00:00|   1458.00|  596956697.00|  457376.00|
2009/02/12T02:00:00|   1530.00|  562806395.00|  451551.00|
2009/02/12T03:00:00|   1506.00|  562101791.00|  455671.00|
2009/02/12T04:00:00|   1562.00|  591408602.00|  489371.00|
...
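The difference between binning by start time and binning by end time can be sketched in Python. The sketch assumes each record is counted whole in exactly one bin, which is consistent with the whole-number Records column in the two preceding examples. The function names and the sample records are hypothetical.

```python
def bin_start(t, bin_size):
    """Return the start of the bin (a multiple of bin_size) that holds t."""
    return t - (t % bin_size)


def count_records(records, bin_size, by="start"):
    """Count (start, end) records per bin, keyed on start or end time."""
    counts = {}
    for start, end in records:
        key = bin_start(start if by == "start" else end, bin_size)
        counts[key] = counts.get(key, 0) + 1
    return counts


# Three made-up flows as (start, end) in epoch seconds.
flows = [(3500, 3700), (3650, 3660), (7100, 7300)]
print(count_records(flows, 3600, by="start"))
print(count_records(flows, 3600, by="end"))
```

The first flow straddles a bin boundary, so it is counted in a different bin depending on which of its two times is used.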
To force the hourly bins to run from 30 minutes past the hour, use the --start-time switch:
$ rwfilter ... --pass=stdout \
  | rwcount --bin-size=3600 --start-time=2002/12/31:23:30
               Date|   Records|         Bytes|    Packets|
2009/02/12T00:30:00|   1483.26|  581251364.04|  456554.40|
2009/02/12T01:30:00|   1494.00|  575037453.00|  449280.00|
2009/02/12T02:30:00|   1486.36|  559700466.61|  447700.15|
2009/02/12T03:30:00|   1555.23|  588882400.58|  480724.48|
2009/02/12T04:30:00|   1537.79|  564756248.52|  472003.45|
...
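In this example the start time lies years before the data, yet the bins still begin at 30 minutes past each hour. A Python sketch of one way to compute such aligned bin labels follows. It assumes bins sit at whole multiples of the bin size from the start time and that all times are UTC; bin_label is a hypothetical helper.

```python
from datetime import datetime, timezone


def bin_label(record_time, first_bin_start, bin_size):
    """Return the start of the bin holding record_time, with bins laid out
    at whole multiples of bin_size after first_bin_start."""
    offset = (record_time - first_bin_start) // bin_size
    return first_bin_start + offset * bin_size


origin = int(datetime(2002, 12, 31, 23, 30, tzinfo=timezone.utc).timestamp())
record = int(datetime(2009, 2, 12, 0, 45, tzinfo=timezone.utc).timestamp())

label = datetime.fromtimestamp(bin_label(record, origin, 3600), tz=timezone.utc)
print(label.strftime("%Y/%m/%dT%H:%M:%S"))
```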
ENVIRONMENT
SILK_PAGER
When set to a non-empty string, rwcount automatically invokes this program to display its output a
screen at a time. If set to an empty string, rwcount does not automatically page its output.
PAGER
When set and SILK_PAGER is not set, rwcount automatically invokes this program to display its
output a screen at a time.
SILK_CLOBBER
The SiLK tools normally refuse to overwrite existing files. Setting SILK_CLOBBER to a non-empty
value removes this restriction.
SILK_CONFIG_FILE
This environment variable is used as the value for the --site-config-file switch when that switch is not
provided.
SILK_DATA_ROOTDIR
This environment variable specifies the root directory of the data repository. As described in the FILES
section, rwcount may use this environment variable when searching for the SiLK site configuration
file.
SILK_PATH
This environment variable gives the root of the install tree. When searching for configuration files,
rwcount may use this environment variable. See the FILES section for details.
TZ
When a SiLK installation is built to use the local timezone (to determine if this is the case, check the
Timezone support value in the output from rwcount --version), the value of the TZ environment
variable determines the timezone in which rwcount displays and parses timestamps. If the TZ environment variable is not set, the default timezone is used. Setting TZ to 0 or the empty string causes
timestamps to be displayed in and parsed as UTC. The value of the TZ environment variable is ignored
when the SiLK installation uses utc. For system information on the TZ variable, see tzset(3).
FILES
${SILK_CONFIG_FILE}
${SILK_DATA_ROOTDIR}/silk.conf
/data/silk.conf
${SILK_PATH}/share/silk/silk.conf
${SILK_PATH}/share/silk.conf
/usr/local/share/silk/silk.conf
/usr/local/share/silk.conf
Possible locations for the SiLK site configuration file which are checked when the --site-config-file
switch is not provided.
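The list of locations can be expressed as a Python sketch that builds the candidate paths from a set of environment variables. It assumes the locations are checked in the order listed and that entries depending on an unset variable are skipped; both points are assumptions, and site_config_candidates is a hypothetical name.

```python
def site_config_candidates(env):
    """Return candidate silk.conf paths in the order listed under FILES."""
    candidates = []
    if env.get("SILK_CONFIG_FILE"):
        candidates.append(env["SILK_CONFIG_FILE"])
    if env.get("SILK_DATA_ROOTDIR"):
        candidates.append(env["SILK_DATA_ROOTDIR"] + "/silk.conf")
    candidates.append("/data/silk.conf")
    if env.get("SILK_PATH"):
        candidates.append(env["SILK_PATH"] + "/share/silk/silk.conf")
        candidates.append(env["SILK_PATH"] + "/share/silk.conf")
    candidates.append("/usr/local/share/silk/silk.conf")
    candidates.append("/usr/local/share/silk.conf")
    return candidates


for path in site_config_candidates({"SILK_PATH": "/opt/silk"}):
    print(path)
```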
SEE ALSO
rwfilter(1), rwuniq(1), silk(7)
BUGS
Unlike rwuniq(1), rwcount does not support counting the number of distinct IPs in a bin. However, using
the --bin-time switch on rwuniq can provide time-based binning similar to what rwcount supports. Note
that rwuniq always bins by each record’s start time (similar to rwcount --load-scheme=1), and there
is no support in rwuniq for dividing a SiLK record among multiple time bins.
rwcut(1)
rwcut
Print selected fields of binary SiLK Flow records
SYNOPSIS
rwcut [{--fields=FIELDS | --all-fields}]
{[--start-rec-num=START_NUM] [--end-rec-num=END_NUM]
| [--tail-recs=TAIL_START_NUM]}
[--num-recs=REC_COUNT] [--dry-run] [--icmp-type-and-code]
[--timestamp-format=FORMAT] [--epoch-time]
[--ip-format=FORMAT] [--integer-ips] [--zero-pad-ips]
[--integer-sensors] [--integer-tcp-flags]
[--no-titles] [--no-columns] [--column-separator=CHAR]
[--no-final-delimiter] [{--delimited | --delimited=CHAR}]
[--print-filenames] [--copy-input=PATH] [--output-path=PATH]
[--pager=PAGER_PROG] [--site-config-file=FILENAME]
[--ipv6-policy={ignore,asv4,mix,force,only}]
[{--legacy-timestamps | --legacy-timestamps={1,0}}]
[--plugin=PLUGIN [--plugin=PLUGIN ...]]
[--python-file=PATH [--python-file=PATH ...]]
[--pmap-file=MAPNAME:PATH [--pmap-file=MAPNAME:PATH ...]]
[--pmap-column-width=NUM]
{[--xargs] | [--xargs=FILENAME] | [FILE [FILE ...]]}
rwcut [--pmap-file=MAPNAME:PATH [--pmap-file=MAPNAME:PATH ...]]
[--plugin=PLUGIN ...] [--python-file=PATH ...] --help
rwcut [--pmap-file=MAPNAME:PATH [--pmap-file=MAPNAME:PATH ...]]
[--plugin=PLUGIN ...] [--python-file=PATH ...] --help-fields
rwcut --version
DESCRIPTION
rwcut reads binary SiLK Flow records and prints the records to the screen in a textual, bar (|) delimited
format. See the EXAMPLES section below for sample output.
rwcut reads SiLK Flow records from the files named on the command line or from the standard input when
no file names are specified and --xargs is not present. To read the standard input in addition to the named
files, use - or stdin as a file name. If an input file name ends in .gz, the file will be uncompressed as it
is read. When the --xargs switch is provided, rwcut will read the names of the files to process from the
named text file, or from the standard input if no file name argument is provided to the switch. The input
to --xargs must contain one file name per line.
OPTIONS
Option names may be abbreviated if the abbreviation is unique or is an exact match for an option. A
parameter to an option may be specified as --arg=param or --arg param, though the first form is required
for options that take optional parameters.
--fields=FIELDS
FIELDS contains the list of flow attributes (a.k.a. fields or columns) to print. The columns will be
displayed in the order the fields are specified. Fields may be repeated. FIELDS is a comma separated
list of field-names, field-integers, and ranges of field-integers; a range is specified by separating the
start and end of the range with a hyphen (-). Field-names are case-insensitive. Example:
--fields=stime,10,1-5
If the --fields switch is not given, FIELDS defaults to:
sIP,dIP,sPort,dPort,protocol,packets,bytes,flags,sTime,dur,eTime,sensor
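How a FIELDS value expands into an ordered column list can be sketched in Python. The sketch covers only field numbers 1 through 12 and their primary names as documented for this switch; expand_fields is a hypothetical helper, not a SiLK function.

```python
FIELD_NAMES = {
    1: "sIP", 2: "dIP", 3: "sPort", 4: "dPort", 5: "protocol",
    6: "packets", 7: "bytes", 8: "flags", 9: "sTime",
    10: "duration", 11: "eTime", 12: "sensor",
}
# Field names are case-insensitive, so index them in lower case.
NAME_TO_ID = {name.lower(): num for num, name in FIELD_NAMES.items()}


def expand_fields(spec):
    """Expand a comma-separated list of names, integers, and integer ranges."""
    result = []
    for item in spec.split(","):
        item = item.strip()
        if item.lower() in NAME_TO_ID:
            result.append(FIELD_NAMES[NAME_TO_ID[item.lower()]])
        elif "-" in item:
            low, high = (int(part) for part in item.split("-", 1))
            result.extend(FIELD_NAMES[n] for n in range(low, high + 1))
        else:
            result.append(FIELD_NAMES[int(item)])
    return result


print(expand_fields("stime,10,1-5"))
```

Order is preserved and repeated fields are kept, matching the description of the switch.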
The complete list of built-in fields that the SiLK tool suite supports follows, though note that not all
fields are present in all SiLK file formats; when a field is not present, its value is 0.
sIP,1
source IP address
dIP,2
destination IP address
sPort,3
source port for TCP and UDP, or equivalent
dPort,4
destination port for TCP and UDP, or equivalent
protocol,5
IP protocol
packets,pkts,6
packet count
bytes,7
byte count
flags,8
bit-wise OR of TCP flags over all packets
sTime,9
starting time of flow in millisecond resolution
duration,10
duration of flow in millisecond resolution
eTime,11
end time of flow in millisecond resolution
sensor,12
name or ID of sensor at the collection point
class,20
class of sensor at the collection point
type,21
type of sensor at the collection point
sTime+msec,22
starting time of flow including milliseconds (milliseconds are always displayed); this field is deprecated as of SiLK 3.8.1, and it will be removed in the SiLK 4.0 release
eTime+msec,23
end time of flow including milliseconds (milliseconds are always displayed); this field is deprecated
as of SiLK 3.8.1, and it will be removed in the SiLK 4.0 release
dur+msec,24
duration of flow including milliseconds (milliseconds are always displayed); this field is deprecated
as of SiLK 3.8.1, and it will be removed in the SiLK 4.0 release
iType
the ICMP type value for ICMP or ICMPv6 flows and empty for non-ICMP flows. This field was
introduced in SiLK 3.8.1.
iCode
the ICMP code value for ICMP or ICMPv6 flows and empty for non-ICMP flows. See note at
iType.
icmpTypeCode,25
equivalent to iType,iCode. This field is deprecated as of SiLK 3.8.1.
Many SiLK file formats do not store the following fields and their values will always be 0; they are
listed here for completeness:
in,13
router SNMP input interface or vlanId if packing tools were configured to capture it (see sensor.conf(5))
out,14
router SNMP output interface or postVlanId
nhIP,15
router next hop IP
SiLK can store flows generated by enhanced collection software that provides more information than
NetFlow v5. These flows may support some or all of these additional fields; for flows without this
additional information, the field’s value is always 0.
initialFlags,26
TCP flags on first packet in the flow
sessionFlags,27
bit-wise OR of TCP flags over all packets except the first in the flow
attributes,28
flow attributes set by the flow generator:
S
all the packets in this flow record are exactly the same size
F
flow generator saw additional packets in this flow following a packet with a FIN flag (excluding
ACK packets)
T
flow generator prematurely created a record for a long-running connection due to a timeout.
(When the flow generator yaf(1) is run with the --silk switch, it will prematurely create a
flow and mark it with T if the byte count of the flow cannot be stored in a 32-bit value.)
C
flow generator created this flow as a continuation of long-running connection, where the
previous flow for this connection met a timeout (or a byte threshold in the case of yaf ).
Consider a long-running ssh session that exceeds the flow generator’s active timeout. (This is the
active timeout since the flow generator creates a flow for a connection that still has activity). The
flow generator will create multiple flow records for this ssh session, each spanning some portion of
the total session. The first flow record will be marked with a T indicating that it hit the timeout.
The second through next-to-last records will be marked with TC indicating that this flow both
timed out and is a continuation of a flow that timed out. The final flow will be marked with a C,
indicating that it was created as a continuation of an active flow.
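The marking pattern in the ssh example can be reproduced with a short Python sketch. It models only the T and C attributes for a connection split into a given number of flow records; continuation_marks is a hypothetical name.

```python
def continuation_marks(num_records):
    """Return the T/C marks for a connection split into num_records flows."""
    marks = []
    for index in range(num_records):
        timed_out = index < num_records - 1   # every record but the last
        continued = index > 0                 # every record but the first
        marks.append(("T" if timed_out else "") + ("C" if continued else ""))
    return marks


print(continuation_marks(4))
```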
application,29
guess as to the content of the flow. Some software that generates flow records from packet data,
such as yaf, will inspect the contents of the packets that make up a flow and use traffic signatures
to label the content of the flow. SiLK calls this label the application; yaf refers to it as the
appLabel. The application is the port number that is traditionally used for that type of traffic
(see the /etc/services file on most UNIX systems). For example, traffic that the flow generator
recognizes as FTP will have a value of 21, even if that traffic is being routed through the standard
HTTP/web port (80).
The following fields provide a way to label the IPs or ports on a record. These fields require external
files to provide the mapping from the IP or port to the label:
sType,16
for the source IP address, the value 0 if the address is non-routable, 1 if it is internal, or 2
if it is routable and external. Uses the mapping file specified by the SILK_ADDRESS_TYPES
environment variable, or the address_types.pmap mapping file, as described in addrtype(3).
dType,17
as sType for the destination IP address
scc,18
for the source IP address, a two-letter country code abbreviation denoting the country where
that IP address is located. Uses the mapping file specified by the SILK_COUNTRY_CODES
environment variable, or the country_codes.pmap mapping file, as described in ccfilter(3). The
abbreviations are those used by the Root-Zone Whois Index (see for example
http://www.iana.org/cctld/cctld-whois.htm) or the following special codes: -- N/A (e.g. private and experimental
reserved addresses); a1 anonymous proxy; a2 satellite provider; o1 other
dcc,19
as scc for the destination IP
src-MAPNAME
label determined by passing the source IP or the protocol/source-port to the user-defined mapping
defined in the prefix map associated with MAPNAME. See the description of the --pmap-file
switch below and the pmapfilter(3) manual page.
dst-MAPNAME
as src-MAPNAME for the destination IP or protocol/destination-port.
sval
dval
These are deprecated field names created by pmapfilter that correspond to src-MAPNAME
and dst-MAPNAME, respectively. These fields are available when a prefix map is used that is
not associated with a MAPNAME.
Finally, the list of built-in fields may be augmented by the run-time loading of PySiLK code or plug-ins
written in C (also called shared object files or dynamic libraries), as described by the --python-file
and --plugin switches.
--all-fields
Instruct rwcut to print all known fields. This switch may not be combined with the --fields switch.
This switch suppresses error messages from the plug-ins.
--plugin=PLUGIN
Augment the list of fields by using run-time loading of the plug-in (shared object) whose path is
PLUGIN. The switch may be repeated to load multiple plug-ins. The creation of plug-ins is described
in the silk-plugin(3) manual page. When PLUGIN does not contain a slash (/), rwcut will attempt
to find a file named PLUGIN in the directories listed in the FILES section. If rwcut finds the file, it
uses that path. If PLUGIN contains a slash or if rwcut does not find the file, rwcut relies on your
operating system’s dlopen(3) call to find the file. When the SILK_PLUGIN_DEBUG environment
variable is non-empty, rwcut prints status messages to the standard error as it attempts to find and
open each of its plug-ins.
--start-rec-num=START_NUM
Begin printing with the START_NUM’th record by skipping the first START_NUM-1 records. The
default is 1; that is, to start printing at the first record; START_NUM must be a positive integer.
If START_NUM is greater than the number of input records, rwcut only outputs the title. This
switch may not be combined with the --tail-recs switch. When using multiple input files, records
are treated as a single stream for the purposes of the --start-rec-num, --end-rec-num, --tail-recs,
and --num-recs switches. This switch does not affect the records written to the stream specified by
--copy-input.
--end-rec-num=END_NUM
Stop printing after the END_NUM’th record. When END_NUM is 0, the default, printing stops once
all input records have been printed; that is, END_NUM is effectively infinity. If this value is non-zero,
it must not be less than START_NUM. This switch may not be combined with the --tail-recs switch.
When using multiple input files, records are treated as a single stream for the purposes of the
--start-rec-num, --end-rec-num, --tail-recs, and --num-recs switches. This switch does not affect the
records written to the stream specified by --copy-input.
--tail-recs=TAIL_START_NUM
Begin printing once rwcut is TAIL_START_NUM records from the end of the input stream, where
TAIL_START_NUM is a positive integer. rwcut will print the remaining records in the input stream
unless --num-recs is also specified and is less than TAIL_START_NUM. The --tail-recs switch is
similar to the --start-rec-num switch except it counts from the end of the input stream. This switch
may not be combined with the --start-rec-num and --end-rec-num switches. When using multiple
input files, records are treated as a single stream for the purposes of the --start-rec-num,
--end-rec-num, --tail-recs, and --num-recs switches. This switch does not affect the records written to the
stream specified by --copy-input.
--num-recs=REC_COUNT
Print no more than REC_COUNT records. Specifying a REC_COUNT of 0 will print all records,
which is the default. This switch is ignored under the following conditions: when both --start-rec-num
and --end-rec-num are specified; when only --end-rec-num is given and END_NUM is less
than REC_COUNT; when --tail-recs is specified and TAIL_START_NUM is less than REC_COUNT.
When using multiple input files, records are treated as a single stream for the purposes of the
--start-rec-num, --end-rec-num, --tail-recs, and --num-recs switches. This switch does not affect the
records written to the stream specified by --copy-input.
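The interaction of the four record-selection switches can be modeled in Python on an in-memory list. This is one reading of the descriptions above, checked against the commands in the EXAMPLES section; it is not derived from the rwcut source, and select_records is a hypothetical name.

```python
def select_records(records, start=None, end=None, tail=None, num=0):
    """Return the records that would be printed (title line not modeled)."""
    total = len(records)
    if tail is not None:
        if start is not None or end is not None:
            raise ValueError("--tail-recs excludes --start/--end-rec-num")
        first = max(total - tail, 0)
        last = total
        if num and num < tail:            # --num-recs caps the tail
            last = min(first + num, total)
        return records[first:last]
    if start is not None and end:         # both given: --num-recs ignored
        first, last = start, end
    elif end:                             # only --end-rec-num given
        first = end - num + 1 if 0 < num <= end else 1
        last = end
    else:                                 # --start-rec-num and/or --num-recs
        first = start or 1
        last = first + num - 1 if num else total
    return records[first - 1:last]


data = [1, 2, 3, 4, 5]
print(select_records(data, end=2, num=1))
print(select_records(data, tail=2, num=1))
```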
--dry-run
Causes rwcut to print the column headers and exit. Useful for testing.
--icmp-type-and-code
Unlike TCP or UDP, ICMP messages do not use ports, but instead have types and codes. Specifying
this switch will cause rwcut to print, for ICMP records, the message’s type and code in the sPort and
dPort columns, respectively. Use of this switch has been discouraged since SiLK 0.9.10. As of SiLK
3.8.1, this switch is deprecated and it will be removed in SiLK 4.0; use the iType and iCode fields
instead.
--timestamp-format=FORMAT
Specify how timestamps will be printed. When this switch is not specified, timestamps are printed in
the default format, and the timezone is UTC unless SiLK was compiled with local timezone support.
FORMAT is a comma-separated list of a format, a timezone, and/or a modifier. The format is one of:
default
Print the timestamps as YYYY/MM/DDThh:mm:ss.sss.
iso
Print the timestamps as YYYY-MM-DD hh:mm:ss.sss.
m/d/y
Print the timestamps as MM/DD/YYYY hh:mm:ss.sss.
epoch
Print the timestamps as the number of seconds since 00:00:00 UTC on 1970-01-01.
When a timezone is specified, it is used regardless of the default timezone support compiled into SiLK.
The timezone is one of:
utc
Use Coordinated Universal Time to print timestamps.
local
Use the TZ environment variable or the local timezone.
One modifier is available:
no-msec
Truncate the milliseconds value on the timestamps and on the duration field. When milliseconds
are truncated, the sum of the printed start time and duration may not equal the printed end time.
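The mismatch that truncation can cause is easy to see with a little arithmetic, sketched here in Python. The values are the seconds-within-the-minute figures from the sample record in the EXAMPLES section (start 14.625, duration 3.959, end 18.584); truncate_msec is a hypothetical helper.

```python
def truncate_msec(milliseconds):
    """Drop the fractional second, as the no-msec modifier is described to do."""
    return milliseconds // 1000


start_ms = 14625                  # 00:00:14.625
duration_ms = 3959                # 3.959 seconds
end_ms = start_ms + duration_ms   # 00:00:18.584

printed_sum = truncate_msec(start_ms) + truncate_msec(duration_ms)
printed_end = truncate_msec(end_ms)
print(printed_sum, printed_end)
```

The truncated start and duration add up to 17 seconds, while the truncated end time is 18 seconds.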
--epoch-time
Print timestamps as epoch time (number of seconds since midnight GMT on 1970-01-01). This switch
is equivalent to --timestamp-format=epoch, it is deprecated as of SiLK 3.0.0, and it will be removed
in the SiLK 4.0 release.
--ip-format=FORMAT
Specify how IP addresses will be printed. When this switch is not specified, IPs are printed in the
canonical format. The FORMAT is one of:
canonical
Print IP addresses in their canonical form: dotted quad for IPv4 (127.0.0.1) and hexadectet for
IPv6 (2001:db8::1). Note that IPv6 addresses in ::ffff:0:0/96 and some IPv6 addresses in ::/96
will be printed as a mixture of IPv6 and IPv4.
zero-padded
Print IP addresses in their canonical form, but add zeros to the output so it fully fills the width
of the column. The addresses 127.0.0.1 and 2001:db8::1 are printed as 127.000.000.001 and
2001:0db8:0000:0000:0000:0000:0000:0001, respectively. When the --ipv6-policy is force,
the output for 127.0.0.1 becomes 0000:0000:0000:0000:0000:ffff:7f00:0001.
decimal
Print IP addresses as integers in decimal format. The addresses 127.0.0.1 and 2001:db8::1 are
printed as 2130706433 and 42540766411282592856903984951653826561, respectively.
hexadecimal
Print IP addresses as integers in hexadecimal format. The addresses 127.0.0.1 and 2001:db8::1
are printed as 7f000001 and 20010db8000000000000000000000001, respectively.
force-ipv6
Print all IP addresses in the canonical form for IPv6 without using any IPv4 notation. Any IPv4
address is mapped into the ::ffff:0:0/96 netblock. The addresses 127.0.0.1 and 2001:db8::1 are
printed as ::ffff:7f00:1 and 2001:db8::1, respectively.
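Most of these formats can be reproduced with the Python standard library's ipaddress module, as sketched below. The function names are hypothetical, and the force_ipv6 flag is a stand-in for an --ipv6-policy of force. The force-ipv6 text format itself is left out because the way Python prints IPv4-mapped addresses is not guaranteed to match rwcut.

```python
import ipaddress


def to_ipv6(addr):
    """Map an IPv4 address into ::ffff:0:0/96; leave IPv6 addresses alone."""
    if addr.version == 6:
        return addr
    return ipaddress.IPv6Address((0xFFFF << 32) | int(addr))


def format_ip(text, fmt="canonical", force_ipv6=False):
    """Format an address in one of the documented --ip-format styles."""
    addr = ipaddress.ip_address(text)
    if force_ipv6:
        addr = to_ipv6(addr)
    if fmt == "canonical":
        return str(addr)
    if fmt == "zero-padded":
        if addr.version == 4:
            return ".".join("%03d" % octet for octet in addr.packed)
        digits = "%032x" % int(addr)
        return ":".join(digits[i:i + 4] for i in range(0, 32, 4))
    if fmt == "decimal":
        return str(int(addr))
    if fmt == "hexadecimal":
        return "%x" % int(addr)
    raise ValueError("unknown format: " + fmt)


for style in ("zero-padded", "decimal", "hexadecimal"):
    print(style, format_ip("127.0.0.1", style), format_ip("2001:db8::1", style))
```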
--integer-ips
Print IP addresses as integers. This switch is equivalent to --ip-format=decimal, it is deprecated as
of SiLK 3.7.0, and it will be removed in the SiLK 4.0 release.
--zero-pad-ips
Print IP addresses as fully-expanded, zero-padded values in their canonical form. This switch is
equivalent to --ip-format=zero-padded, it is deprecated as of SiLK 3.7.0, and it will be removed in
the SiLK 4.0 release.
--integer-sensors
Print the integer ID of the sensor rather than its name.
--integer-tcp-flags
Print the TCP flag fields (flags, initialFlags, sessionFlags) as an integer value. Typically, the characters
F,S,R,P,A,U,E,C are used to represent the TCP flags.
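The relation between the flag letters and an integer value can be sketched in Python. The sketch assumes the integer is the standard TCP header flags byte (FIN=1, SYN=2, RST=4, PSH=8, ACK=16, URG=32, ECE=64, CWR=128); the text above names only the letters, so the bit values are an assumption, and flags_to_int is a hypothetical helper.

```python
# Assumed bit values: the standard TCP header flag positions.
TCP_FLAG_BITS = {
    "F": 0x01, "S": 0x02, "R": 0x04, "P": 0x08,
    "A": 0x10, "U": 0x20, "E": 0x40, "C": 0x80,
}


def flags_to_int(text):
    """Convert a flags column such as 'FS PA' to its integer value."""
    value = 0
    for char in text:
        if char != " ":               # blanks mark flags that are not set
            value |= TCP_FLAG_BITS[char]
    return value


print(flags_to_int("FS PA"))
```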
--no-titles
Turn off column titles. By default, titles are printed.
--no-columns
Disable fixed-width columnar output.
--column-separator=C
Use specified character between columns and after the final column. When this switch is not specified,
the default of ’|’ is used.
--no-final-delimiter
Do not print the column separator after the final column. Normally a delimiter is printed.
--delimited
--delimited=C
Run as if --no-columns --no-final-delimiter --column-sep=C had been specified. That is, disable
fixed-width columnar output; if character C is provided, it is used as the delimiter between columns
instead of the default ’|’.
--print-filenames
Print to the standard error the names of input files as they are opened.
--copy-input=PATH
Copy all binary input to the specified file or named pipe. PATH can be stdout to print flows to
the standard output as long as the --output-path switch has been used to redirect rwcut’s ASCII
output.
--output-path=PATH
Determines where the output of rwcut (ASCII text) is written. If this option is not given, output is
written to the standard output.
--pager=PAGER_PROG
When output is to a terminal, invoke the program PAGER_PROG to view the output one screen full
at a time. This switch overrides the SILK_PAGER environment variable, which in turn overrides the
PAGER variable. If the value of the pager is determined to be the empty string, no paging will be
performed and all output will be printed to the terminal.
--ipv6-policy=POLICY
Determine how IPv4 and IPv6 flows are handled when SiLK has been compiled with IPv6 support.
When the switch is not provided, the SILK_IPV6_POLICY environment variable is checked for a policy.
If it is also unset or contains an invalid policy, the POLICY is mix. When SiLK has not been compiled
with IPv6 support, IPv6 flows are always ignored, regardless of the value passed to this switch or in
the SILK_IPV6_POLICY variable. The supported values for POLICY are:
ignore
Ignore any flow record marked as IPv6, regardless of the IP addresses it contains. Only records
marked as IPv4 will be printed.
asv4
Convert IPv6 flow records that contain addresses in the ::ffff:0:0/96 prefix to IPv4 and ignore all
other IPv6 flow records.
mix
Process the input as a mixture of IPv4 and IPv6 flow records.
force
Convert IPv4 flow records to IPv6, mapping the IPv4 addresses into the ::ffff:0:0/96 prefix.
only
Print only flow records that are marked as IPv6 and ignore IPv4 flow records in the input.
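The five policies can be modeled in Python on a single address. This is a simplification: a real flow record carries several addresses and its own IPv4/IPv6 marker, whereas the sketch treats the address version as that marker. The name apply_policy is hypothetical; a return value of None means the record is ignored.

```python
import ipaddress


def apply_policy(addr, policy="mix"):
    """Return the address as it would be handled, or None if it is ignored."""
    is_v6 = addr.version == 6
    if policy == "ignore":
        return None if is_v6 else addr
    if policy == "asv4":
        if not is_v6:
            return addr
        return addr.ipv4_mapped       # None unless inside ::ffff:0:0/96
    if policy == "mix":
        return addr
    if policy == "force":
        if is_v6:
            return addr
        return ipaddress.IPv6Address((0xFFFF << 32) | int(addr))
    if policy == "only":
        return addr if is_v6 else None
    raise ValueError("unknown policy: " + policy)


mapped = ipaddress.ip_address("::ffff:10.0.0.1")
print(apply_policy(mapped, "asv4"))
```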
--site-config-file=FILENAME
Read the SiLK site configuration from the named file FILENAME. When this switch is not provided,
rwcut searches for the site configuration file in the locations specified in the FILES section.
--legacy-timestamps
--legacy-timestamps=NUM
When NUM is not specified or is 1, this switch is equivalent to --timestamp-format=m/d/y,no-msec.
Otherwise, the switch has no effect. This switch is deprecated as of SiLK 3.0.0, and it will be
removed in the SiLK 4.0 release.
--xargs
--xargs=FILENAME
Causes rwcut to read file names from FILENAME or from the standard input if FILENAME is not
provided. The input should have one file name per line. rwcut will open each file in turn and read
records from it, as if the files had been listed on the command line.
--help
Print the available options and exit. Specifying switches that add new fields or additional switches
before --help will allow the output to include descriptions of those fields or switches.
--help-fields
Print the description and alias(es) of each field and exit. Specifying switches that add new fields before
--help-fields will allow the output to include descriptions of those fields.
--version
Print the version number and information about how SiLK was configured, then exit the application.
--pmap-file=MAPNAME:PATH
--pmap-file=PATH
Instruct rwcut to load the mapping file located at PATH and create the src-MAPNAME and
dst-MAPNAME fields. When MAPNAME is provided explicitly, it will be used to refer to the fields
specific to that prefix map. If MAPNAME is not provided, rwcut will check the prefix map file to see
if a map-name was specified when the file was created. If no map-name is available, rwcut creates the
fields sval and dval. Multiple --pmap-file switches are supported as long as each uses a unique value
for map-name. The --pmap-file switch(es) must precede the --fields switch. For more information,
see pmapfilter(3).
--pmap-column-width=NUM
When printing a label associated with a prefix map, this switch gives the maximum number of characters to use when displaying the textual value of the field.
--python-file=PATH
When the SiLK Python plug-in is used, rwcut reads the Python code from the file PATH to define
additional fields for possible output. This file should call register_field() for each field it wishes to
define. For details and examples, see the silkpython(3) and pysilk(3) manual pages.
EXAMPLES
In the following examples, the dollar sign ($) represents the shell prompt. The text after the dollar sign
represents the command line. Lines have been wrapped for improved readability, and the back slash (\) is
used to indicate a wrapped line.
The standard output from rwcut resembles the following (with the text wrapped for readability):
            sIP|            dIP|sPort|dPort|pro|\
    10.30.30.31|    10.70.70.71|   80|36761|  6|\
   packets|     bytes|   flags|\
         7|      3227|FS PA   |\
                  sTime| duration|                  eTime|senso|
2003/01/01T00:00:14.625|    3.959|2003/01/01T00:00:18.584|EDGE1|
The first line of the output is the title line which shows the names of the selected fields; the --no-titles switch
will disable the printing of the title line. The second line and onward will contain the printed representation
of the records, with one line per record.
A common use of rwcut is to read the output of rwfilter(1). For example, to see representative TCP
traffic:
$ rwfilter --start-date=2002/01/19:00 --end-date=2002/01/19:01 \
      --proto=6 --pass=stdout \
  | rwcut
To see only selected fields, use the --fields switch. For example, to print only the protocol for each record
in the input file data.rw, use:
$ rwcut --fields=proto data.rw
The silkpython(3) manual page provides examples that use PySiLK to create and print arbitrary fields for
rwcut.
The order of the FIELDS is significant, and fields can be repeated. For example, here is a case where in
addition to the default fields of 1-12, you also want to prefix each row with an integer form of the destination
IP and the start time to make processing by another tool (e.g., a spreadsheet) easier. However, within the
default fields of 1-12, you want to see dotted-decimal IP addresses. (The num2dot(1) tool converts the
numeric fields in column positions three and four to dotted quad IPs.)
$ rwfilter ... --pass=stdout \
| rwcut --fields=2,9,1-12 --ip-format=decimal --timestamp-format=epoch \
| num2dot --ip-field=3,4
Both of the following commands print the title line and the first record in the input stream:
$ rwcut --num-recs=1 data.rw
$ rwcut --end-rec-num=1 data.rw
The following prints all records except the first (plus the title):
$ rwcut --start-rec-num=2 data.rw
These three commands print only the second record:
$ rwcut --no-title --start-rec-num=2 --num-recs=1 data.rw
$ rwcut --no-title --start-rec-num=2 --end-rec-num=2 data.rw
$ rwcut --no-title --end-rec-num=2 --num-recs=1 data.rw
This command prints the title line and the final record in the input stream:
$ rwcut --tail-recs=1 data.rw
This command prints the next to last record in the input stream:
$ rwcut --no-title --tail-recs=2 --num-recs=1 data.rw
ENVIRONMENT
SILK_IPV6_POLICY
This environment variable is used as the value for the --ipv6-policy when that switch is not provided.
SILK_PAGER
When set to a non-empty string, rwcut automatically invokes this program to display its output a
screen at a time. If set to an empty string, rwcut does not automatically page its output.
PAGER
When set and SILK_PAGER is not set, rwcut automatically invokes this program to display its output
a screen at a time.
PYTHONPATH
This environment variable is used by Python to locate modules. When --python-file is specified,
rwcut loads Python which in turn loads the PySiLK module which is comprised of several files
(silk/pysilk_nl.so, silk/__init__.py, etc). If this silk/ directory is located outside Python’s normal search
path (for example, in the SiLK installation tree), it may be necessary to set or modify the PYTHONPATH
environment variable to include the parent directory of silk/ so that Python can find the PySiLK
module.
SILK PYTHON TRACEBACK
When set, Python plug-ins will output traceback information on Python errors to the standard error.
SILK_COUNTRY_CODES
This environment variable allows the user to specify the country code mapping file that rwcut uses
when computing the scc and dcc fields. The value may be a complete path or a file relative to the
SILK_PATH. See the FILES section for standard locations of this file.
SILK_ADDRESS_TYPES
This environment variable allows the user to specify the address type mapping file that rwcut uses
when computing the sType and dType fields. The value may be a complete path or a file relative to
the SILK_PATH. See the FILES section for standard locations of this file.
SILK_CLOBBER
The SiLK tools normally refuse to overwrite existing files. Setting SILK_CLOBBER to a non-empty
value removes this restriction.
SILK_CONFIG_FILE
This environment variable is used as the value for the --site-config-file switch when that switch is not
provided.
SILK_DATA_ROOTDIR
This environment variable specifies the root directory of the data repository. As described in the FILES
section, rwcut may use this environment variable when searching for the SiLK site configuration file.
SILK_PATH
This environment variable gives the root of the install tree. When searching for configuration files and
plug-ins, rwcut may use this environment variable. See the FILES section for details.
TZ
When a SiLK installation is built to use the local timezone (to determine if this is the case, check
the Timezone support value in the output from rwcut --version), the value of the TZ environment
variable determines the timezone in which rwcut displays timestamps. If the TZ environment variable
is not set, the default timezone is used. Setting TZ to 0 or the empty string causes timestamps to be
displayed in UTC. The value of the TZ environment variable is ignored when the SiLK installation
uses utc. For system information on the TZ variable, see tzset(3).
SILK_PLUGIN_DEBUG
When set to 1, rwcut prints status messages to the standard error as it attempts to find and open
each of its plug-ins. In addition, when an attempt to register a field fails, rwcut prints a message
specifying the additional function(s) that must be defined to register the field in rwcut. Be aware that
the output can be rather verbose.
FILES
$SILK_ADDRESS_TYPES
$SILK_PATH/share/silk/address_types.pmap
$SILK_PATH/share/address_types.pmap
/usr/local/share/silk/address_types.pmap
/usr/local/share/address_types.pmap
Possible locations for the address types mapping file required by the sType and dType fields.
${SILK_CONFIG_FILE}
${SILK_DATA_ROOTDIR}/silk.conf
/data/silk.conf
${SILK_PATH}/share/silk/silk.conf
${SILK_PATH}/share/silk.conf
/usr/local/share/silk/silk.conf
/usr/local/share/silk.conf
Possible locations for the SiLK site configuration file which are checked when the --site-config-file
switch is not provided.
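As an illustration, the candidate locations listed above can be generated like this. The function name silk_conf_candidates is hypothetical, and two points are assumptions: that the listed order is the search order, and that /data and /usr/local are compiled-in defaults that may differ per installation.

```python
import posixpath


def silk_conf_candidates(env, compiled_root="/data",
                         compiled_prefix="/usr/local"):
    """List possible silk.conf locations in the order the FILES section gives.

    env is a mapping of environment variables; unset or empty variables
    contribute no candidates.
    """
    candidates = []
    if env.get("SILK_CONFIG_FILE"):
        candidates.append(env["SILK_CONFIG_FILE"])
    if env.get("SILK_DATA_ROOTDIR"):
        candidates.append(posixpath.join(env["SILK_DATA_ROOTDIR"], "silk.conf"))
    candidates.append(posixpath.join(compiled_root, "silk.conf"))
    prefixes = [env["SILK_PATH"]] if env.get("SILK_PATH") else []
    prefixes.append(compiled_prefix)
    for prefix in prefixes:
        candidates.append(posixpath.join(prefix, "share/silk/silk.conf"))
        candidates.append(posixpath.join(prefix, "share/silk.conf"))
    return candidates
```

A caller would typically take the first candidate that exists on disk, for example next((p for p in silk_conf_candidates(os.environ) if os.path.isfile(p)), None).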
$SILK_COUNTRY_CODES
$SILK_PATH/share/silk/country_codes.pmap
$SILK_PATH/share/country_codes.pmap
/usr/local/share/silk/country_codes.pmap
/usr/local/share/country_codes.pmap
Possible locations for the country code mapping file required by the scc and dcc fields.
${SILK_PATH}/lib64/silk/
${SILK_PATH}/lib64/
${SILK_PATH}/lib/silk/
${SILK_PATH}/lib/
/usr/local/lib64/silk/
/usr/local/lib64/
/usr/local/lib/silk/
/usr/local/lib/
Directories that rwcut checks when attempting to load a plug-in.
NOTES
If you are interested in only a few fields, use the --fields option to reduce the volume of data to be produced.
For example, if you are checking to see which internal host got hit with the slammer worm (signature: UDP,
destPort 1434, pkt size 404), then the following rwfilter, rwcut combination will be much faster than simply
using default values:
$ rwfilter --proto=17 --dport=1434 --bytes-per-packet=404-404 \
| rwcut --fields=dip,stime
SEE ALSO
rwfilter(1), num2dot(1), addrtype(3), ccfilter(3), pmapfilter(3), silk-plugin(3), silkpython(3),
pysilk(3), sensor.conf(5), silk(7), yaf(1), dlopen(3)
rwdedupe
Eliminate duplicate SiLK Flow records
SYNOPSIS
rwdedupe [--ignore-fields=FIELDS] [--packets-delta=NUM]
[--bytes-delta=NUM] [--stime-delta=NUM] [--duration-delta=NUM]
[--temp-directory=DIR_PATH] [--buffer-size=SIZE]
[--note-add=TEXT] [--note-file-add=FILE]
[--compression-method=COMP_METHOD] [--print-filenames]
[--output-path=PATH] [--site-config-file=FILENAME]
{[--xargs] | [--xargs=FILENAME] | [FILE [FILE ...]]}
rwdedupe --help
rwdedupe --help-fields
rwdedupe --version
DESCRIPTION
rwdedupe reads SiLK Flow records from one or more input sources. Records that appear in the input
file(s) multiple times will only appear in the output stream once; that is, duplicate records are not written
to the output. The SiLK Flows are written to the file specified by the --output-path switch or to the
standard output when the --output-path switch is not provided and the standard output is not connected
to a terminal.
Note: As part of its processing, rwdedupe re-orders the records before writing them.
rwdedupe reads SiLK Flow records from the files named on the command line or from the standard input
when no file names are specified and --xargs is not present. To read the standard input in addition to the
named files, use - or stdin as a file name. If an input file name ends in .gz, the file will be uncompressed
as it is read. When the --xargs switch is provided, rwdedupe will read the names of the files to process
from the named text file, or from the standard input if no file name argument is provided to the switch. The
input to --xargs must contain one file name per line.
By default, rwdedupe will consider one record to be a duplicate of another when all the fields in the records
match exactly. From another point of view, any difference in two records results in both records appearing
in the output. Note that all means every field that exists on a SiLK Flow record. The complete list of fields
is specified in the description of --ignore-fields in the OPTIONS section below.
To have rwdedupe ignore fields in the comparison, specify those fields in the --ignore-fields switch. When
--ignore-fields=FIELDS is specified, a record is considered a duplicate of another if all fields except those
in FIELDS match exactly. rwdedupe will treat FIELDS as being identical across all records. Put another
way, if the only difference between two records is in the FIELDS fields, only one of those records will be
written to the output.
The --packets-delta, --bytes-delta, --stime-delta and --duration-delta switches allow for "fuzziness"
in the input. For example, if --stime-delta=NUM is specified and the only difference between two records
is in the sTime fields, and the fields are within NUM milliseconds of each other, only one record will be
written to the output.
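The comparison rules described above (exact match, ignored fields, and delta tolerances) can be sketched as a pairwise test. This is an illustrative model rather than rwdedupe's code: records are represented as plain dictionaries, the name is_duplicate is hypothetical, and the text does not say which record of a matching pair is kept or how chains of near-matches are grouped.

```python
def is_duplicate(a, b, ignore=(), deltas=None):
    """Return True when record b counts as a duplicate of record a.

    a and b are dicts with the same field names. ignore lists fields to
    skip; deltas maps a field name (packets, bytes, sTime, duration) to
    the largest difference that still counts as "the same".
    """
    deltas = deltas or {}
    for field in a:
        if field in ignore:
            continue
        if field in deltas:
            if abs(a[field] - b[field]) > deltas[field]:
                return False
        elif a[field] != b[field]:
            return False
    return True
```

For example, two records whose only difference is an sTime offset of 400 milliseconds are distinct by default but match when deltas={"sTime": 500} is supplied.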
During its processing, rwdedupe will try to allocate a large (near 2GB) in-memory array to hold the records.
(You may use the --buffer-size switch to change this maximum buffer size.) If more records are read than
will fit into memory, the in-core records are temporarily stored on disk as described by the --temp-directory
switch. When all records have been read, the on-disk files are merged to produce the output.
By default, the temporary files are stored in the /tmp directory. Because of the sizes of the temporary
files, it is strongly recommended that /tmp not be used as the temporary directory, and rwdedupe will
print a warning when /tmp is used. To modify the temporary directory used by rwdedupe, provide the
--temp-directory switch, set the SILK_TMPDIR environment variable, or set the TMPDIR environment
variable.
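The precedence among the switch, the two environment variables, and the default can be written as a single fallback chain. The helper name temp_directory is hypothetical, and treating an empty value as unset is an assumption.

```python
import os


def temp_directory(switch_value=None, env=None):
    """Pick the temporary directory: switch, SILK_TMPDIR, TMPDIR, then /tmp."""
    env = os.environ if env is None else env
    return (switch_value
            or env.get("SILK_TMPDIR")
            or env.get("TMPDIR")
            or "/tmp")
```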
OPTIONS
Option names may be abbreviated if the abbreviation is unique or is an exact match for an option. A
parameter to an option may be specified as --arg=param or --arg param, though the first form is required
for options that take optional parameters.
--ignore-fields=FIELDS
Ignore the fields listed in FIELDS when determining if two flow records are identical; that is, treat
FIELDS as being identical across all flows. By default, all fields are treated as significant.
FIELDS is a comma-separated list of field-names, field-integers, and ranges of field-integers; a range
is specified by separating the start and end of the range with a hyphen (-). Field-names are case-insensitive. Example:
--ignore-fields=stime,12-15
The supported fields are:
sIP,1
source IP address
dIP,2
destination IP address
sPort,3
source port for TCP and UDP, or equivalent
dPort,4
destination port for TCP and UDP, or equivalent
protocol,5
IP protocol
packets,pkts,6
packet count
bytes,7
byte count
flags,8
bit-wise OR of TCP flags over all packets
sTime,9
starting time of flow (milliseconds resolution)
duration,10
duration of flow (milliseconds resolution)
sensor,12
name or ID of sensor at the collection point
in,13
router SNMP input interface or vlanId if packing tools were configured to capture it (see sensor.conf(5))
out,14
router SNMP output interface or postVlanId
nhIP,15
router next hop IP
class,20,type,21
class and type of sensor at the collection point (represented internally by a single value)
initialFlags,26
TCP flags on first packet in the flow
sessionFlags,27
bit-wise OR of TCP flags over all packets except the first in the flow
attributes,28
flow attributes set by flow generator
application,29
guess as to the content of the flow. Some software that generates flow records from packet data,
such as yaf(1), will inspect the contents of the packets that make up a flow and use traffic
signatures to label the content of the flow. SiLK calls this label the application; yaf refers to it as
the appLabel. The application is the port number that is traditionally used for that type of traffic
(see the /etc/services file on most UNIX systems). For example, traffic that the flow generator
recognizes as FTP will have a value of 21, even if that traffic is being routed through the standard
HTTP/web port (80).
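To make the FIELDS syntax concrete, the following sketch parses a value such as stime,12-15 into field integers using the names and numbers from the list above. The function name parse_field_list is hypothetical, and rejecting integers that do not appear in the list (such as 11) is an assumption; the tool itself may behave differently.

```python
# Field names (lower-cased) and integers taken from the list above.
FIELD_IDS = {
    "sip": 1, "dip": 2, "sport": 3, "dport": 4, "protocol": 5,
    "packets": 6, "pkts": 6, "bytes": 7, "flags": 8, "stime": 9,
    "duration": 10, "sensor": 12, "in": 13, "out": 14, "nhip": 15,
    "class": 20, "type": 21, "initialflags": 26, "sessionflags": 27,
    "attributes": 28, "application": 29,
}
VALID_IDS = set(FIELD_IDS.values())


def parse_field_list(text):
    """Turn 'stime,12-15' into a sorted list of field integers."""
    ids = set()
    for item in text.split(","):
        item = item.strip().lower()      # field names are case-insensitive
        if item in FIELD_IDS:
            ids.add(FIELD_IDS[item])
            continue
        low, sep, high = item.partition("-")
        first = int(low)
        last = int(high) if sep else first
        for number in range(first, last + 1):
            if number not in VALID_IDS:  # assumption: unlisted ids are errors
                raise ValueError("unsupported field: %d" % number)
            ids.add(number)
    return sorted(ids)
```

Parsing the example from the text, parse_field_list("stime,12-15"), selects sTime, sensor, in, out and nhIP.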
--packets-delta=NUM
Treat the packets field on two records as being the same if the values differ by NUM packets or less.
If not specified, the default is 0.
--bytes-delta=NUM
Treat the bytes field on two records as being the same if the values differ by NUM bytes or less. If not
specified, the default is 0.
--stime-delta=NUM
Treat the start-time field on two records as being the same if the values differ by NUM milliseconds
or less. If not specified, the default is 0.
--duration-delta=NUM
Treat the duration field on two records as being the same if the values differ by NUM milliseconds or
less. If not specified, the default is 0.
--temp-directory=DIR_PATH
Specify the name of the directory in which to store data files temporarily when more records have
been read than will fit into RAM. This switch overrides the directory specified in the SILK_TMPDIR
environment variable, which overrides the directory specified in the TMPDIR variable, which overrides
the default, /tmp.
--buffer-size=SIZE
Set the maximum size of the buffer to use for holding the records, in bytes. A larger buffer means
fewer temporary files need to be created, reducing the I/O wait times. The default maximum for this
buffer is near 2GB. The SIZE may be given as an ordinary integer, or as a real number followed by
a suffix K, M or G, which represents the numerical value multiplied by 1,024 (kilo), 1,048,576 (mega),
and 1,073,741,824 (giga), respectively. For example, 1.5K represents 1,536 bytes, or one and one-half
kilobytes. (This value does not represent the absolute maximum amount of RAM that rwdedupe
will allocate, since additional buffers will be allocated for reading the input and writing the output.)
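The SIZE syntax can be illustrated with a short parser. The name parse_buffer_size is hypothetical; only the upper-case suffixes named in the text are accepted here, and whether the tool also accepts lower-case suffixes is not stated.

```python
def parse_buffer_size(text):
    """Convert a SIZE such as '1.5K' or '4096' into a byte count."""
    multipliers = {"K": 1024, "M": 1024 ** 2, "G": 1024 ** 3}
    text = text.strip()
    if text and text[-1] in multipliers:
        # A real number followed by a suffix: scale and truncate to bytes.
        return int(float(text[:-1]) * multipliers[text[-1]])
    return int(text)
```

With this sketch, parse_buffer_size("1.5K") yields 1536, the value given in the text.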
--output-path=PATH
Write the SiLK Flow records to the specified file or named pipe. When the standard output is not a
terminal and this switch is not provided or its argument is - or stdout, the records are written to the
standard output.
--note-add=TEXT
Add the specified TEXT to the header of the output file as an annotation. This switch may be repeated
to add multiple annotations to a file. To view the annotations, use the rwfileinfo(1) tool.
--note-file-add=FILENAME
Open FILENAME and add the contents of that file to the header of the output file as an annotation.
This switch may be repeated to add multiple annotations. Currently the application makes no effort
to ensure that FILENAME contains text; be careful that you do not attempt to add a SiLK data file
as an annotation.
--compression-method=COMP_METHOD
Specify how to compress the output. When this switch is not given, output to the standard output or to
named pipes is not compressed, and output to files is compressed using the default chosen when SiLK
was compiled. The valid values for COMP_METHOD are determined by which external libraries were
found when SiLK was compiled. To see the available compression methods and the default method,
use the --help or --version switch. SiLK can support the following COMP_METHOD values when
the required libraries are available.
none
Do not compress the output using an external library.
zlib
Use the zlib(3) library for compressing the output, and always compress the output regardless
of the destination. Using zlib produces the smallest output files at the cost of speed.
lzo1x
Use the lzo1x algorithm from the LZO real time compression library for compression, and always
compress the output regardless of the destination. This compression provides good compression
with less memory and CPU overhead.
best
Use lzo1x if available, otherwise use zlib. Only compress the output when writing to a file.
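The interaction between COMP_METHOD, the output destination, and the available libraries can be summarized in a small decision function. This is a sketch of the rules as stated above; the name effective_compression and its parameters are hypothetical, and the results when a requested library is missing, or when best is requested and neither library is available, are assumptions.

```python
def effective_compression(method, to_file, available, compiled_default="none"):
    """Return the compression actually applied to the output.

    method is the COMP_METHOD value, or None when the switch is absent.
    to_file is False for the standard output and named pipes.
    available is the set of libraries found when SiLK was compiled.
    """
    if method is None:
        # No switch: only regular files get the compiled-in default.
        return compiled_default if to_file else "none"
    if method == "none":
        return "none"
    if method in ("zlib", "lzo1x"):
        if method not in available:
            raise ValueError("method not available: " + method)
        return method                     # compress regardless of destination
    if method == "best":
        if not to_file:
            return "none"
        for candidate in ("lzo1x", "zlib"):
            if candidate in available:
                return candidate
        return "none"                     # assumption: nothing available
    raise ValueError("unknown method: " + method)
```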
--print-filenames
Print to the standard error the names of input files as they are opened.
--site-config-file=FILENAME
Read the SiLK site configuration from the named file FILENAME. When this switch is not provided,
rwdedupe searches for the site configuration file in the locations specified in the FILES section.
--xargs
--xargs=FILENAME
Causes rwdedupe to read file names from FILENAME or from the standard input if FILENAME is
not provided. The input should have one file name per line. rwdedupe will open each file in turn and
read records from it, as if the files had been listed on the command line.
--help
Print the available options and exit.
--help-fields
Print the description and alias(es) of each field and exit.
--version
Print the version number and information about how SiLK was configured, then exit the application.
LIMITATIONS
When the temporary files and the final output are stored on the same file volume, rwdedupe will require
approximately twice as much free disk space as the size of input data.
When the temporary files and the final output are on different volumes, rwdedupe will require between 1
and 1.5 times as much free space on the temporary volume as the size of the input data.
EXAMPLE
In the following examples, the dollar sign ($) represents the shell prompt. The text after the dollar sign
represents the command line.
Suppose you have made several rwfilter(1) runs to find interesting traffic:
$ rwfilter --start-date=2008/02/04 ... --pass=data1.rw
$ rwfilter --start-date=2008/02/04 ... --pass=data2.rw
$ rwfilter --start-date=2008/02/04 ... --pass=data3.rw
$ rwfilter --start-date=2008/02/04 ... --pass=data4.rw
You now want to merge that traffic into a single output file, but you want to ensure that any records
appearing in multiple output files are only counted once. You can use rwdedupe to merge the output files
to a single file, data.rw:
$ rwdedupe data1.rw data2.rw data3.rw data4.rw --output=data.rw
ENVIRONMENT
SILK_TMPDIR
When set and --temp-directory is not specified, rwdedupe writes the temporary files it creates to
this directory. SILK_TMPDIR overrides the value of TMPDIR.
TMPDIR
When set and SILK_TMPDIR is not set, rwdedupe writes the temporary files it creates to this
directory.
SILK_CLOBBER
The SiLK tools normally refuse to overwrite existing files. Setting SILK_CLOBBER to a non-empty
value removes this restriction.
SILK_CONFIG_FILE
This environment variable is used as the value for the --site-config-file switch when that switch is not
provided.
SILK_DATA_ROOTDIR
This environment variable specifies the root directory of the data repository. As described in the FILES
section, rwdedupe may use this environment variable when searching for the SiLK site configuration
file.
SILK_PATH
This environment variable gives the root of the install tree. When searching for configuration files,
rwdedupe may use this environment variable. See the FILES section for details.
SILK_TEMPFILE_DEBUG
When set to 1, rwdedupe prints debugging messages to the standard error as it creates, re-opens,
and removes temporary files.
FILES
${SILK_CONFIG_FILE}
${SILK_DATA_ROOTDIR}/silk.conf
/data/silk.conf
${SILK_PATH}/share/silk/silk.conf
${SILK_PATH}/share/silk.conf
/usr/local/share/silk/silk.conf
/usr/local/share/silk.conf
Possible locations for the SiLK site configuration file which are checked when the --site-config-file
switch is not provided.
${SILK_TMPDIR}/
${TMPDIR}/
/tmp/
Directory in which to create temporary files.
SEE ALSO
rwfilter(1), rwfileinfo(1), sensor.conf(5), silk(7), yaf(1), zlib(3)
rwfglob
Print files that rwfilter’s File Selection switches will access
SYNOPSIS
rwfglob { [--class=CLASS] [--type={all | TYPE[,TYPE ...]}]
| [--flowtype=CLASS/TYPE[,CLASS/TYPE ...]] }
[--sensors=SENSOR[,SENSOR ...]]
[--start-date=YYYY/MM/DD[:HH] [--end-date=YYYY/MM/DD[:HH]]]
[--data-rootdir=ROOT_DIRECTORY] [--site-config-file=FILENAME]
[--print-missing-files] [--no-block-check] [--no-file-names]
[--no-summary]
rwfglob [--data-rootdir=ROOT_DIRECTORY]
[--site-config-file=FILENAME] --help
rwfglob --version
DESCRIPTION
rwfglob accepts the normal File Selection options of rwfilter(1) and prints, to the standard output, the
names of the files that would normally be accessed, one file name per line. At the end, a summary is printed,
to the standard output, of the number of files that rwfglob found. To suppress the printing of the file names
and/or the summary, specify the --no-file-names and/or --no-summary switches, respectively.
By default, rwfglob only prints the names of files that exist. When the --print-missing-files switch is
provided, rwfglob prints, to the standard error, the names of files that it did not find, one file name per
line, preceded by the text ’Missing ’.
For each file it finds, rwfglob will check the size of the file and the number of blocks allocated to the file.
If the block count is zero but the file size is non-zero, rwfglob treats the file as existing but as residing on
tape. The names of these files are printed to the standard output, but each name is preceded by the text
’ \t*** ON TAPE ***’ where ’\t’ represents a tab character. The summary line will include the number of
files that rwfglob believes are on tape. To suppress this check and to remove the count from the summary
line, use the --no-block-check switch.
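The block-count heuristic described above can be expressed with the standard stat fields. This is a sketch, not rwfglob's source: the names looks_on_tape and classify_file are hypothetical, and st_blocks is only available on UNIX-like systems.

```python
import os


def looks_on_tape(size, blocks):
    """A file with data but no allocated blocks is assumed to be on tape."""
    return blocks == 0 and size > 0


def classify_file(path):
    """Return 'missing', 'on tape', or 'present' for a repository file."""
    try:
        info = os.stat(path)
    except FileNotFoundError:
        return "missing"
    if looks_on_tape(info.st_size, info.st_blocks):
        return "on tape"
    return "present"
```

Some file systems report a zero block count for small or freshly written files, which is one reason the check may give misleading results outside a tape-backed repository.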
OPTIONS
Option names may be abbreviated if the abbreviation is unique or is an exact match for an option. A
parameter to an option may be specified as --arg=param or --arg param, though the first form is required
for options that take optional parameters.
Selection Switches
This set of switches are the same as those used by rwfilter to select the files to process. At least one of
these switches must be provided.
--class=CLASS
The --class switch is used to specify a group of files to print. Only a single class may be selected
with the --class switch; for multiple classes, use the --flowtypes switch. Classes are defined in the
silk.conf(5) site configuration file. If the --class option is not given, the default-class as specified in
silk.conf is used. To see the available classes and the default class, either examine the output from
rwfglob --help or invoke rwsiteinfo(1) with the switch --fields=class,default-class.
--type={all | TYPE[,TYPE ...]}
The --type predicate further specifies data within the selected CLASS by listing the TYPEs of traffic
to process. The switch takes a comma-separated list of types or the keyword all which specifies all
types for the specified CLASS. Types are defined in silk.conf, they typically refer to the direction of
the flow, and they may vary by class. When the --type switch is not specified, a list of default types
is used. The default-type list is determined by the value of CLASS, and the default types generally
include only incoming traffic. To see the available types and the default types for each class, examine
the --help output of rwfglob or run rwsiteinfo with --fields=class,type,default-type.
--flowtypes=CLASS/TYPE[,CLASS/TYPE ...]
The --flowtypes predicate provides an alternate way to specify class/type pairs. The --flowtypes
switch allows a single rwfglob invocation to print data from multiple classes. The keyword all may
be used for the CLASS and/or TYPE to select all classes and/or types.
--sensors=SENSOR[,SENSOR ...]
The --sensors switch is used to select data from specific sensors. The parameter is a comma-separated
list of sensor names, sensor IDs (integers), and/or ranges of sensor IDs. Sensors are defined in the
silk.conf(5) site configuration file, and the rwsiteinfo(1) command can be used to print a mapping
of sensor names to IDs and classes. When the --sensors switch is not specified, the default is to use all
sensors which are valid for the specified class(es).
--start-date=YYYY/MM/DD[:HH]
--end-date=YYYY/MM/DD[:HH]
The date predicates indicate which days and hours to consider when creating the list of files. The
dates may be expressed as seconds since the UNIX epoch or in YYYY/MM/DD[:HH] format, where the
hour is optional. A T may be used in place of the : to separate the day and hour. Whether the
YYYY/MM/DD[:HH] strings represent times in UTC or the local timezone depends on how SiLK was
compiled. To determine how your version of SiLK was compiled, see the Timezone support setting in
the output from rwfglob --version.
When times are expressed in YYYY/MM/DD[:HH] format:
• When both --start-date and --end-date are specified to hour precision, all hours within that
time range are processed.
• When --start-date is specified to day precision, the hour specified in --end-date (if any) is
ignored, and files for all dates between midnight on start-date and 23:59 on end-date are
processed.
• When --start-date is specified to hour precision and --end-date is specified to day precision,
the hour of the start-date is used as the hour for the end-date.
• When --end-date is not specified and --start-date is specified to day precision, files for that
complete day are processed.
• When --end-date is not specified and --start-date is specified to hour precision, files for that
single hour are processed.
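The precision rules in the list above can be sketched as a function that returns the first and last hour to process. The names parse_date, hour_range and hour_count are hypothetical, only the YYYY/MM/DD[:HH] form is handled (epoch seconds are not), and timezone handling is left out.

```python
from datetime import datetime, timedelta


def parse_date(text):
    """Return (datetime, has_hour) for 'YYYY/MM/DD' or 'YYYY/MM/DD:HH'."""
    text = text.replace("T", ":")        # a T may separate the day and hour
    if ":" in text:
        return datetime.strptime(text, "%Y/%m/%d:%H"), True
    return datetime.strptime(text, "%Y/%m/%d"), False


def hour_range(start, end=None):
    """Return the first and last hour (inclusive) selected by the dates."""
    first, start_has_hour = parse_date(start)
    if end is None:
        # One hour for hour precision, the whole day for day precision.
        return first, first if start_has_hour else first.replace(hour=23)
    last, end_has_hour = parse_date(end)
    if not start_has_hour:
        # Day-precision start: ignore any end hour, run to 23:59 on end-date.
        return first, last.replace(hour=23)
    if not end_has_hour:
        # Hour-precision start, day-precision end: reuse the start hour.
        last = last.replace(hour=first.hour)
    return first, last


def hour_count(first, last):
    """Number of hourly files covered by an inclusive hour range."""
    return int((last - first) / timedelta(hours=1)) + 1
```

For a single day given to day precision, such as hour_range("2003/10/11"), the range spans hours 00 through 23, that is, 24 hourly files per sensor and flowtype.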
When at least one time is expressed as seconds since the UNIX epoch:
• When --end-date is specified in epoch seconds, the given --start-date and --end-date are
considered to be in hour precision.
• When --start-date is specified in epoch seconds and --end-date is specified in YYYY/MM/DD[:HH]
format, the start-date is considered to be in day precision if it is divisible by 86400, and hour precision
otherwise.
• When --start-date is specified in epoch seconds and --end-date is not given, the start-date is
considered to be in hour-precision.
When neither --start-date nor --end-date is given, rwfglob prints all files for the current day.
It is an error to specify --end-date without specifying --start-date.
--data-rootdir=ROOT_DIRECTORY
Tell rwfglob to use ROOT_DIRECTORY as the root of the data repository, which overrides the
location given in the SILK_DATA_ROOTDIR environment variable, which in turn overrides the location
that was compiled into rwfglob (/data).
--site-config-file=FILENAME
Read the SiLK site configuration from the named file FILENAME. When this switch is not provided,
rwfglob searches for the site configuration file in the locations specified in the FILES section.
--print-missing-files
This option prints to the standard error the names of the files that rwfglob expected to find but did
not. The file names are preceded by the text ’Missing ’; each file name appears on a separate line. This
switch is useful for debugging, but the list of files it produces can be misleading. For example, suppose
there is a decommissioned sensor that still appears in the silk.conf file; rwfglob considers these data
files as missing even though their absence is expected. Use the output from this switch judiciously.
Application Switches
--no-block-check
This option instructs rwfglob not to check whether the file exists on tape by checking whether the
number of blocks allocated to the file is zero. By default, rwfglob precedes a file name that has a
block count of 0 with the text ’ \t*** ON TAPE ***’.
--no-file-names
This option instructs rwfglob not to print the names of the files that it successfully finds. By default,
rwfglob prints the names of the files it finds and a summary line showing the number of files it found.
When both this switch and --print-missing-files are specified, rwfglob prints only the names of
missing files (and the summary).
--no-summary
This option instructs rwfglob not to print the summary line (that is, the line that shows the number
of files found). By default, rwfglob prints the names of the files it finds and a summary line showing
the number of files it found.
--help
Print the available options and exit. The available classes and types will be included in output; you
may specify a different root directory or site configuration file before --help to see the classes and
types available for that site.
--version
Print the version number and information about how SiLK was configured, then exit the application.
EXAMPLES
In the following examples, the dollar sign ($) represents the shell prompt. The text after the dollar sign
represents the command line.
Looking at a day on a single sensor:
$ rwfglob --start=2003/10/11 --sensor=2
/data/in/2003/10/11/in-GAMMA_20031011.23
/data/in/2003/10/11/in-GAMMA_20031011.22
/data/in/2003/10/11/in-GAMMA_20031011.21
/data/in/2003/10/11/in-GAMMA_20031011.20
/data/in/2003/10/11/in-GAMMA_20031011.19
/data/in/2003/10/11/in-GAMMA_20031011.18
/data/in/2003/10/11/in-GAMMA_20031011.17
/data/in/2003/10/11/in-GAMMA_20031011.16
/data/in/2003/10/11/in-GAMMA_20031011.15
/data/in/2003/10/11/in-GAMMA_20031011.14
/data/in/2003/10/11/in-GAMMA_20031011.13
/data/in/2003/10/11/in-GAMMA_20031011.12
/data/in/2003/10/11/in-GAMMA_20031011.11
/data/in/2003/10/11/in-GAMMA_20031011.10
/data/in/2003/10/11/in-GAMMA_20031011.09
/data/in/2003/10/11/in-GAMMA_20031011.08
/data/in/2003/10/11/in-GAMMA_20031011.07
/data/in/2003/10/11/in-GAMMA_20031011.06
/data/in/2003/10/11/in-GAMMA_20031011.05
/data/in/2003/10/11/in-GAMMA_20031011.04
/data/in/2003/10/11/in-GAMMA_20031011.03
/data/in/2003/10/11/in-GAMMA_20031011.02
/data/in/2003/10/11/in-GAMMA_20031011.01
/data/in/2003/10/11/in-GAMMA_20031011.00
globbed 24 files; 0 on tape
If you only want the summary, specify --no-file-names:
$ rwfglob --start-date=2003/10/11 --sensor=2 --no-file-names
globbed 24 files; 0 on tape
ENVIRONMENT
SILK_CONFIG_FILE
This environment variable is used as the value for the --site-config-file switch when that switch is not
provided.
SILK_DATA_ROOTDIR
This environment variable specifies the root directory of the data repository. This value overrides the
compiled-in value, and rwfglob uses it unless the --data-rootdir switch is specified. In addition,
rwfglob may use this value when searching for the SiLK site configuration file. See the FILES section
for details.
SILK_PATH
This environment variable gives the root of the install tree. When searching for configuration files,
rwfglob may use this environment variable. See the FILES section for details.
TZ
When a SiLK installation is built to use the local timezone (to determine if this is the case, check the
Timezone support value in the output from rwfglob --version), the value of the TZ environment
variable determines the timezone in which rwfglob parses timestamps. (The dates in the file names that
rwfglob returns are always in UTC.) If the TZ environment variable is not set, the default timezone
is used. Setting TZ to 0 or the empty string causes timestamps to be parsed as UTC. The value of
the TZ environment variable is ignored when the SiLK installation uses utc. For system information
on the TZ variable, see tzset(3).
FILES
${SILK_CONFIG_FILE}
ROOT_DIRECTORY/silk.conf
${SILK_PATH}/share/silk/silk.conf
${SILK_PATH}/share/silk.conf
/usr/local/share/silk/silk.conf
/usr/local/share/silk.conf
Possible locations for the SiLK site configuration file which are checked when the --site-config-file
switch is not provided, where ROOT_DIRECTORY/ is the directory rwfglob is using as the root of
the data repository.
${SILK_DATA_ROOTDIR}/
/data/
Locations for the root directory of the data repository when the --data-rootdir switch is not specified.
SEE ALSO
rwfilter(1), rwsiteinfo(1), silk.conf(5), silk(7)
BUGS
The --print-missing-files option needs to be smarter about what files are really missing.
The output of --print-missing-files goes to the standard error, while all other output goes to the standard
output. To redirect the output of --print-missing-files to the standard output, use the following in a
Bourne-compatible shell:
$ rwfglob --print-missing-files ... 2>&1
The block count check is of unknown portability across different tape-farm systems.
rwfileinfo
Print information about a SiLK file
SYNOPSIS
rwfileinfo [--fields=FIELDS] [--summary] [--no-titles]
[--site-config-file=FILENAME]
FILE [ FILE ... ]
rwfileinfo --help
rwfileinfo --version
DESCRIPTION
rwfileinfo prints information about a SiLK file. The information that may be printed is:
1. format. The output file format, a string and its hexadecimal equivalent: FT_RWSPLIT(0x12),
FT_RWFILTER(0x13), etc.
2. version. The version of the file, an integer. As of SiLK 1.0, the version of the file is distinct from the
version of the records in the file.
3. byte-order. The byte-order (endian-ness) of the file, a string
4. compression. The compression library used to compress the data-section of the file, a string and its
decimal equivalent (none(0), lzo1x(2)). Does not include any external compression, such as when the
entire file has been compressed with gzip(1).
5. header-length. The length of the header in bytes
6. record-length. The length of a single record in bytes. This will be 1 if the records do not have a
fixed size.
7. count-records. The number of records in the file. If the record-size is 1, this value is the uncompressed
size of the data section of the file.
8. file-size. The size of the file as it is on disk
9. command-lines. The command(s) used to generate this file, for tools that support writing that
information to the header and for formats that store that information.
10. record-version. The version of the records contained in the file
11. silk-version. The release of SiLK that wrote this file, e.g., 1.0.0. This value is 0 for files written by
releases of SiLK prior to 1.0.
12. packed-file-info. The timestamp, flowtype, and sensor for a file in the SiLK data repository.
13. probe-name. The probe information for files created by flowcap(8)
14. annotations. The notes (annotations) that have been added to the file with the --note-add and
--note-file-add switches
15. prefix-map. The mapname value for a prefix map file. The v1: that precedes the mapname denotes
the version of the prefix-map header entry, and it is printed for completeness.
16. ipset. The number and size of the nodes and leaves in an IPset file.
17. bag. The type and size of the key and counter in a Bag file.
OPTIONS
Option names may be abbreviated if the abbreviation is unique or is an exact match for an option. A
parameter to an option may be specified as --arg=param or --arg param, though the first form is required
for options that take optional parameters.
--fields=FIELDS
Determines which information about the file is printed. FIELDS is a list of integers representing
fields to print. The FIELDS may be a comma separated list of integers; a range may be specified by
separating the start and end of the range with a hyphen (-). The available fields are listed above.
Fields are always printed in the order given above. If the --fields option is not given, all fields are
printed.
--summary
Prints a summary that lists the number of files processed, the sizes of those files, and the number of
records contained in those files.
--no-titles
Suppresses printing of the file name and field names; only the values are printed, left justified and one
per line.
--site-config-file=FILENAME
Read the SiLK site configuration from the named file FILENAME. When this switch is not provided,
rwfileinfo searches for the site configuration file in the locations specified in the FILES section.
--help
Print the available options and exit.
--version
Print the version number and information about how SiLK was configured, then exit the application.
EXAMPLE
In the following examples, the dollar sign ($) represents the shell prompt. The text after the dollar sign
represents the command line.
Get information about the file tcp-data.rw :
$ rwfileinfo tcp-data.rw
tcp-data.rw:
  format(id)          FT_RWGENERIC(0x16)
  version             16
  byte-order          littleEndian
  compression(id)     none(0)
  header-length       208
  record-length       52
  record-version      5
  silk-version        1.0.1
  count-records       7
  file-size           572
  command-lines
                   1  rwfilter --proto=6 --pass=tcp-data.rw ...
  annotations
                   1  This is some interesting TCP data
Return a single value which is the number of records in the file tcp-data.rw :
$ rwfileinfo --no-titles --field=count-records tcp-data.rw
7
ENVIRONMENT
SILK_CONFIG_FILE
This environment variable is used as the value for the --site-config-file switch when that switch is not
provided.
SILK_DATA_ROOTDIR
This environment variable specifies the root directory of the data repository. As described in the FILES
section, rwfileinfo may use this environment variable when searching for the SiLK site configuration
file.
SILK_PATH
This environment variable gives the root of the install tree. When searching for configuration files,
rwfileinfo may use this environment variable. See the FILES section for details.
FILES
${SILK_CONFIG_FILE}
${SILK_DATA_ROOTDIR}/silk.conf
/data/silk.conf
${SILK_PATH}/share/silk/silk.conf
${SILK_PATH}/share/silk.conf
/usr/local/share/silk/silk.conf
/usr/local/share/silk.conf
Possible locations for the SiLK site configuration file which are checked when the --site-config-file
switch is not provided.
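The candidate locations listed above can be sketched in Python as follows. This is only an illustration: the function name site_config_candidates is hypothetical, the sketch assumes the locations are tried in the order they are listed, and /data and /usr/local stand for compiled-in defaults that may differ on a given installation.

```python
import posixpath


def site_config_candidates(env):
    """Return candidate silk.conf paths in the order listed in FILES.

    env is a mapping of environment variables (for example os.environ).
    Hypothetical helper; the search order is an assumption.
    """
    candidates = []
    if env.get("SILK_CONFIG_FILE"):
        candidates.append(env["SILK_CONFIG_FILE"])
    if env.get("SILK_DATA_ROOTDIR"):
        candidates.append(posixpath.join(env["SILK_DATA_ROOTDIR"], "silk.conf"))
    # Compiled-in data root (assumed to be /data, as in the list above).
    candidates.append("/data/silk.conf")
    if env.get("SILK_PATH"):
        root = env["SILK_PATH"]
        candidates.append(posixpath.join(root, "share", "silk", "silk.conf"))
        candidates.append(posixpath.join(root, "share", "silk.conf"))
    # Compiled-in install prefix (assumed to be /usr/local).
    candidates.append("/usr/local/share/silk/silk.conf")
    candidates.append("/usr/local/share/silk.conf")
    return candidates
```

Calling site_config_candidates(os.environ) and taking the first path that exists approximates the lookup performed when --site-config-file is absent.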
SEE ALSO
rwfilter(1), flowcap(8), silk(7), gzip(1)
rwfilter
Choose which SiLK Flow records to process
SYNOPSIS
rwfilter INPUT_ARGS OUTPUT_ARGS PARTITIONING_ARGS [MISC_ARGS]
Selection switches, input switches, or input files are required:
rwfilter ...
{{ [--class=CLASS] [--type={all | TYPE[,TYPE ...]}]
| [--flowtypes=CLASS/TYPE[,CLASS/TYPE ...]] }
[--sensors=SENSOR[,SENSOR ...]]
[--start-date=YYYY/MM/DD[:HH] [--end-date=YYYY/MM/DD[:HH]]]
[--data-rootdir=ROOT_DIRECTORY] [--print-missing-files] }
| [--input-pipe=INPUT_PATH]
| [--xargs] | [--xargs=INPUT_PATH]
| [INPUT_PATH [INPUT_PATH...]]
One or more output switches are required:
rwfilter ...
[--all-destination=ALL_PATH [--all-destination=ALL_PATH ...]]
[--fail-destination=FAIL_PATH [--fail-destination=FAIL_PATH ...]]
[--pass-destination=PASS_PATH [--pass-destination=PASS_PATH ...]]
[{ --print-statistics[=STATS_PATH]
| --print-volume-statistics[=STATS_PATH] }]
One or more partitioning switches are required:
rwfilter ...
[--ack-flag=SCALAR] [--active-time=TIME_WINDOW]
[{--any-address=IP_WILDCARD | --not-any-address=IP_WILDCARD}]
[--any-cc=COUNTRY_CODE_LIST]
[{--any-cidr=IP_OR_CIDR_LIST | --not-any-cidr=IP_OR_CIDR_LIST}]
[--any-index=INTEGER_LIST]
[{--anyset=IP_SET_FILENAME | --not-anyset=IP_SET_FILENAME}]
[--aport=INTEGER_LIST] [--application=INTEGER_LIST]
[--attributes=ATTRIBUTES_LIST]
[--bytes=INTEGER_RANGE] [--bytes-per-packet=DECIMAL_RANGE]
[--cwr-flag=SCALAR]
[{--daddress=IP_WILDCARD | --not-daddress=IP_WILDCARD}]
[--dcc=COUNTRY_CODE_LIST]
[{--dcidr=IP_OR_CIDR_LIST | --not-dcidr=IP_OR_CIDR_LIST}]
[{--dipset=IP_SET_FILENAME | --not-dipset=IP_SET_FILENAME}]
[--dport=INTEGER_LIST] [--dtype=SCALAR]
[--duration=DECIMAL_RANGE] [--ece-flag=SCALAR]
[--etime=TIME_WINDOW] [--fin-flag=SCALAR]
[--flags-all=HIGH_MASK_FLAGS_LIST]
[--flags-initial=HIGH_MASK_FLAGS_LIST]
[--flags-session=HIGH_MASK_FLAGS_LIST]
[--icmp-code=INTEGER_LIST] [--icmp-type=INTEGER_LIST]
[--input-index=INTEGER_LIST] [--ip-version=INTEGER_LIST]
[--ippair-any=FILENAME] [--ipport-any=FILENAME]
[{--next-hop-id=IP_WILDCARD | --not-next-hop-id=IP_WILDCARD}]
[{--nhcidr=IP_OR_CIDR_LIST | --not-nhcidr=IP_OR_CIDR_LIST}]
[{--nhipset=IP_SET_FILENAME | --not-nhipset=IP_SET_FILENAME}]
[--output-index=INTEGER_LIST] [--packets=INTEGER_RANGE]
[--pmap-file=MAPNAME:PATH [--pmap-file=MAPNAME:PATH ...]
{ [--pmap-src-MAPNAME=LABELS] [--pmap-dst-MAPNAME=LABELS]
[--pmap-any-MAPNAME=LABELS] } ]
[--protocol=INTEGER_LIST] [--psh-flag=SCALAR]
[--python-expr=PYTHON_EXPR]
[--python-file=FILENAME [--python-file=FILENAME ...]]
[--rst-flag=SCALAR]
[{--saddress=IP_WILDCARD | --not-saddress=IP_WILDCARD}]
[--scc=COUNTRY_CODE_LIST]
[{--scidr=IP_OR_CIDR_LIST | --not-scidr=IP_OR_CIDR_LIST}]
[{--sipset=IP_SET_FILENAME | --not-sipset=IP_SET_FILENAME}]
[--sport=INTEGER_LIST] [--stime=TIME_WINDOW] [--stype=SCALAR]
[--syn-flag=SCALAR] [--tcp-flags=TCP_FLAGS]
[--tuple-file=TUPLE_FILENAME { [--tuple-fields=FIELDS]
[--tuple-direction=DIRECTION]
[--tuple-delimiter=CHAR] } ]
[--urg-flag=SCALAR]
Miscellaneous switches:
rwfilter ...
[--compression-method=COMP_METHOD] [--dry-run]
[--max-fail-records=N] [--max-pass-records=N]
[--note-add=TEXT] [--note-file-add=FILE]
[--plugin=PLUGIN [--plugin=PLUGIN ...]]
[--print-filenames] [--site-config-file=FILENAME]
[--threads=N]
Help switches:
rwfilter [--pmap-file=MAPNAME:PATH [--pmap-file=MAPNAME:PATH ...]]
[--plugin=PLUGIN ...] [--python-file=PATH]
[--data-rootdir=ROOT_DIRECTORY] [--site-config-file=FILENAME]
--help
rwfilter --version
DESCRIPTION
rwfilter serves two purposes: (1) It acts as an interface to the data store to select which SiLK Flow records
to process, and (2) it partitions those records into one or more pass and/or fail streams.
The Selection Switches let one choose flow records from the SiLK data store by specifying where the flow
was collected (its sensor), the date of collection, and/or the flow’s direction. The act of selecting records
from the data store is sometimes called a "data pull".
The Partitioning Switches describe various types of traffic behavior (e.g., TCP traffic, or all traffic going to
port 80). When a flow record matches all of the behaviors, it can be written to a pass stream (i.e., file). If
a record fails to match at least one of these behavior predicates, it can be written to a fail stream. (You may also
write every record rwfilter reads to an all stream.) These output streams from rwfilter are always binary
SiLK Flow records. The output must be either written to a file or piped into another tool in the SiLK Suite,
and rwfilter complains if it determines you are attempting to send the stream to a terminal. To view the
records, pipe the records into rwcut(1).
In addition to the partitioning switches built in to rwfilter, additional partitioning predicates can be created
as C or PySiLK plug-ins, and these can be loaded into rwfilter using the --plugin and/or --python-file
switches as described below.
Instead of using the selection switches to choose flow records from the data store, rwfilter can apply the
partitioning switches to existing files of SiLK Flow records, such as files generated by a previous invocation
of rwfilter. To run rwfilter in this mode, you may
• specify, on the command line, the files and/or named pipes from which rwfilter should read SiLK
Flow records. Specifying stdin or - on the command line causes rwfilter to read flow records from
the standard input.
• use the --input-pipe switch to specify a named pipe, or specify stdin or - as the argument to this
switch to have rwfilter read flow records from the standard input.
• use the --xargs switch to specify a file that contains the names of the input files to process. When
--xargs is used without an argument, rwfilter attempts to read the names of the files from the standard
input. The name of each input file must appear on a single line.
When rwfilter is reading flow records from input files, some of the selection switches act as partitioning
switches. The remaining selection switches may not be specified when using the alternate forms of input,
and it is an error to specify multiple types of input.
Unlike many other tools in the SiLK tool suite, rwfilter requires that you specify one or more Output
Switches that tell rwfilter what types of output to produce.
Finally, there are Miscellaneous Switches that control other aspects of rwfilter.
OPTIONS
Option names may be abbreviated if the abbreviation is unique or is an exact match for an option. A
parameter to an option may be specified as --arg=param or --arg param, though the first form is required
for options that take optional parameters.
Selection Switches
To read files from the data store, use the following options to specify which files to process. When rwfilter
gets its input from files listed on the command line or from the --xargs or --input-pipe switches, the first
four switches (--class, --type, --flowtypes, and --sensors) act as partitioning switches, and specifying any
other selection switch produces an error.
--class=CLASS
The --class switch is used to specify a group of data to process. Only a single class may be selected
with the --class switch; for multiple classes, use the --flowtypes switch. Classes are defined in the
silk.conf(5) site configuration file. If the --class option is not given, the default-class as specified in
silk.conf is used. To see the available classes and the default class, either examine the output from
rwfilter --help or invoke rwsiteinfo(1) with the switch --fields=class,default-class.
--type={all | TYPE[,TYPE ...]}
The --type predicate further specifies data within the selected CLASS by listing the TYPEs of traffic
to process. The switch takes a comma-separated list of types or the keyword all which specifies all
types for the specified CLASS. Types are defined in silk.conf; they typically refer to the direction of
the flow, and they may vary by class. When the --type switch is not specified, a list of default types
is used. The default-type list is determined by the value of CLASS, and the default types generally
include only incoming traffic. To see the available types and the default types for each class, examine
the --help output of rwfilter or run rwsiteinfo with --fields=class,type,default-type.
--flowtypes=CLASS/TYPE[,CLASS/TYPE ...]
The --flowtypes predicate provides an alternate way to specify class/type pairs. The --flowtypes
switch allows a single rwfilter invocation to process data from multiple classes. The keyword all may
be used for the CLASS and/or TYPE to select all classes and/or types.
--sensors=SENSOR[,SENSOR ...]
The --sensors switch is used to select data from specific sensors. The parameter is a comma separated
list of sensor names, sensor IDs (integers), and/or ranges of sensor IDs. Sensors are defined in the
silk.conf(5) site configuration file, and the rwsiteinfo(1) command can be used to print a mapping
of sensor names to IDs and classes. When the --sensors switch is not specified, the default is to use all
sensors which are valid for the specified class(es).
--start-date=YYYY/MM/DD[:HH]
--end-date=YYYY/MM/DD[:HH]
The date predicates indicate which days and hours to consider when creating the list of files. The
dates may be expressed as seconds since the UNIX epoch or in YYYY/MM/DD[:HH] format, where the
hour is optional. A T may be used in place of the : to separate the day and hour. Whether the
YYYY/MM/DD[:HH] strings represent times in UTC or the local timezone depends on how SiLK was
compiled. To determine how your version of SiLK was compiled, see the Timezone support setting in
the output from rwfilter --version.
When times are expressed in YYYY/MM/DD[:HH] format:
• When both --start-date and --end-date are specified to hour precision, all hours within that
time range are processed.
• When --start-date is specified to day precision, the hour specified in --end-date (if any) is
ignored, and files for all dates between midnight on start-date and 23:59 on end-date are
processed.
• When --start-date is specified to hour precision and --end-date is specified to day precision,
the hour of the start-date is used as the hour for the end-date.
• When --end-date is not specified and --start-date is specified to day precision, files for that
complete day are processed.
• When --end-date is not specified and --start-date is specified to hour precision, files for that
single hour are processed.
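The precision rules above for YYYY/MM/DD[:HH] dates can be sketched in Python as follows. The helper names parse_date and selected_hours are hypothetical, epoch-second arguments are not handled, and the sketch ignores the timezone question; it only shows which first and last hour the rules imply.

```python
from datetime import datetime


def parse_date(text):
    """Parse YYYY/MM/DD[:HH] (a T may replace the colon).

    Returns (datetime, has_hour). Hypothetical helper.
    """
    text = text.replace("T", ":")
    if ":" in text:
        return datetime.strptime(text, "%Y/%m/%d:%H"), True
    return datetime.strptime(text, "%Y/%m/%d"), False


def selected_hours(start_date, end_date=None):
    """Return the (first_hour, last_hour) pair, inclusive, implied by the rules."""
    start, start_has_hour = parse_date(start_date)
    if end_date is None:
        # Day precision selects the complete day; hour precision a single hour.
        return start, (start if start_has_hour else start.replace(hour=23))
    end, end_has_hour = parse_date(end_date)
    if not start_has_hour:
        # Any hour on the end date is ignored; run through 23:59 of that day.
        return start, end.replace(hour=23)
    if not end_has_hour:
        # The start hour is reused as the hour of the end date.
        return start, end.replace(hour=start.hour)
    return start, end
```

For example, selected_hours("2014/12/18:05", "2014/12/19") ends at hour 05 on the 19th, not at the end of that day.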
When at least one time is expressed as seconds since the UNIX epoch:
• When --end-date is specified in epoch seconds, the given --start-date and --end-date are
considered to be in hour precision.
• When --start-date is specified in epoch seconds and --end-date is specified in YYYY/MM/DD[:HH]
format, the start-date is considered to be in day precision if it is divisible by 86400, and hour precision
otherwise.
• When --start-date is specified in epoch seconds and --end-date is not given, the start-date is
considered to be in hour-precision.
When neither --start-date nor --end-date is given, rwfilter processes all files for the current day.
It is an error to specify --end-date without specifying --start-date.
It is an error to specify --start-date when rwfilter believes there is some other input specified (see
Non-Selection Input Switches).
--data-rootdir=ROOT_DIRECTORY
Tell rwfilter to use ROOT_DIRECTORY as the root of the data repository, which overrides the
location given in the SILK_DATA_ROOTDIR environment variable, which in turn overrides the location
that was compiled into rwfilter (/data). It is an error to specify this switch when files are specified
on the command line or Non-Selection Input Switches are given.
--print-missing-files
This option prints to the standard error the names of the files that rwfilter’s file selection switches
expected to find but did not. The file names are preceded by the text ’Missing ’; each file name appears
on a separate line. This switch is useful for debugging, but the list of files it produces can be misleading.
For example, suppose there is a decommissioned sensor that still appears in the silk.conf file; rwfilter
considers these data files as missing even though their absence is expected. Use the output from this
switch judiciously. It is an error to specify this switch when files are specified on the command line or
Non-Selection Input Switches are given.
Non-Selection Input Switches
Instead of using the Selection Switches to read flow records from files in the data store, you can tell rwfilter
to process files named on the command line or use one (and only one) of the following switches. To have
rwfilter read flow records from the standard input, specify stdin or - as the name of an input file or use
the (deprecated) --input-pipe switch.
--input-pipe=INPUT_PATH
Specify a source for SiLK Flow records, where INPUT_PATH is a named pipe or the string stdin or
- to represent the standard input. You do not need to use this switch; you can simply specify the named
pipe or the strings stdin or - on the command line. NOTE: This switch is deprecated, and it will be
removed in the SiLK 4.0 release.
--xargs
--xargs=INPUT_PATH
Tell rwfilter to read file names from INPUT_PATH; if INPUT_PATH is not provided, the names of
the files are read from the standard input. The input should have one file name per line. rwfilter
opens each file in turn and reads records from it.
Output Switches
At least one of the following output switches must be provided:
--all-destination=ALL_PATH
Write every SiLK Flow record to ALL_PATH, where ALL_PATH refers to a file, a named pipe, the
string stderr to refer to the standard error, or the strings stdout or - to refer to the standard output.
This switch may be repeated to write all input records to multiple locations.
--fail-destination=FAIL_PATH
Write SiLK Flow records that have failed ANY of the partitioning predicates to FAIL_PATH, where
FAIL_PATH refers to a non-existent file, a named pipe, the string stderr to refer to the standard
error, or the strings stdout or - to refer to the standard output. This switch may be repeated to write
records that fail any predicate to multiple locations.
--pass-destination=PASS_PATH
Write SiLK Flow records that have passed ALL of the partitioning predicates to PASS_PATH, where
PASS_PATH refers to a non-existent file, a named pipe, the string stderr to refer to the standard
error, or the strings stdout or - to refer to the standard output. This switch may be repeated to write
records that pass every predicate to multiple locations.
--print-statistics
--print-statistics=STATS_PATH
Print a one line summary specifying the number of files processed, the total number of records read,
the number of records that passed all partitioning predicates, and the number of records that failed.
If STATS_PATH is provided, the summary is printed there; otherwise it is printed to the standard
error. This switch cannot be mixed with --print-volume-statistics. When running rwfilter with
multiple threads and --max-pass-records or --max-fail-records is specified, the statistics may not
match the number of records written by rwfilter.
--print-volume-statistics
--print-volume-statistics=STATS_PATH
Print a four line summary of rwfilter’s processing. For each of all records, records that pass all the
partitioning predicates, and records that fail, print the number of flow records and the number of
packets and bytes represented by those flow records. The output also includes the number of files
processed. If STATS_PATH is provided, the summary is printed there; otherwise it is printed to the
standard error. This switch cannot be mixed with --print-statistics. When running rwfilter with
multiple threads and --max-pass-records or --max-fail-records is specified, the statistics may not
match the number of records written by rwfilter.
Partitioning Switches
rwfilter supports the following partitioning switches, at least one of which must be specified (unless the
only Output Switch is --all-destination). The switches are AND’ed together; i.e., to pass the filter,
the record must pass the test implied by each switch. Any record that does not pass is written to the
fail-destination(s), if specified.
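The AND semantics described above can be pictured with a short Python sketch. Nothing here is rwfilter code: records are modeled as plain dictionaries, and the names partition, is_tcp, and to_port_80 are hypothetical stand-ins for partitioning switches such as --protocol=6 and --dport=80.

```python
def partition(records, predicates):
    """Split records into (passed, failed) lists.

    A record passes only if every predicate accepts it; a record that
    fails at least one predicate goes to the fail list.
    """
    passed, failed = [], []
    for record in records:
        if all(predicate(record) for predicate in predicates):
            passed.append(record)
        else:
            failed.append(record)
    return passed, failed


# Hypothetical flow records with only the fields needed for the example.
flows = [
    {"protocol": 6, "dport": 80},
    {"protocol": 6, "dport": 22},
    {"protocol": 17, "dport": 80},
]


def is_tcp(record):
    return record["protocol"] == 6


def to_port_80(record):
    return record["dport"] == 80


passed, failed = partition(flows, [is_tcp, to_port_80])
```

Only the first record satisfies both tests; the other two each fail one test and therefore land in the fail list.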
Each partitioning switch defines a test. These tests can be grouped into several broad categories; within
each category, the tests are applied in the order in which the switches appear on the command line. The
categories of the partitioning tests are:
• tests for IP addresses (including the IPset checks), ports, protocol, times, TCP flags, byte and packet
counts, IP version, application, country codes
• tests based on the --tuple-file switch
• tests that use the address type or prefix map mapping files
• tests that use the IP-Association plug-in
• tests based on the --python-expr and --python-file switches
• tests defined in C-plugins and loaded via --plugin
Partitioning Switches for IP Addresses
There are three families of switches that partition based on an IP address. Each family can partition by
the source IP, the destination IP, the next hop IP, or either source or destination IP. Each family includes a
--not-* variant to reverse the sense of the test.
The --*cidr-family takes as its argument an IP_OR_CIDR_LIST, which is a single IP address
10.1.2.3, a single CIDR block FF01::/16, or a comma separated list of IPs and/or CIDR blocks
10.0.1.0/24,10.0.2.3,10.0.4.0/24. The IP_OR_CIDR_LIST supports IPv4 and IPv6 addresses.
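As an illustration of how such a list can be interpreted, the following Python sketch uses the standard ipaddress module. The function names are hypothetical, and the sketch makes no claim about how rwfilter compares IPv4 addresses against IPv6 blocks; here an address only matches a block of the same IP version.

```python
import ipaddress


def parse_ip_or_cidr_list(text):
    """Turn a comma separated list of IPs and CIDR blocks into networks.

    A bare address becomes a single-address network (/32 or /128).
    """
    return [ipaddress.ip_network(item.strip(), strict=False)
            for item in text.split(",")]


def address_in_list(address, networks):
    """Return True if address falls inside any network in the list."""
    ip = ipaddress.ip_address(address)
    return any(ip in network for network in networks)


networks = parse_ip_or_cidr_list("10.0.1.0/24,10.0.2.3,10.0.4.0/24")
```

With this list, 10.0.1.77 and 10.0.2.3 match while 10.0.2.4 does not, which mirrors what --scidr or --dcidr would test against the corresponding address of a record.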
The --*address-family (which includes --next-hop-id) takes as its argument an IP_WILDCARD. An
IP_WILDCARD is a single IP address, a single CIDR block, or a single SiLK IP Wildcard. A SiLK IP
Wildcard can represent multiple IPv4 or IPv6 addresses. An IP Wildcard contains an IP in its canonical
form, except each part of the IP (where part is an octet for IPv4 or a hexadectet for IPv6) may be a single
value, a range, a comma separated list of values and ranges, or the letter x to signify any value for that
part of the IP (that is, 0-255 for IPv4). You may not specify a CIDR suffix when using the IP Wildcard
notation. The following IP_WILDCARDs all represent the same value:
::ffff:0:0/112
::ffff:0:x
::ffff:0:aaab-ffff,aaaa,0-aaa9
::ffff:0.0.0.0/112
::ffff:0.0.128-254,0-126,255,127.x
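The per-part matching rule can be sketched for the simpler IPv4 case. This Python fragment is an assumption-laden illustration: the function names are hypothetical, it handles only dotted-decimal IPv4 wildcards (no IPv6 and no CIDR suffix), and it performs little validation.

```python
def part_matches(spec, value):
    """Test one octet against one wildcard part.

    spec is "x", a single value, a range "a-b", or a comma separated
    list of values and ranges.
    """
    if spec == "x":
        return True
    for item in spec.split(","):
        low, _, high = item.partition("-")
        if int(low) <= value <= int(high or low):
            return True
    return False


def wildcard_matches(wildcard, address):
    """Return True if the IPv4 address is matched by the IPv4 wildcard."""
    specs = wildcard.split(".")
    octets = [int(octet) for octet in address.split(".")]
    if len(specs) != 4 or len(octets) != 4:
        raise ValueError("expected four dot-separated parts")
    return all(part_matches(spec, octet)
               for spec, octet in zip(specs, octets))
```

For instance, the wildcard 10.x.1-2.4,5 matches 10.200.2.5 but not 10.200.3.5, because the third octet must be 1 or 2.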
The --*set-family requires that you store the IPs in a binary IPset file and pass the name of the file to
the switch. IPset files are created from SiLK Flow records with rwset(1), or from textual input with
rwsetbuild(1). Currently, IPsets only support IPv4 addresses.
The next hop address often has a value of 0.0.0.0 since the default configuration of SiLK does not store the
next hop address in the data repository.
The address-partitioning switches are:
--scidr=IP_OR_CIDR_LIST
Pass the record if its source IP address matches a value in IP_OR_CIDR_LIST, a comma separated list
of IPs and/or CIDR blocks. See also --saddress and --sipset.
--dcidr=IP_OR_CIDR_LIST
Pass the record if its destination IP address matches a value in IP_OR_CIDR_LIST. See also --daddress
and --dipset.
--any-cidr=IP_OR_CIDR_LIST
Pass the record if either its source or its destination IP address matches a value in IP_OR_CIDR_LIST.
This switch does not consider the next hop IP address. See also --any-address and --anyset.
--nhcidr=IP_OR_CIDR_LIST
Pass the record if its next hop IP address matches a value in IP_OR_CIDR_LIST. See also
--next-hop-id and --nhipset.
--not-scidr=IP_OR_CIDR_LIST
Pass the record if its source IP address does not match a value in IP_OR_CIDR_LIST, a comma
separated list of IPs and/or CIDR blocks. See also --not-saddress and --not-sipset.
--not-dcidr=IP_OR_CIDR_LIST
Pass the record if its destination IP address does not match a value in IP_OR_CIDR_LIST. See also
--not-daddress and --not-dipset.
--not-any-cidr=IP_OR_CIDR_LIST
Pass the record if neither its source nor its destination IP address matches a value in
IP_OR_CIDR_LIST. See also --not-any-address and --not-anyset.
--not-nhcidr=IP_OR_CIDR_LIST
Pass the record if its next hop IP address does not match a value in IP_OR_CIDR_LIST. See also
--not-next-hop-id and --not-nhipset.
--saddress=IP_WILDCARD
Pass the record if its source IP address is matched by the SiLK IP Wildcard IP_WILDCARD. To
match on multiple IPs, use --scidr or create an IPset and use --sipset.
--daddress=IP_WILDCARD
Pass the record if its destination IP address is matched by IP_WILDCARD, a SiLK IP Wildcard. See
also --dcidr and --dipset.
--any-address=IP_WILDCARD
Pass the record if either its source or its destination IP address is matched by IP_WILDCARD, a
SiLK IP Wildcard. This switch does not consider the next hop IP address. See also --any-cidr and
--anyset.
--next-hop-id=IP_WILDCARD
Pass the record if its next hop IP address is matched by this IP_WILDCARD, a SiLK IP Wildcard.
To match on multiple IPs, use --nhcidr or create an IPset and use --nhipset.
--not-saddress=IP_WILDCARD
Pass the record if its source IP address is not matched by this IP_WILDCARD, a SiLK IP Wildcard.
See also --not-scidr and --not-sipset.
--not-daddress=IP_WILDCARD
Pass the record if its destination IP address is not matched by this IP_WILDCARD. See also
--not-dcidr and --not-dipset.
--not-any-address=IP_WILDCARD
Pass the record if neither its source nor its destination IP address is matched by this IP_WILDCARD.
Does not consider the next hop address. See also --not-any-cidr and --not-anyset.
--not-next-hop-id=IP_WILDCARD
Pass the record if its next hop IP address is not matched by this IP_WILDCARD. See also
--not-nhcidr and --not-nhipset.
--sipset=IP_SET_FILENAME
Pass the record if its source IP address is in the list of IPs contained in the binary set file
IP_SET_FILENAME. See also --scidr.
--dipset=IP_SET_FILENAME
As --sipset for the destination IP address. See also --dcidr.
--anyset=IP_SET_FILENAME
Pass the record if either its source IP address or its destination IP address is in the list of IPs contained
in the binary set file IP_SET_FILENAME. Does not consider the next hop IP. See also --any-cidr.
--nhipset=IP_SET_FILENAME
As --sipset for the next-hop IP address. See also --nhcidr.
--not-sipset=IP_SET_FILENAME
Pass the record if its source IP address is not in the list of IPs contained in the binary set file
IP_SET_FILENAME. See also --not-scidr.
--not-dipset=IP_SET_FILENAME
As --not-sipset for the destination IP address. See also --not-dcidr.
--not-anyset=IP_SET_FILENAME
Pass the record if neither its source IP address nor its destination IP address is in the list of IPs
contained in the binary set file IP_SET_FILENAME. Does not consider the next hop IP. See also
--not-any-cidr.
--not-nhipset=IP_SET_FILENAME
As --not-sipset for the next hop IP address. See also --not-nhcidr.
Partitioning Switches for Remainder of Five-Tuple
The following switches partition based on the protocol and source or destination port. The parameter to
each of these switches is an INTEGER_LIST, which is a comma-separated list of individual non-negative
integer values and ranges of those values. For example, 1,2,3,5-10,99-103. A range may be specified
without an upper limit, such as 1-, in which case the upper limit is set to the maximum value.
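A minimal Python sketch of this list syntax follows. The function name parse_integer_list and its maximum parameter are hypothetical; the caller supplies the field's maximum (65535 for ports, 255 for protocols), which is used when a range has no upper limit.

```python
def parse_integer_list(text, maximum):
    """Expand an INTEGER_LIST such as "1,2,3,5-10,99-103" into a set.

    A range written without an upper limit ("1-") extends to maximum.
    """
    values = set()
    for item in text.split(","):
        low, dash, high = item.partition("-")
        first = int(low)
        if not dash:
            last = first          # single value
        elif high == "":
            last = maximum        # open-ended range
        else:
            last = int(high)
        if first > last or last > maximum:
            raise ValueError("invalid range: %r" % item)
        values.update(range(first, last + 1))
    return values
```

A switch such as --dport then amounts to testing whether the record's destination port is a member of the resulting set.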
--sport=INTEGER_LIST
Pass the record if its source port is in this INTEGER_LIST; possible values are 0-65535.
--dport=INTEGER_LIST
Pass the record if its destination port is in this INTEGER_LIST; possible values are 0-65535.
--aport=INTEGER_LIST
Pass the record if its source port and/or its destination port is in this INTEGER_LIST; possible values
are 0-65535. For example, use --aport=25 to see all SMTP conversations regardless of where they
originated.
--protocol=INTEGER_LIST
Pass the record if its IP Suite Protocol is in this INTEGER_LIST; possible values are 0-255.
--icmp-type=INTEGER_LIST
Pass the record if its ICMP (or ICMPv6) type is in this INTEGER_LIST; possible values are 0-255. This
switch also verifies that the flow’s protocol is 1 (or 58 if the flow is IPv6). It is an error to specify a
--protocol that does not include 1 and/or 58.
--icmp-code=INTEGER_LIST
Pass the record if its ICMP (or ICMPv6) code is in this INTEGER_LIST; possible values are 0-255. This
switch also verifies that the flow’s protocol is 1 (or 58 if the flow is IPv6). It is an error to specify a
--protocol that does not include 1 and/or 58.
Partitioning Switches for Time
These switches partition based on whether the time stamps on the flow record occur within the specified
time window. The form of the argument is a range of two dates, start-window and end-window, each in
the form YYYY/MM/DD[:HH[:MM[:SS[.ssssss]]]]; for example,
2003/01/31:23:45:00.000-2003/01/31:23:59:59.999 represents the last fifteen minutes of Jan 31, 2003.
(A T may be used in place of : to separate the day and hour.) The start-window and end-window must
be set to at least day precision. For the start-window, unspecified hour, minute, second, and millisecond
values are set to 0; for the end-window, those values are set to 23, 59, 59, and 999 respectively. Thus
2003/01/31:23-2003/01/31:23 becomes 2003/01/31:23:00:00.000-2003/01/31:23:59:59.999. If an
end-window is not given, it is set to the start-window, giving a window of a single millisecond. The date
strings are considered to be in the timezone specified when SiLK was compiled, which you can determine
from the output of rwfilter --version. You may also specify the times as seconds since the UNIX epoch;
when the end-time is in epoch seconds, an unspecified milliseconds value is set to 999 and otherwise the
value is unchanged.
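The default-filling rules for a time window can be sketched in Python as follows. The names expand_time_window and _fill are hypothetical; the sketch handles only the date-string form (not epoch seconds), truncates fractional seconds to milliseconds, and ignores timezones.

```python
import re
from datetime import datetime

# YYYY/MM/DD[:HH[:MM[:SS[.ssssss]]]], with T allowed before the hour.
_TIME = re.compile(
    r"(\d{4})/(\d{2})/(\d{2})"
    r"(?:[:T](\d{2})(?::(\d{2})(?::(\d{2})(?:\.(\d+))?)?)?)?"
)


def _fill(text, defaults):
    """Parse one endpoint, using defaults for unspecified fields."""
    match = _TIME.fullmatch(text)
    if match is None:
        raise ValueError("bad time: %r" % text)
    year, month, day = (int(v) for v in match.groups()[:3])
    hour, minute, second = (
        int(value) if value is not None else default
        for value, default in zip(match.groups()[3:6], defaults[:3])
    )
    fraction = match.group(7)
    millis = int((fraction + "000")[:3]) if fraction is not None else defaults[3]
    return datetime(year, month, day, hour, minute, second, millis * 1000)


def expand_time_window(window):
    """Return (start, end) with unspecified fields filled in."""
    start_text, dash, end_text = window.partition("-")
    start = _fill(start_text, (0, 0, 0, 0))
    if not dash:
        return start, start  # a window of a single millisecond
    return start, _fill(end_text, (23, 59, 59, 999))
```

Applied to 2003/01/31:23-2003/01/31:23, the sketch yields 23:00:00.000 through 23:59:59.999 on that day, matching the expansion described above.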
--active-time=TIME_WINDOW
Pass the record if the record was active at ANY time during this TIME_WINDOW. If a single time is
specified, pass the record if it was active at that instant.
--stime=TIME_WINDOW
Pass the record if its starting time is in this TIME_WINDOW.
--etime=TIME_WINDOW
As --stime for the ending time.
--duration=DECIMAL_RANGE
Pass the record if its duration (that is, the record’s end time minus its start time, as measured in
seconds) is in this DECIMAL_RANGE. Use floating point numbers to specify millisecond values. The
range should be specified as MIN-MAX; for example, 5.0-10.031. If a single value is given, the
duration must match that value exactly. The upper limit may be omitted; for example, a range of
1.5- passes records whose duration is at least 1.5 seconds.
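The three forms of a range (MIN-MAX, MIN-, and a single value) can be sketched in Python as follows. The function names are hypothetical, bounds are assumed to be inclusive, and decimal.Decimal is used so that values such as 10.031 compare exactly.

```python
from decimal import Decimal


def parse_decimal_range(text):
    """Parse "MIN-MAX", "MIN-", or a single value.

    Returns (minimum, maximum); maximum is None when the upper limit
    is omitted.
    """
    low, dash, high = text.partition("-")
    minimum = Decimal(low)
    if not dash:
        return minimum, minimum   # single value: exact match
    if high == "":
        return minimum, None      # open-ended: at least minimum
    return minimum, Decimal(high)


def in_decimal_range(value, bounds):
    """Return True if value (a number or numeric string) is within bounds."""
    minimum, maximum = bounds
    value = Decimal(str(value))
    return value >= minimum and (maximum is None or value <= maximum)
```

With parse_decimal_range("1.5-"), any duration of at least 1.5 seconds is accepted, while parse_decimal_range("3") accepts only a duration of exactly 3 seconds.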
Partitioning Switches for Volume
The following switches partition based on the volume of the flow; that is, the number of bytes or packets.
For additional volume-related switches, load the flowrate plug-in as described in the flowrate(3) manual
page.
These switches accept a range of non-negative integers or decimal values. If the upper limit is omitted, the
volume must be at least that size. If the argument is a single value, the volume must match that value
exactly.
--bytes=INTEGER_RANGE
Pass the record if its byte count is in this INTEGER_RANGE.
--packets=INTEGER_RANGE
Pass the record if its packet count is in this INTEGER_RANGE.
--bytes-per-packet=DECIMAL_RANGE
Pass the record if its average bytes per packet count (bytes/packet) is in this DECIMAL_RANGE.
Partitioning Switches for TCP Flags
When a flow generator creates a flow record from TCP packets, it creates a field that is the bitwise OR of
the TCP flags from all packets that comprise that flow record. Some flow generators, such as yaf(1), can
export two TCP flag fields: one contains the flags on the first packet in the flow, and the second contains
the bitwise OR of the remaining packets.
To partition records based on their TCP flags values, there is a recommended set of switches and
legacy-supported switches. The switches accept the following letters to represent the named TCP flag: F=FIN;
S=SYN; R=RST; P=PSH; A=ACK; U=URG; E=ECE; C=CWR.
The recommended set of switches takes a comma separated list of pairs of TCP flags, where the pair is
separated by a slash (/). The value to the left of the slash is the HIGH_SET and it must be a subset of
the value to the right of the slash, which is the MASK_SET. For a record to pass the filter, the flags in the
HIGH_SET must be on and the remaining flags in MASK_SET must be off. Flags not in MASK_SET may
have any value. If a list of pairs is given, the record passes if any pair in the list matches. For example,
--flags-all=S/S,A/A passes flows that have either the SYN or the ACK flag set, --flags-all=S/SA passes
flow records where SYN is high and ACK is low, and --flags-all=/F passes flows where FIN is off. This
list of flag pairs is called a HIGH_MASK_FLAGS_LIST.
The recommended switches for TCP flag partitioning are:
--flags-all=HIGH MASK FLAGS LIST
Pass the record if any of the HIGH SET /MASK SET pairs is true when looking at the bitwise OR of
the TCP flags across all packets in the flow.
--flags-initial=HIGH MASK FLAGS LIST
As --flags-all, except this switch considers only the initial packet in the flow, for flow generators that
can generate that field.
--flags-session=HIGH MASK FLAGS LIST
As --flags-all, except this switch considers the bitwise OR of the TCP flags across the second through
the final packet in the flow; that is, ignoring the flags on the first packet.
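The HIGH SET/MASK SET test used by the recommended switches can be modeled in a few lines of Python. This is a sketch of the rule as described above, not the rwfilter implementation; the names FLAG_BITS, to_bits, and matches are hypothetical. The bit values are the standard positions of the flags in the TCP header.

```python
# Standard TCP header bit positions for each flag letter.
FLAG_BITS = {"F": 0x01, "S": 0x02, "R": 0x04, "P": 0x08,
             "A": 0x10, "U": 0x20, "E": 0x40, "C": 0x80}

def to_bits(letters):
    """Convert a string of flag letters (e.g. 'SA') to a bit mask."""
    bits = 0
    for letter in letters.upper():
        bits |= FLAG_BITS[letter]
    return bits

def matches(flags, high_mask_list):
    """Return True if flags satisfies any HIGH/MASK pair in the list."""
    value = to_bits(flags)
    for pair in high_mask_list.split(","):
        high, mask = pair.split("/")
        high_bits, mask_bits = to_bits(high), to_bits(mask)
        if high_bits & ~mask_bits:
            raise ValueError("HIGH SET must be a subset of MASK SET: " + pair)
        # Flags in HIGH must be on, the rest of MASK must be off,
        # and flags outside MASK are ignored.
        if (value & mask_bits) == high_bits:
            return True
    return False

# The three examples from the text, applied to a SYN+ACK flow.
assert matches("SA", "S/S,A/A")      # SYN or ACK is set
assert not matches("SA", "S/SA")     # SYN high but ACK is not low
assert matches("SA", "/F")           # FIN is off
```

The same function models --flags-initial and --flags-session if it is given the flags of the first packet or of the remaining packets, respectively.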
The TCP-flag partitioning switches supported for legacy reasons are:
--tcp-flags=TCP FLAGS
Pass the record if, for any one of its packets, any of the specified TCP FLAGS was on, where
TCP FLAGS contains the letters F,S,R,P,A,U,E,C. For example, --tcp-flags=ASF passes records where
ACK is set, or SYN is set, or FIN is set.
--ack-flag={0|1}
When set to 0, pass only records where the ACK flag is low. When set to 1, pass only records where the
ACK flag is high.
--cwr-flag={0|1}
As --ack-flag for the CWR Flag
--ece-flag={0|1}
As --ack-flag for the ECE Flag
--fin-flag={0|1}
As --ack-flag for the FIN Flag
--psh-flag={0|1}
As --ack-flag for the PSH Flag
--rst-flag={0|1}
As --ack-flag for the RST Flag
--syn-flag={0|1}
As --ack-flag for the SYN Flag
--urg-flag={0|1}
As --ack-flag for the URG Flag
Partitioning Switches for Other Flow Characteristics
Other than the --ip-version switch, the fields queried by the following switches may always be zero. The
default configuration of SiLK does not store the fields that contain the SNMP values. The other fields are not
present in NetFlow v5, and require use of properly-configured enhanced collection software, such as yaf(1),
http://tools.netsa.cert.org/yaf/.
--ip-version={4|6|4,6}
Passes the record if its IP Version is in the specified list. This switch determines how IPv4 and IPv6
flow records are handled when SiLK has been compiled with IPv6 support. When the argument to
this switch is 4, rwfilter writes records marked as IPv6 to the fail-destination, regardless of the IP
addresses it contains. When the argument to this switch is 6, rwfilter writes records marked as IPv4
to the fail-destination. When SiLK has not been compiled with IPv6 support, the only legal value for
this switch is 4, and any IPv6 flows in the input are ignored (that is, they are written to neither the
pass-destination nor the fail-destination).
--application=INTEGER LIST
Some flow generation software can inspect the contents of the packets that comprise a flow and use
traffic signatures to label the content of the flow. SiLK calls this label the application; yaf refers to
it as the appLabel (see the applabel(1) manual page in the yaf distribution). The application value
is the port number that is traditionally used for that type of traffic (see the /etc/services file on most
UNIX systems). For example, traffic that the flow generator recognizes as FTP has a value of 21, even
if that traffic is being routed through the standard HTTP/web port (80). The flow generator uses a
value of 0 if the application cannot be determined. The --application switch passes the flow if the
flow’s application value is in the specified INTEGER LIST, which is a comma separated list of integers
from 0 to 65535 inclusive and ranges of those integers. The list of valid appLabels is determined by
your site’s yaf installation.
--attributes=ATTRIBUTES LIST
The attributes field in SiLK Flow records describes characteristics about how the flow record was
generated or about the packets that comprise the flow record. The ATTRIBUTES LIST argument is similar to the HIGH MASK FLAGS LIST argument to the --flags-all switch. ATTRIBUTES LIST is a comma separated list of up to 8 HIGH ATTRIBUTES /MASK ATTRIBUTES
pairs, where HIGH ATTRIBUTES and MASK ATTRIBUTES are strings of the characters S,T,C,F,
and HIGH ATTRIBUTES is a subset of MASK ATTRIBUTES. rwfilter passes the record if, for any
pair of attributes in the list, the attributes listed in HIGH ATTRIBUTES are set and the remaining
attributes in MASK ATTRIBUTES are not-set. The valid attributes are:
S
All the packets in this flow record are exactly the same size.
T
The flow generator prematurely created a record for a long-lived session due to the connection’s
lifetime reaching the active timeout of the flow generator. (Also, when yaf is run with the --silk
switch, it prematurely creates a flow and marks it with T if the byte count of the flow cannot be
stored in a 32-bit value.)
C
The flow generator created this flow as a continuation of a long-running connection, where the
previous flow for this connection met a timeout.
F
The flow generator saw additional packets in this flow following a packet with the FIN flag set
(excluding ACK packets).
For a long-lived connection spanning several flow records, the first flow record is marked with a T
indicating that it hit the active timeout. The second through next-to-last records are marked with CT
indicating that the flow is a continuation of a connection that timed out and that this flow also timed
out. The final flow is marked with a C, indicating that it was created as a continuation of an active
flow.
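The marking of a long-lived connection described above can be sketched in Python, together with the ATTRIBUTES LIST pairs that would select each kind of record. This is an illustrative model only; the function names are hypothetical, the sketch considers only the T and C attributes, and it assumes that a connection which fits in one flow record carries neither of them.

```python
def continuation_attributes(record_count):
    """Return the T/C markings for each flow record of one connection
    that the flow generator split into record_count records."""
    if record_count < 1:
        raise ValueError("record_count must be at least 1")
    if record_count == 1:
        return [""]                      # assumed: no timeout was reached
    middle = ["CT"] * (record_count - 2)
    return ["T"] + middle + ["C"]

def attr_matches(attrs, pair):
    """Apply one HIGH_ATTRIBUTES/MASK_ATTRIBUTES pair to an attribute string."""
    high, mask = (set(part) for part in pair.split("/"))
    return (set(attrs) & mask) == high

records = continuation_attributes(4)
print(records)
# First record: T set, C not set.  Middle records: both set.
# Final record: C set, T not set.
print([attr_matches(a, "T/CT") for a in records])
print([attr_matches(a, "CT/CT") for a in records])
print([attr_matches(a, "C/CT") for a in records])
```

Under this model, --attributes=T/CT selects the first record of a split connection, --attributes=CT/CT the middle records, and --attributes=C/CT the final record.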
--input-index=INTEGER LIST
Pass the record if its in field is in this INTEGER LIST, which is a comma separated list of integers
from 0 to 65535, inclusive, and ranges of those integers. When present, the in field normally contains
the incoming SNMP interface, but it may contain the vlanId if the packing tools were configured to
capture it (see sensor.conf(5)).
--output-index=INTEGER LIST
Pass the record if its out field is in this INTEGER LIST. When present, the out field normally contains
the outgoing SNMP interface, but it may contain the postVlanId if the packing tools were configured
to capture it.
--any-index=INTEGER LIST
Pass the record if its in field or if its out field is in this INTEGER LIST.
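An INTEGER LIST, as accepted by --application, --input-index, --output-index, and --any-index, can be modeled as a set of integers. The following Python sketch is an illustration under stated assumptions, not rwfilter's parser: the function names are hypothetical, ranges are assumed to be written LOW-HIGH with inclusive endpoints, and open-ended ranges are rejected because the text above does not describe them for lists.

```python
def parse_integer_list(text, maximum=65535):
    """Expand a list such as '1,5-7,80' into a set of integers."""
    values = set()
    for item in text.split(","):
        low, sep, high = item.partition("-")
        if sep and not high:
            raise ValueError("open-ended range not handled in this sketch: " + item)
        start = int(low)
        end = int(high) if sep else start
        if not 0 <= start <= end <= maximum:
            raise ValueError("value out of bounds: " + item)
        values.update(range(start, end + 1))
    return values

def any_index_pass(in_field, out_field, text):
    """Model of --any-index: pass if either interface value is listed."""
    wanted = parse_integer_list(text)
    return in_field in wanted or out_field in wanted

print(sorted(parse_integer_list("1,5-7,80")))
print(any_index_pass(3, 6, "1,5-7,80"))
```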
Selection Switches Acting as Partitioning Switches
The following four switches are normally file selection switches; that is, they select which files rwfilter reads
within the data repository. However, when rwfilter gets input without querying the data repository (that
is, from files listed on the command line, from files specified by --xargs, or from the --input-pipe), these
switches become partitioning switches and determine whether a record is written to the pass-destination or
fail-destination.
--class=CLASS
Pass the record if its class is CLASS and its type is listed in the --type switch, or its type is in
the default type list for CLASS when --type is not specified. Use rwfilter --help to see the list of
available classes and types, and the defaults.
--flowtypes=CLASS /TYPE [,CLASS /TYPE ...]
Pass the record if its class/type value is one of those listed. The keyword all may be used for the
CLASS and/or TYPE to select all classes and/or types. This switch cannot be used when either
--class or --type is used. Use rwfilter --help to see the list of available classes and types.
--sensors=SENSOR[,SENSOR ...]
Pass the record if its sensor is one of those listed. The parameter is a comma separated list of sensor
names, sensor IDs (integers), and/or ranges of sensor IDs. Use the rwsiteinfo(1) command to see the
list of sensors.
--type={all | TYPE [,TYPE ]}
Pass the record if its type is one of those listed and its class is specified by --class, or its class is the
default class when the --class switch is not specified. Use rwfilter --help to see the list of available
classes and types, and the defaults.
Partitioning Switches that use Additional Mapping Files
Additional partitioning switches are available that allow one to partition flow records depending on a label,
where the label is computed from an IP address or port on the record and an additional mapping file.
--pmap-file=MAPNAME :PATH
--pmap-file=PATH
Instruct rwfilter to load the mapping file located at PATH and create new switches --pmap-src-MAPNAME , --pmap-dst-MAPNAME , and --pmap-any-MAPNAME . When MAPNAME is
provided, it is used to refer to the switches specific to that prefix map. If MAPNAME is not provided,
rwfilter checks the prefix map file to see if a map-name was specified when the file was created. If
no map-name is available, rwfilter creates legacy switches as described below. Multiple --pmap-file
switches are supported as long as each uses a unique map-name. The --pmap-file switch(es) must
precede all other --pmap-* switches. For more information, see pmapfilter(3).
--pmap-src-MAPNAME =LABELS
If the prefix map associated with MAPNAME is an IP prefix map, this matches records with a source
IPv4 address that maps to a label contained in the list of labels in LABELS.
If the prefix map associated with MAPNAME is a proto-port prefix map, this matches records with a
protocol and source port combination that maps to a label contained in the list of labels in LABELS.
--pmap-dst-MAPNAME =LABELS
Similar to --pmap-src-MAPNAME , but uses the destination IP or the protocol and destination
port.
--pmap-any-MAPNAME =LABELS
If the prefix map associated with MAPNAME is an IP prefix map, this matches records with a source
IP address or a destination IP address that maps to a label contained in the list of labels in LABELS.
If the prefix map associated with MAPNAME is a port/protocol prefix map, this matches records with
a protocol and source port or destination port combination that maps to a label contained in the list
of labels in LABELS.
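The way an IP prefix map turns an address into a label, and how the src, dst, and any variants use that label, can be sketched in Python with the standard ipaddress module. Everything here is hypothetical illustration: the PREFIX_MAP contents, the function names, and the assumption that the most specific matching block supplies the label. Real prefix map files are binary files created by rwpmapbuild(1) and are not read this way.

```python
import ipaddress

# Hypothetical prefix map: CIDR block -> label.
PREFIX_MAP = {
    "0.0.0.0/0": "external",
    "10.0.0.0/8": "internal",
    "10.1.2.0/24": "mail-lan",
}

def label_for(address):
    """Return the label of the most specific block containing address."""
    ip = ipaddress.ip_address(address)
    best = None
    for cidr, label in PREFIX_MAP.items():
        network = ipaddress.ip_network(cidr)
        if ip in network and (best is None or network.prefixlen > best[0]):
            best = (network.prefixlen, label)
    return best[1] if best else None

def pmap_pass(sip, dip, labels, which):
    """Model of --pmap-src-, --pmap-dst-, and --pmap-any-MAPNAME=LABELS."""
    wanted = set(labels.split(","))
    src = label_for(sip) in wanted
    dst = label_for(dip) in wanted
    return {"src": src, "dst": dst, "any": src or dst}[which]

print(label_for("10.1.2.7"))
print(pmap_pass("10.1.2.7", "192.0.2.1", "external", "src"))
print(pmap_pass("10.1.2.7", "192.0.2.1", "external", "any"))
```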
--pmap-saddress=LABELS
--pmap-daddress=LABELS
--pmap-any-address=LABELS
These are deprecated switches created by pmapfilter that correspond to --pmap-src-MAPNAME ,
--pmap-dst-MAPNAME , and --pmap-any-MAPNAME , respectively. These switches are available when an IP prefix map is used that is not associated with a MAPNAME.
--pmap-sport-proto=LABELS
--pmap-dport-proto=LABELS
--pmap-any-port-proto=LABELS
These are deprecated switches created by pmapfilter that correspond to --pmap-src-MAPNAME ,
--pmap-dst-MAPNAME , and --pmap-any-MAPNAME , respectively. These switches are available when a proto-port prefix map is used that is not associated with a MAPNAME.
--scc=COUNTRY CODE LIST
--dcc=COUNTRY CODE LIST
--any-cc=COUNTRY CODE LIST
Pass the record if one of its IP addresses maps to a country code that is specified in COUNTRY CODE LIST. For --scc, the source IP must match. For --dcc, the destination IP must match.
For --any-cc, either the source or the destination must match. COUNTRY CODE LIST is a comma
separated list of lowercase two-letter country codes---based on the Root-Zone Whois Index (see for
example http://www.iana.org/cctld/cctld-whois.htm)---as well as the following special codes:
--
N/A (e.g., private and experimental reserved addresses)
a1
anonymous proxy
a2
satellite provider
o1
other
For example: cx,uk,kr,jp,--. To use this switch, the country code mapping file must be available
in the default location, or in the location specified by the SILK_COUNTRY_CODES environment
variable. See ccfilter(3) for details.
--stype={0|1|2|3}
--dtype={0|1|2|3}
Pass a flow record depending on whether the IP address is internal, external, or non-routable. These
switches use the mapping file specified by the SILK_ADDRESS_TYPES environment variable, or the
address_types.pmap mapping file, as described in addrtype(3). When the parameter is 0, pass the
record if its source (--stype) IP address or destination (--dtype) IP address is non-routable. When 1,
pass if internal. When 2, pass if external (i.e., routable but not internal). When 3, pass if not internal
(non-routable or external).
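The four parameter values of --stype and --dtype can be summarized with a small Python sketch. The constants and the function name are hypothetical; the sketch assumes that each address has already been classified into exactly one of the three categories by the address type mapping file.

```python
# Hypothetical category codes, chosen to line up with parameters 0-2.
NON_ROUTABLE, INTERNAL, EXTERNAL = 0, 1, 2

def type_pass(category, parameter):
    """Model of --stype/--dtype for an already-classified address."""
    if parameter not in (0, 1, 2, 3):
        raise ValueError("parameter must be 0, 1, 2, or 3")
    if parameter == 3:
        return category != INTERNAL      # non-routable or external
    return category == parameter

for parameter in (0, 1, 2, 3):
    row = [type_pass(c, parameter) for c in (NON_ROUTABLE, INTERNAL, EXTERNAL)]
    print(parameter, row)
```

The printed rows show, for each parameter, whether a non-routable, internal, or external address passes, in that order.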
Partitioning Switches across Multiple Fields
The --tuple-* family of switches allows the user to partition flow records based on multiple values of the
five-tuple.
--tuple-file=TUPLE FILENAME
This switch provides support for partitioning by arbitrary subsets of the basic five-tuple:
{source-ip,destination-ip,source-port,destination-port,protocol}
A SiLK Flow record passes the test when the record’s fields match one of the tuples; if the SiLK record
does not match any tuple, the record fails. The tuples are read from the text file TUPLE FILENAME
which must contain lines of delimited fields. The default delimiter is |, but may be specified with the
--tuple-delimiter switch. Each field contains one member of the tuple; the fields may appear in any
order. The fields may represent any subset of the five-tuple, but each line in the file must define the
same subset. A field that is present but has no value generates an error. If you want the field to match
any value, it is best that you not include that field in your input.
In addition to the tuple-lines, TUPLE FILENAME may contain blank lines and comments (which
begin with # and continue to the end of the line). The first line of TUPLE FILENAME may contain
a title labeling the fields in the file. This title line is ignored when the --tuple-fields switch is given.
The IP fields may contain an IPv4 address, an integer, or an IP address in CIDR block notation. Comma-separated lists (80,443) and ranges (0-1023,8080) are supported for the ports and protocol fields.
NOTE: Currently the code is not clever in its support for CIDR notation and ranges in that each
occurrence is fully expanded. When this occurs, the memory required to hold the search tree quickly
grows.
--tuple-fields=FIELDS
FIELDS contains the list of fields (columns) to parse from the TUPLE FILENAME in the order in
which they appear in the file. When this switch is not provided, rwfilter treats the first line in
TUPLE FILENAME as a title line and attempts to determine the fields (a la rwtuc(1)); rwfilter
exits if it cannot determine the fields.
FIELDS is a comma separated list of field-names, field-integers, and ranges of field-integers; a range
is specified by separating the start and end of the range with a hyphen (-). Names can be abbreviated
to their shortest unique prefix. The field names and their descriptions are:
sIP,sip,1
source IP address
dIP,dip,2
destination IP address
sPort,sport,3
source port
dPort,dport,4
destination port
protocol,5
IP protocol
--tuple-direction=DIRECTION
Allows you to change the comparison between the tuple and the SiLK Flow record. This switch allows
one to look for traffic in the reverse direction (or both directions) without having to write all of the
rules twice. The available directions are:
forward
The tuple’s fields are compared against the corresponding fields on the flow; that is, sIP is compared with sIP, dIP with dIP, sPort with sPort, dPort with dPort, and protocol with protocol.
This is the default.
reverse
The tuple’s fields are compared against the opposite fields on the flow; that is, sIP is compared
with dIP, dIP with sIP, sPort with dPort, dPort with sPort, and protocol with protocol.
both
Both of the above comparisons are performed.
--tuple-delimiter=CHAR
Specifies the character separating the input fields. When the switch is not provided, the default of | is
used.
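The interaction of the tuple file, its title line, and --tuple-direction can be illustrated with the Python sketch below. It is a simplified model, not rwfilter's implementation: the names are hypothetical, the title line is assumed to be present and to use the exact field names sIP, dIP, sPort, dPort, and protocol, values are compared as plain strings (no CIDR blocks, lists, or ranges), and "both" is assumed to pass a record that matches in either direction.

```python
TUPLE_TEXT = """\
# client/server pairs of interest
sIP|dIP|dPort|protocol
10.1.2.5|192.0.2.10|25|6
10.1.2.9|192.0.2.20|80|6
"""

# Field compared against each tuple field when the direction is reversed.
REVERSE = {"sIP": "dIP", "dIP": "sIP", "sPort": "dPort",
           "dPort": "sPort", "protocol": "protocol"}

def load_tuples(text, delimiter="|"):
    """Return (fields, tuples), skipping comments and blank lines."""
    rows = []
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()
        if line:
            rows.append(line.split(delimiter))
    fields = rows[0]                      # title line names the columns
    return fields, [dict(zip(fields, row)) for row in rows[1:]]

def tuple_match(record, fields, tuples, direction="forward"):
    """Return True if record matches any tuple in the given direction."""
    def hit(field_on_flow):
        return any(all(str(record[field_on_flow(f)]) == t[f] for f in fields)
                   for t in tuples)
    forward = hit(lambda f: f)
    reverse = hit(lambda f: REVERSE[f])
    return {"forward": forward, "reverse": reverse,
            "both": forward or reverse}[direction]

fields, tuples = load_tuples(TUPLE_TEXT)
reply = {"sIP": "192.0.2.10", "dIP": "10.1.2.5",
         "sPort": 25, "dPort": 40000, "protocol": 6}
print(tuple_match(reply, fields, tuples, "forward"))
print(tuple_match(reply, fields, tuples, "reverse"))
```

The reply flow from the mail server does not match in the forward direction but does match once source and destination are swapped, which is the case --tuple-direction=reverse and --tuple-direction=both are meant to cover.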
Partitioning Switches that use the PySiLK Plug-in
The SiLK Python plug-in, silkpython.so, provides support for filtering by expressions or complex functions
written in the Python programming language. See the silkpython(3) and pysilk(3) manual pages for
information and examples for how to use Python to manipulate SiLK data structures. When multiple
Partitioning Switches are given, the Python plug-in is the next-to-last to be invoked. Only the code specified
by the --plugin switch is called after the Python code.
--python-file=FILENAME
Pass the record if the result of processing the flow with the function named rwfilter() in
FILENAME is true. The function should take a single silk.RWRec object as an argument. See
silkpython(3) for details.
--python-expr=PYTHON EXPRESSION
Pass the record if the result of processing the flow with the specified PYTHON EXPRESSION is
true. The expression is evaluated as if it appeared in the following context:
from silk import *
def rwfilter(rec):
    return (PYTHON_EXPRESSION)
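As an illustration of the --python-file form, the following is a minimal sketch of what such a file might contain. The file name smtp_filter.py is hypothetical, and the attribute names protocol, dport, and packets are used on the assumption that they match the silk.RWRec attributes documented in pysilk(3); check that manual page before relying on them.

```python
# Contents of a hypothetical file smtp_filter.py, intended for use as
#   rwfilter --python-file=smtp_filter.py ...

def rwfilter(rec):
    """Return True to pass the record, False to fail it.

    rec is expected to be a silk.RWRec object; this sketch only reads
    three of its attributes.
    """
    # TCP traffic to destination port 25 with at least three packets.
    return rec.protocol == 6 and rec.dport == 25 and rec.packets >= 3
```

The same test could be given inline as --python-expr='rec.protocol == 6 and rec.dport == 25 and rec.packets >= 3', since the expression is evaluated with the record bound to the name rec.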
Partitioning Switches that use the IP-Association Plug-In
The IPA plug-in, ipafilter.so, provides switches that can partition flows using data in an IP Association
database. For this plug-in to be available, SiLK must be compiled with IPA support and IPA must be
configured. See ipafilter(3) and http://tools.netsa.cert.org/ipa/ for additional information.
--ipa-src-expr=IPA EXPR
Use IPA EXPR to partition flows based on the source IP of the flow matching the IPA EXPR expression.
--ipa-dst-expr=IPA EXPR
Use IPA EXPR to partition flows based on the destination IP of the flow matching the IPA EXPR
expression.
--ipa-any-expr=IPA EXPR
Use IPA EXPR to partition flows based on either the source or destination IP of the flow matching
the IPA EXPR expression.
Miscellaneous Switches
--compression-method=COMP METHOD
Specify how to compress the output. When this switch is not given, output to the standard output or to
named pipes is not compressed, and output to files is compressed using the default chosen when SiLK
was compiled. The valid values for COMP METHOD are determined by which external libraries were
found when SiLK was compiled. To see the available compression methods and the default method,
use the --help or --version switch. SiLK can support the following COMP METHOD values when
the required libraries are available.
none
Do not compress the output using an external library.
zlib
Use the zlib(3) library for compressing the output, and always compress the output regardless
of the destination. Using zlib produces the smallest output files at the cost of speed.
lzo1x
Use the lzo1x algorithm from the LZO real time compression library for compression, and always
compress the output regardless of the destination. This compression provides good compression
with less memory and CPU overhead.
best
Use lzo1x if available, otherwise use zlib. Only compress the output when writing to a file.
--dry-run
Perform a sanity check on the input arguments to check that the arguments are acceptable. In addition,
prints to the standard output the names of the files that would be accessed (and the names of missing
files if --print-missing is specified). rwfglob(1) can also be used to generate the lists of files that
rwfilter would access.
--help
Print the available options and exit. Options that add fields (for example, options that load plug-ins,
prefix maps, or PySiLK extensions) can be specified before the --help switch so that the new options
appear in the output. The available classes and types are included in output; you may specify a
different root directory or site configuration file before --help to see the classes and types available for
that site.
--max-fail-records=N
Write N records to each --fail-destination. rwfilter stops reading input once it has written these N
records unless --pass-destination or --all-destination switch(es) are also specified.
--max-pass-records=N
Write N records to each --pass-destination. rwfilter stops reading input once it has written these
N records unless --fail-destination or --all-destination switch(es) are also specified.
--note-add=TEXT
Add the specified TEXT to the header of the output file as an annotation. This switch may be repeated
to add multiple annotations to a file. To view the annotations, use the rwfileinfo(1) tool.
--note-file-add=FILENAME
Open FILENAME and add the contents of that file to the header of the output file as an annotation.
This switch may be repeated to add multiple annotations. Currently the application makes no effort
to ensure that FILENAME contains text; be careful that you do not attempt to add a SiLK data file
as an annotation.
--plugin=PLUGIN
Augment the partitioning switches by using run-time loading of the plug-in (shared object) whose path
is PLUGIN. The switch may be repeated to load multiple plug-ins. The creation of plug-ins is described
in the silk-plugin(3) manual page. When multiple partitioning switches are given, the code specified
by the --plugin switch(es) is last to be invoked. When PLUGIN does not contain a slash (/), rwfilter
attempts to find a file named PLUGIN in the directories listed in the FILES section. If rwfilter finds
the file, it uses that path. If PLUGIN contains a slash or if rwfilter does not find the file, rwfilter
relies on your operating system’s dlopen(3) call to find the file. When the SILK_PLUGIN_DEBUG
environment variable is non-empty, rwfilter prints status messages to the standard error as it attempts
to find and open each of its plug-ins.
--print-filenames
Print the names of input files as they are read. This can be useful feedback for a long-running rwfilter
process.
--site-config-file=FILENAME
Read the SiLK site configuration from the named file FILENAME. When this switch is not provided,
rwfilter searches for the site configuration file in the locations specified in the FILES section.
--threads=N
Invoke rwfilter with N threads reading the input files. When this switch is not provided, the value in
the SILK_RWFILTER_THREADS environment variable is used. If that variable is not set, rwfilter
runs with a single thread. Using multiple threads, performance of rwfilter is greatly improved for
queries that look at many files but return few records. Preliminary testing has found that performance
peaks around four threads per CPU, but performance varies depending on the type of query and the
number of records returned.
--version
Print the version number and information about how SiLK was configured, then exit the application.
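The precedence described for the --threads switch earlier in this section (the switch, then the SILK_RWFILTER_THREADS environment variable, then a single thread) can be written as a short Python sketch. The function name is hypothetical, and the sketch does not model how rwfilter treats invalid values.

```python
import os

def thread_count(switch_value=None, environ=os.environ):
    """Return the number of reader threads under the documented precedence."""
    if switch_value is not None:
        return int(switch_value)          # --threads=N takes priority
    env_value = environ.get("SILK_RWFILTER_THREADS")
    if env_value:
        return int(env_value)             # fall back to the environment
    return 1                              # default: a single thread

print(thread_count(4, {"SILK_RWFILTER_THREADS": "8"}))
print(thread_count(None, {"SILK_RWFILTER_THREADS": "8"}))
print(thread_count(None, {}))
```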
EXAMPLES
In the following examples, the dollar sign ($) represents the shell prompt. The text after the dollar sign
represents the command line. Lines have been wrapped for improved readability, and the back slash (\) is
used to indicate a wrapped line.
The most basic filtering involves looking at specific traffic over a specific time. For example:
$ rwfilter --start-date=2003/02/19:00 --end-date=2003/02/19:23 \
      --proto=6 --pass-destination=tcp-in.rw
creates a file, tcp-in.rw containing all incoming TCP traffic on February 19, 2003. The --start-date and
--end-date switches select which files to examine. The --proto switch partitions the flow records into a
pass stream (records whose protocol is 6, that is, TCP) and a fail stream (all other records). The --pass-destination switch (often shortened to --pass) tells rwfilter to write the records that pass the --proto
test to the file tcp-in.rw.
The tcp-in.rw file contains SiLK Flow data in a binary format. To examine the contents, use the command
rwcut(1). This query only selects incoming traffic because the silk.conf(5) configuration file at most sites
tells rwfilter to look at incoming traffic unless an explicit --type switch is given.
The following query gets all TCP traffic (for the default class) for February 19, 2003.
$ rwfilter --type=all --start-date=2003/02/19 \
      --proto=6 --pass-destination=alltcp.rw
Note the addition of --type=all. This query also relies on the default behavior of --start-date to consider
a full day’s worth of data when no hour is specified.
The above query gets all traffic for the default class. If your silk.conf file has a single class, that query
captures all of it. For silk.conf files that specify multiple classes, the following gets all TCP traffic for
February 19, 2003:
$ rwfilter --flowtypes=all/all --start-date=2003/02/19 \
      --proto=6 --pass-destination=alltcp.rw
To get all non-TCP traffic, there are two approaches. rwfilter does not supply a way to choose a negated
set of protocols, but you can choose all protocols other than TCP:
$ rwfilter --start-date=2003/02/19:00 --end-date=2003/02/19:23 \
      --proto=0-5,7-255 --pass-destination=non-tcp.rw
The other approach is to use the --fail-destination switch (often shortened to --fail), which receives the
records that failed one or more of the partitioning tests:
$ rwfilter --start-date=2003/02/19:00 --end-date=2003/02/19:23 \
      --proto=6 --fail-destination=non-tcp.rw
To print information about the number of flow records that pass a filter, use --print-volume-statistics.
This can be combined with other output switches.
$ rwfilter --start-date=2003/02/19:00 --end-date=2003/02/19:23 \
      --proto=6 --print-volume-stat --pass-destination=tcp-in.rw
     |        Recs|     Packets|        Bytes| Files|
Total|      515359|     2722887|   1343819719|   180|
 Pass|      512071|     2706571|   1342851708|      |
 Fail|        3288|       16316|       968011|      |
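The --print-volume-statistics figures in this example are internally consistent: in each volume column, the Pass and Fail rows sum to the Total row. The Python sketch below parses a copy of that output and checks this. The column layout is copied from the example in this section; the function name and the assumption that the layout is stable across SiLK versions are mine.

```python
OUTPUT = """\
     |        Recs|     Packets|        Bytes| Files|
Total|      515359|     2722887|   1343819719|   180|
 Pass|      512071|     2706571|   1342851708|      |
 Fail|        3288|       16316|       968011|      |
"""

def parse_volume(text):
    """Return {row name: {column name: int or None}} from the table text."""
    rows = [[cell.strip() for cell in line.split("|")]
            for line in text.strip().splitlines()]
    columns = rows[0][1:-1]               # drop the row-label and trailing cells
    table = {}
    for cells in rows[1:]:
        values = [int(cell) if cell else None for cell in cells[1:-1]]
        table[cells[0]] = dict(zip(columns, values))
    return table

stats = parse_volume(OUTPUT)
for column in ("Recs", "Packets", "Bytes"):
    assert stats["Pass"][column] + stats["Fail"][column] == stats["Total"][column]
print(stats["Pass"])
```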
If you want to see the number of records in a file produced by rwfilter, or to remind yourself how a file was
created, use rwfileinfo(1):
$ rwfileinfo tcp-in.rw
tcp-in.rw:
  format(id)          FT_RWGENERIC(0x16)
  version             16
  byte-order          littleEndian
  compression(id)     lzo1x(2)
  header-length       208
  record-length       52
  record-version      5
  silk-version        2.4.0
  count-records       512071
  file-size           8576160
  command-lines
                   1  rwfilter --start-date=2003/02/19:00 --end-date=2003/02/19:23 \
                      --proto=6 --print-volume-stat --pass-destination=tcp-in.rw
Once a file is written, rwfilter can process the file again. Traffic on port 25 is most likely email (SMTP)
traffic. To split the email traffic from the other traffic, use:
$ rwfilter --aport=25 --pass=mail.rw --fail=not-mail.rw tcp-in.rw
This command puts traffic where the source or destination port was 25 into the file mail.rw, and all other
traffic into the file not-mail.rw. The --fail-destination switch is an effective way to reverse the sense of a test. For
example, to remove traffic on port 80 from the not-mail.rw file, run the command:
$ rwfilter --aport=80 --fail=not-mail-web.rw not-mail.rw
To verify that the not-mail-web.rw file does not contain any traffic on ports 25 or 80, you can use the
--print-statistics switch and see that 0 records pass:
$ rwfilter --aport=25,80 --print-stat not-mail-web.rw
Files     1.  Read       54641.  Pass           0. Fail       54641.
The file maintains a history of the commands that created it:
$ rwfileinfo not-mail-web.rw
not-mail-web.rw:
  format(id)          FT_RWGENERIC(0x16)
  version             16
  byte-order          littleEndian
  compression(id)     lzo1x(2)
  header-length       364
  record-length       52
  record-version      5
  silk-version        2.4.0
  count-records       54641
  file-size           762875
  command-lines
                   1  rwfilter --start-date=2003/02/19:00 --end-date=2003/02/19:23 \
                      --proto=6 --print-volume-stat --pass-destination=tcp-in.rw
                   2  rwfilter --aport=25 --pass=mail.rw --fail=not-mail.rw \
                      tcp-in.rw
                   3  rwfilter --aport=80 --fail=not-mail-web.rw not-mail.rw
The following finds all outgoing traffic from February 19, 2003, going to an external email server. Traffic
going to a server contacts that server on its well-known port, and the flow record’s destination port should
hold that well-known port:
$ rwfilter --type=out --start-date=2003/02/19 --print-volume-stat \
      --dport=25 --proto=6
To limit the result to completed connections, select flow records that contain at least three packets by using the
--packets switch with an open-ended range:
$ rwfilter --type=out --start-date=2003/02/19 --print-volume-stat \
      --dport=25 --proto=6 --packets=3-
To limit the search to a particular internal CIDR block, 10.1.2.0/24, there are three different IP-partitioning
switches you can use. The final approach uses rwsetbuild(1) to create an IPset file from textual input.
$ rwfilter --type=out --start-date=2003/02/19 --print-volume-stat \
      --dport=25 --proto=6 --packets=3- --scidr=10.1.2.0/24

$ rwfilter --type=out --start-date=2003/02/19 --print-volume-stat \
      --dport=25 --proto=6 --packets=3- --saddress=10.1.2.x

$ echo "10.1.2.0/24" | rwsetbuild > my-set.set
$ rwfilter --type=out --start-date=2003/02/19 --print-volume-stat \
      --dport=25 --proto=6 --packets=3- --sipset=my-set.set
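The three commands above select the same source addresses. The Python sketch below checks that equivalence for a few sample addresses using the standard ipaddress module. It is an illustration only: the function names are hypothetical, the wildcard matcher handles only a literal octet or x (not the other forms an IP wildcard may take), and the "set" is modeled as a list of CIDR blocks rather than a binary IPset file.

```python
import ipaddress

def in_cidr(address, cidr):
    """Model of --scidr: is the address inside the CIDR block?"""
    return ipaddress.ip_address(address) in ipaddress.ip_network(cidr)

def matches_wildcard(address, pattern):
    """Model of --saddress for patterns where 'x' matches any octet."""
    return all(p == "x" or p == a
               for a, p in zip(address.split("."), pattern.split(".")))

def in_set(address, blocks):
    """Model of --sipset: is the address in any block of the set?"""
    return any(in_cidr(address, block) for block in blocks)

for address in ("10.1.2.0", "10.1.2.255", "10.1.3.1", "192.0.2.7"):
    results = (in_cidr(address, "10.1.2.0/24"),
               matches_wildcard(address, "10.1.2.x"),
               in_set(address, ["10.1.2.0/24"]))
    assert len(set(results)) == 1        # all three approaches agree
    print(address, results[0])
```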
rwfilter does not have to output its records to a file; instead, the output from rwfilter can be piped into
another SiLK tool. You must still use the --pass-destination switch (or the --fail-destination or --all-destination switch), but by providing the argument of stdout or - to the switch you tell rwfilter to write
its output to the standard output.
For example, to get the IPs of the external email servers that the monitored network contacted, pipe the
rwfilter output into rwset(1), and tell rwset to store the destination addresses:
$ rwfilter --type=out --start-date=2003/02/19 --dport=25 \
      --proto=6 --packets=3- --scidr=10.1.2.0/24 --pass=stdout \
  | rwset --dip-file=external-mail-servers.set
rwfilter can also pipe its output as input to another rwfilter command, which allows them to be chained
together. rwfilter does not read from the standard input by default; you must explicitly give stdin or - as
the stream to read:
$ rwfilter --type=out,outweb --start-date=2003/02/19 \
      --scidr=10.1.2.0/24 --pass=stdout \
  | rwfilter --proto=17 --pass=udp.rw --fail=stdout stdin \
  | rwfilter --proto=6 --pass=stdout --fail=non-tcp-udp.rw stdin \
  | rwfilter --aport=25 --pass=mail.rw --fail=stdout stdin \
  | rwfilter --aport=80,443 --pass=web.rw \
      --fail=tcp-non-web-mail.rw stdin
This chain of commands looks at outgoing traffic on February 19, 2003, originating from the internal net-block
10.1.2.0/24, and creates the following files:
udp.rw
Outgoing UDP traffic
non-tcp-udp.rw
Outgoing traffic that is neither TCP nor UDP
mail.rw
Outgoing TCP traffic on port 25, most of which is probably email (SMTP). Since the query looks at
outgoing traffic and the --aport switch was used, this file represents email going from the internal
10.1.2.0/24 to external mail servers, and the responses from any internal mail servers that exist in the
10.1.2.0/24 net-block to external clients.
web.rw
Outgoing TCP traffic on ports 80 and 443, most of which is probably web traffic (HTTP,HTTPS). As
with the mail.rw file, this file represents queries to external web servers and responses from internal
web servers.
tcp-non-web-mail.rw
Outgoing TCP traffic other than that on ports 25, 80, and 443
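The routing performed by the chain can be followed with a small Python model that returns the file in which a single record would end up. This is a sketch of the pass/fail logic only; the function name and the dictionary representation of a record are hypothetical, and the record is assumed to have already passed the first rwfilter (outgoing, with a source address in 10.1.2.0/24).

```python
def destination(rec):
    """Return the output file for one record, following the chain in order.

    rec is a dict with 'protocol', 'sport', and 'dport' keys.
    """
    if rec["protocol"] == 17:            # second rwfilter: --proto=17 passes
        return "udp.rw"
    if rec["protocol"] != 6:             # third rwfilter: --proto=6 fails
        return "non-tcp-udp.rw"
    ports = {rec["sport"], rec["dport"]}
    if 25 in ports:                      # fourth rwfilter: --aport=25 passes
        return "mail.rw"
    if ports & {80, 443}:                # fifth rwfilter: --aport=80,443 passes
        return "web.rw"
    return "tcp-non-web-mail.rw"         # fifth rwfilter: fail stream

print(destination({"protocol": 6, "sport": 51000, "dport": 443}))
print(destination({"protocol": 1, "sport": 0, "dport": 0}))
print(destination({"protocol": 6, "sport": 25, "dport": 80}))
```

Because the stages run in order, a TCP record with port 25 on one side and port 80 on the other is written to mail.rw and never reaches the web test.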
Expert users can create even more complicated chains of rwfilter commands using named pipes.
ENVIRONMENT
SILK_RWFILTER_THREADS
The number of threads to use while reading input files or files selected from the data store.
PYTHONPATH
This environment variable is used by Python to locate modules. When --python-file or --python-expr is specified, rwfilter loads Python, which in turn loads the PySiLK module, which consists
of several files (silk/pysilk_nl.so, silk/__init__.py, etc.). If this silk/ directory is located outside Python’s
normal search path (for example, in the SiLK installation tree), it may be necessary to set or modify
the PYTHONPATH environment variable to include the parent directory of silk/ so that Python can
find the PySiLK module. For information on using Python from within rwfilter, see pysilk(3).
SILK_PYTHON_TRACEBACK
When set, Python plug-ins output traceback information on Python errors to the standard error.
SILK_COUNTRY_CODES
This environment variable allows the user to specify the country code mapping file that the --scc,
--dcc, and --any-cc switches use. The value may be a complete path or a file relative to the SILK_PATH. See the
FILES section for standard locations of this file.
SILK_ADDRESS_TYPES
This environment variable allows the user to specify the address type mapping file that the --stype
and --dtype switches use. The value may be a complete path or a file relative to the SILK_PATH. See
the FILES section for standard locations of this file.
SILK_CLOBBER
The SiLK tools normally refuse to overwrite existing files. Setting SILK_CLOBBER to a non-empty
value removes this restriction.
SILK_CONFIG_FILE
This environment variable is used as the value for the --site-config-file when that switch is not
provided.
SILK_DATA_ROOTDIR
This environment variable specifies the root directory of the data repository. This value overrides the
compiled-in value, and rwfilter uses it unless the --data-rootdir switch is specified. In addition,
rwfilter may use this value when searching for the SiLK site configuration files. See the FILES section
for details.
SILK_PATH
This environment variable gives the root of the install tree. When searching for configuration files and
plug-ins, rwfilter may use this environment variable. See the FILES section for details.
TZ
When a SiLK installation is built to use the local timezone (to determine if this is the case, check the
Timezone support value in the output from rwfilter --version), the value of the TZ environment
variable determines the timezone in which rwfilter parses timestamps. If the TZ environment variable
is not set, the default timezone is used. Setting TZ to 0 or the empty string causes timestamps to be
parsed as UTC. The value of the TZ environment variable is ignored when the SiLK installation uses
utc. For system information on the TZ variable, see tzset(3).
SILK_PLUGIN_DEBUG
When set to 1, rwfilter prints status messages to the standard error as it attempts to find and open
each of its plug-ins.
SILK_LOGSTATS
When set to a non-empty value, rwfilter treats the value as the path to an external program to
execute with information about this rwfilter invocation. If the value in SILK_LOGSTATS does not
contain a slash or if it references a file that does not exist, is not a regular file, or is not executable,
the SILK_LOGSTATS value is silently ignored. The arguments to the external program are:
• The application name, i.e., rwfilter. Note that rwfilter is always used as this argument,
regardless of the name of the executable.
• The version number of this command line, currently v0001.
• The start time of this invocation, as seconds since the UNIX epoch.
• The end time of this invocation, as seconds since the UNIX epoch.
• The number of data files opened for reading.
• The number of records read.
• The number of records written.
• A variable number of arguments that are the complete command line used to invoke rwfilter,
including the name of the executable.
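As an illustration of the argument order listed above, the following Python sketch shows how an external program named in SILK_LOGSTATS might unpack its arguments. The function name and the dictionary keys are hypothetical; only the order of the arguments is taken from the list above, and the times are read as floating-point values on the assumption that they may or may not be whole seconds.

```python
def parse_logstats_args(args):
    """Unpack the arguments rwfilter passes to the SILK_LOGSTATS program.

    `args` excludes the external program's own name (i.e., sys.argv[1:]).
    The key names are illustrative only; SiLK does not define them.
    """
    if len(args) < 7:
        raise ValueError("expected at least 7 arguments, got %d" % len(args))
    return {
        "application": args[0],          # always "rwfilter"
        "version": args[1],              # version of this argument list, e.g. "v0001"
        "start_time": float(args[2]),    # seconds since the UNIX epoch
        "end_time": float(args[3]),      # seconds since the UNIX epoch
        "files_read": int(args[4]),      # data files opened for reading
        "records_read": int(args[5]),
        "records_written": int(args[6]),
        "command_line": list(args[7:]),  # includes the name of the executable
    }
```

A program using this helper would typically call parse_logstats_args(sys.argv[1:]) and append the result to a log of its own choosing.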
SILK_LOGSTATS_RWFILTER
If set, this environment variable overrides the value specified in SILK_LOGSTATS.
SILK_LOGSTATS_DEBUG
If the environment variable is set to a non-empty value, rwfilter prints messages to the standard error
about the SILK_LOGSTATS value being used and either the reason why the value cannot be used or
the arguments to the external program being executed.
FILES
${SILK_ADDRESS_TYPES}
${SILK_PATH}/share/silk/address_types.pmap
${SILK_PATH}/share/address_types.pmap
/usr/local/share/silk/address_types.pmap
/usr/local/share/address_types.pmap
Possible locations for the address types mapping file required by the --stype and --dtype switches.
${SILK_CONFIG_FILE}
ROOT_DIRECTORY/silk.conf
${SILK_PATH}/share/silk/silk.conf
${SILK_PATH}/share/silk.conf
/usr/local/share/silk/silk.conf
/usr/local/share/silk.conf
Possible locations for the SiLK site configuration file which are checked when the --site-config-file
switch is not provided, where ROOT_DIRECTORY/ is the directory rwfilter is using as the root of
the data repository.
${SILK_COUNTRY_CODES}
${SILK_PATH}/share/silk/country_codes.pmap
${SILK_PATH}/share/country_codes.pmap
/usr/local/share/silk/country_codes.pmap
/usr/local/share/country_codes.pmap
Possible locations for the country code mapping file required by the --scc and --dcc switches.
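The list of possible locations can be sketched in Python as follows. This is a minimal illustration, not rwfilter's actual lookup code: it assumes the locations are considered in the order listed, it does not model how a SILK_COUNTRY_CODES value relative to SILK_PATH is resolved, and the function names are hypothetical.

```python
import os


def country_code_candidates(environ=None):
    """List the country code map locations named in the FILES section.

    Assumption: the order mirrors the order in which the locations are
    listed; the guide does not promise that the tools use this order.
    """
    env = os.environ if environ is None else environ
    candidates = []
    explicit = env.get("SILK_COUNTRY_CODES")
    if explicit:
        # A value relative to SILK_PATH is passed through unchanged here.
        candidates.append(explicit)
    silk_path = env.get("SILK_PATH")
    if silk_path:
        candidates.append(os.path.join(silk_path, "share", "silk", "country_codes.pmap"))
        candidates.append(os.path.join(silk_path, "share", "country_codes.pmap"))
    candidates.append("/usr/local/share/silk/country_codes.pmap")
    candidates.append("/usr/local/share/country_codes.pmap")
    return candidates


def first_existing(paths):
    """Return the first path that names an existing regular file, or None."""
    for path in paths:
        if os.path.isfile(path):
            return path
    return None
```

Calling first_existing(country_code_candidates()) gives a quick way to see which file, if any, is present on a given system.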
${SILK_DATA_ROOTDIR}/
/data/
Locations for the root directory of the data repository when the --data-rootdir switch is not specified.
${SILK_PATH}/lib64/silk/
${SILK_PATH}/lib64/
${SILK_PATH}/lib/silk/
${SILK_PATH}/lib/
/usr/local/lib64/silk/
/usr/local/lib64/
/usr/local/lib/silk/
/usr/local/lib/
Directories that rwfilter checks when attempting to load a plug-in.
NOTES
rwfilter is the most commonly used application in the suite. It provides access to the data files and performs
all the basic queries.
rwfilter supports a variety of I/O options - in addition to reading from the data store, rwfilter results can
be chained together with named pipes to output results to multiple files simultaneously. An introduction to
named pipes is outside the scope of this document, however.
Two often underused options are --dry-run and --print-statistics. --dry-run performs a sanity check
on the arguments and can be used, especially for complicated arguments, to check that the arguments
are acceptable. --print-statistics used without --pass-destination or --fail-destination simply prints
aggregate statistics to the standard error on a single line, and it can be used to do a quick pass through the
data to get aggregate counts before digging deeper into the phenomenon being investigated.
--print-filename can be used as a progress meter; during long jobs, it shows which file is currently being
read by rwfilter. --print-filename does not provide meaningful feedback with piped input.
Filters are applied in the order given on the command line. It is best to apply the biggest filters first.
The rwfilter command line is written into the header of the output file(s). You may use the rwfileinfo(1)
command to see this information.
SEE ALSO
rwcut(1), rwfglob(1), rwfileinfo(1), rwset(1), rwtuc(1), rwsetbuild(1), rwsiteinfo(1), addrtype(3), ccfilter(3), flowrate(3), ipafilter(3), pmapfilter(3), pysilk(3), silkpython(3), silkplugin(3), silk.conf(5), sensor.conf(5), silk(7), rwflowpack(8), yaf(1), applabel(1), zlib(3),
dlopen(3), Analysts’ Handbook: Using SiLK for Network Traffic Analysis
rwgeoip2ccmap(1)
rwgeoip2ccmap
Create a country code prefix map from a GeoIP data file
SYNOPSIS
unzip -p GeoIPCountryCSV.zip | \
rwgeoip2ccmap --csv-input > country_codes.pmap
gzip -d -c GeoIPv6.csv.gz | \
rwgeoip2ccmap --v6-csv-input > country_codes.pmap
(gzip -d -c GeoIPv6.csv.gz ; unzip -p GeoIPCountryCSV.zip ) | \
rwgeoip2ccmap --v6-csv-input > country_codes.pmap
rwgeoip2ccmap --help
rwgeoip2ccmap --man
rwgeoip2ccmap --version
DESCRIPTION
Prefix maps provide a way to map field values to string labels based on a user-defined map file. The country
code prefix map, typically named country_codes.pmap, is a special prefix map that maps an IP address to
a two-letter country code. It uses the country codes defined by the Internet Assigned Numbers Authority
(http://www.iana.org/root-whois/index.html).
The country code prefix map file is used by ccfilter(3) to map IP addresses to country codes in various
SiLK tools. The ccfilter feature allows you to
• partition by country codes in rwfilter(1)
• display the country codes in rwcut(1)
• sort by the country codes in rwsort(1)
• bin by the country codes in rwstats(1), rwuniq(1), and rwgroup(1).
The rwpmaplookup(1) command can use the country code mapping file to display the country code for
textual IP addresses.
The country code prefix map is based on the GeoIP Country(R) or free GeoLite database created by MaxMind(R) and available from http://www.maxmind.com/. The GeoLite database is a free evaluation copy
that is 98% accurate and is updated monthly. MaxMind sells the GeoIP Country database, which has
over 99% accuracy and is updated weekly.
The database is available in multiple formats:
GeoIPCountryCSV.zip
a compressed (zip(1)) textual file containing an IPv4 range, country name, and country code in a
comma separated value (CSV) format. If you download this format, specify --csv-input on the
rwgeoip2ccmap command line. This is the recommended format for IPv4 support.
GeoIP.dat.gz
a compressed (gzip(1)) binary file containing an encoded form of the IPv4 address range and country
code. If you download this format, specify --encoded-input on the rwgeoip2ccmap command line.
This format is not recommended, as rwgeoip2ccmap may not know about all the country codes that
the binary file contains.
GeoIPv6.csv.gz
a compressed (gzip) textual file containing an IPv6 range, country name, and country code in a CSV
format. If you download this format, specify --v6-csv-input on the rwgeoip2ccmap command line.
This file only contains IPv6 data. If you use this file to create your country code prefix map, any IPv4
addresses will have the unknown value --.
GeoIPv6.dat.gz
a compressed (gzip) binary file containing an encoded form of the IPv6 address range and country
code. rwgeoip2ccmap does not support this input file.
OPTIONS
Option names may be abbreviated if the abbreviation is unique or is an exact match for an option. A
parameter to an option may be specified as --arg=param or --arg param, though the first form is required
for options that take optional parameters.
One of the following switches is required:
--csv-input
Treat the standard input as a textual stream containing the CSV (comma separated value) GeoIP
country code data for IPv4.
--encoded-input
Treat the standard input as a binary stream containing the encoded GeoIP country code data for IPv4.
--v6-csv-input
Treat the standard input as a textual stream containing the CSV GeoIP country code data for IPv6.
The following switches display information about rwgeoip2ccmap:
--help
Print the available options and exit.
--version
Print the version number and exit the application.
--man
Print the formatted manual page to the $PAGER or to the standard output, and exit.
EXAMPLES
The following examples show how to create the country code prefix map file, country_codes.pmap, from
various forms of input. Once you have created the country_codes.pmap file, you should copy it to
/usr/local/share/silk/country_codes.pmap so that the ccfilter(3) plug-in can find it. Alternatively, you
can set the SILK_COUNTRY_CODES environment variable to the location of the country_codes.pmap file.
In these examples, the dollar sign ($) represents the shell prompt. Some input lines are split over multiple
lines in order to improve readability, and a backslash (\) is used to indicate such lines.
IPv4 Comma Separated Values File
Download the CSV version of the MaxMind GeoIP Country database for IPv4, GeoIPCountryCSV.zip. To
expand this file, use the unzip(1) utility; by using the -p option to unzip, you can pass the output of
unzip directly to rwgeoip2ccmap:
$ unzip -p GeoIPCountryCSV.zip | \
rwgeoip2ccmap --csv-input > country_codes.pmap
IPv4 Binary Encoded File
Obtain the binary version of the MaxMind GeoIP Country database for IPv4, GeoIP.dat.gz. Use the -d
switch of the gzip(1) tool to uncompress the file, and the -c switch causes gzip to write the result to the
standard output. To create the country_codes.pmap data file, run:
$ gzip -d -c GeoIP.dat.gz | \
rwgeoip2ccmap --encoded-input > country_codes.pmap
IPv6 Comma Separated Values File
If you download the IPv6 version of the MaxMind GeoIP Country database, use the following command to
create the country_codes.pmap file:
$ gzip -d -c GeoIPv6.csv.gz | \
rwgeoip2ccmap --v6-csv-input > country_codes.pmap
Since the GeoIPv6.csv.gz file only contains IPv6 addresses, the resulting country_codes.pmap file will display
the unknown value (--) for any IPv4 address. See the next example for a solution.
IPv6 and IPv4 Comma Separated Values Files
To create a country_codes.pmap mapping file that supports both IPv4 and IPv6 addresses, first download
both of the CSV files (GeoIPv6.csv.gz and GeoIPCountryCSV.zip) from MaxMind.
You need to uncompress both files and feed the result as a single stream to the standard input of rwgeoip2ccmap. This can be done in a few commands:
$ gzip -d GeoIPv6.csv.gz
$ unzip GeoIPCountryCSV.zip
$ cat GeoIPv6.csv GeoIPCountryWhois.csv | \
rwgeoip2ccmap --v6-csv-input > country_codes.pmap
Alternatively, if your shell supports it, you may be able to use a subshell to avoid having to store the
uncompressed data:
$ ( gzip -d -c GeoIPv6.csv.gz ; unzip -p GeoIPCountryCSV.zip ) | \
rwgeoip2ccmap --v6-csv-input > country_codes.pmap
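Where a subshell is not convenient, the same single stream can be assembled in a script. The following Python sketch, with a hypothetical function name, decompresses both archives in memory and concatenates the IPv6 data followed by the IPv4 data, which is the order used in the commands above. It holds all data in memory, so it is only an illustration for files of modest size.

```python
import gzip
import zipfile


def combined_csv_stream(v6_gz_path, v4_zip_path):
    """Return the IPv6 CSV data followed by the IPv4 CSV data as bytes.

    This mirrors `( gzip -d -c ... ; unzip -p ... )`: each archive is
    decompressed and the results are concatenated into one stream.
    """
    with gzip.open(v6_gz_path, "rb") as v6_file:
        data = v6_file.read()
    with zipfile.ZipFile(v4_zip_path) as archive:
        # Like `unzip -p`, emit every member of the archive in order.
        for name in archive.namelist():
            data += archive.read(name)
    return data
```

The resulting bytes could then be supplied on the standard input of rwgeoip2ccmap --v6-csv-input, for example through Python's subprocess module.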
SEE ALSO
ccfilter(3), rwpmaplookup(1), rwfilter(1), rwcut(1), rwsort(1), rwstats(1), rwuniq(1), rwgroup(1), rwpmapbuild(1), silk(7), gzip(1), zip(1), unzip(1)
rwgroup(1)
rwgroup
Tag similar SiLK records with a common next hop IP value
SYNOPSIS
rwgroup
{--id-fields=KEY | --delta-field=FIELD --delta-value=DELTA}
[--objective] [--summarize] [--rec-threshold=THRESHOLD]
[--group-offset=IP]
[--note-add=TEXT] [--note-file-add=FILE] [--output-path=PATH]
[--copy-input=PATH] [--compression-method=COMP_METHOD]
[--site-config-file=FILENAME]
[--plugin=PLUGIN [--plugin=PLUGIN ...]]
[--python-file=PATH [--python-file=PATH ...]]
[--pmap-file=MAPNAME:PATH [--pmap-file=MAPNAME:PATH ...]]
[FILE]
rwgroup [--pmap-file=MAPNAME:PATH [--pmap-file=MAPNAME:PATH ...]]
[--plugin=PLUGIN ...] [--python-file=PATH ...] --help
rwgroup [--pmap-file=MAPNAME:PATH [--pmap-file=MAPNAME:PATH ...]]
[--plugin=PLUGIN ...] [--python-file=PATH ...] --help-fields
rwgroup --version
DESCRIPTION
rwgroup reads sorted SiLK Flow records (cf. rwsort(1)) from the standard input or from a single file
name listed on the command line, marks records that form a group with an identifier in the Next Hop IP
field, and prints the binary SiLK Flow records to the standard output. In some ways rwgroup is similar to
rwuniq(1), but rwgroup writes SiLK flow records instead of textual output.
Two SiLK records are defined as being in the same group when the fields specified in the --id-fields switch
match exactly and when the field listed in the --delta-field switch matches within the value given by the --delta-value switch. Either --id-fields or --delta-field is required; both may be specified. A --delta-value must
be given when --delta-field is present.
The first group of records gets the identifier 0, and rwgroup writes that value into each record’s Next Hop
IP field. The ID for each subsequent group is incremented by 1. The --group-offset switch may be used to
set the identifier of the initial group.
The --rec-threshold switch may be used to only write groups that contain a certain number of records.
The --summarize switch attempts to merge records in the same group to a single output record.
rwgroup requires that the records are sorted on the fields listed in the --id-fields and --delta-field
switches. For example, a call using
rwgroup --id-field=2 --delta-field=9 --delta-value=3
should read the output of
rwsort --field=2,9
otherwise the results are unpredictable.
OPTIONS
Option names may be abbreviated if the abbreviation is unique or is an exact match for an option. A
parameter to an option may be specified as --arg=param or --arg param, though the first form is required
for options that take optional parameters.
At least one value for --id-fields or --delta-field must be provided; rwgroup terminates with an error if no
fields are specified.
--id-fields=KEY
KEY contains the list of flow attributes (a.k.a. fields or columns) that must match exactly for flows
to be considered part of the same group. Each field may be specified once only. KEY is a comma
separated list of field-names, field-integers, and ranges of field-integers; a range is specified by separating
the start and end of the range with a hyphen (-). Field-names are case insensitive. Example:
--id-fields=stime,10,1-5
There is no default value for the --id-fields switch.
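To make the KEY syntax concrete, the following Python sketch expands a KEY into a list of field numbers. It is a simplified model rather than SiLK's parser: the table covers only a subset of the built-in field names with the numbers given in this section, and the function name is hypothetical.

```python
# Subset of the built-in fields and their numbers, as listed in this section.
FIELD_NUMBERS = {
    "sip": 1, "dip": 2, "sport": 3, "dport": 4, "protocol": 5,
    "packets": 6, "pkts": 6, "bytes": 7, "flags": 8, "stime": 9,
    "duration": 10, "etime": 11, "sensor": 12,
}


def parse_key(key):
    """Expand a KEY such as "stime,10,1-5" into a list of field numbers."""
    fields = []
    for item in key.split(","):
        item = item.strip()
        if item.lower() in FIELD_NUMBERS:      # field-names are case insensitive
            numbers = [FIELD_NUMBERS[item.lower()]]
        elif "-" in item:                      # range of field-integers
            start, end = item.split("-", 1)
            numbers = list(range(int(start), int(end) + 1))
        else:                                  # single field-integer
            numbers = [int(item)]
        for number in numbers:
            if number in fields:               # each field may be given once only
                raise ValueError("field %d specified more than once" % number)
            fields.append(number)
    return fields
```

With this model, the example value stime,10,1-5 expands to fields 9, 10, 1, 2, 3, 4, and 5.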
The complete list of built-in fields that the SiLK tool suite supports follows, though note that not all
fields are present in all SiLK file formats; when a field is not present, its value is 0.
sIP,1
source IP address
dIP,2
destination IP address
sPort,3
source port for TCP and UDP, or equivalent
dPort,4
destination port for TCP and UDP, or equivalent
protocol,5
IP protocol
packets,pkts,6
packet count
bytes,7
byte count
flags,8
bit-wise OR of TCP flags over all packets
sTime,9
starting time of flow (seconds resolution)
duration,10
duration of flow (seconds resolution)
eTime,11
end time of flow (seconds resolution)
sensor,12
name or ID of sensor at the collection point
class,20
class of sensor at the collection point
type,21
type of sensor at the collection point
iType
the ICMP type value for ICMP or ICMPv6 flows and zero for non-ICMP flows. Internally, SiLK
stores the ICMP type and code in the dPort field, so there is no need to have both dPort and iType
or iCode in the sort key. This field was introduced in SiLK 3.8.1.
iCode
the ICMP code value for ICMP or ICMPv6 flows and zero for non-ICMP flows. See note at iType.
icmpTypeCode,25
equivalent to iType,iCode in --id-fields. This field may not be mixed with iType or iCode, and
this field is deprecated as of SiLK 3.8.1. As of SiLK 3.8.1, icmpTypeCode may no longer be used
as the argument to --delta-field; the dPort field will provide an equivalent result as long as the
input is limited to ICMP flow records.
Many SiLK file formats do not store the following fields and their values will always be 0; they are
listed here for completeness:
in,13
router SNMP input interface or vlanId if packing tools were configured to capture it (see sensor.conf(5))
out,14
router SNMP output interface or postVlanId
SiLK can store flows generated by enhanced collection software that provides more information than
NetFlow v5. These flows may support some or all of these additional fields; for flows without this
additional information, the field’s value is always 0.
initialFlags,26
TCP flags on first packet in the flow
sessionFlags,27
bit-wise OR of TCP flags over all packets except the first in the flow
attributes,28
flow attributes set by the flow generator:
S
all the packets in this flow record are exactly the same size
F
flow generator saw additional packets in this flow following a packet with a FIN flag (excluding
ACK packets)
T
flow generator prematurely created a record for a long-running connection due to a timeout.
(When the flow generator yaf(1) is run with the --silk switch, it will prematurely create a
flow and mark it with T if the byte count of the flow cannot be stored in a 32-bit value.)
C
flow generator created this flow as a continuation of long-running connection, where the
previous flow for this connection met a timeout (or a byte threshold in the case of yaf ).
Consider a long-running ssh session that exceeds the flow generator’s active timeout. (This is the
active timeout since the flow generator creates a flow for a connection that still has activity). The
flow generator will create multiple flow records for this ssh session, each spanning some portion of
the total session. The first flow record will be marked with a T indicating that it hit the timeout.
The second through next-to-last records will be marked with TC indicating that this flow both
timed out and is a continuation of a flow that timed out. The final flow will be marked with a C,
indicating that it was created as a continuation of an active flow.
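The marking pattern in the ssh example can be written out as a small Python sketch. The function name is hypothetical, and the code models only the T and C marks described above for a single connection that the flow generator splits into several records.

```python
def timeout_marks(record_count):
    """Return the T/C attribute marks for each record of one long session."""
    marks = []
    for index in range(record_count):
        mark = ""
        if index < record_count - 1:
            mark += "T"   # this record was closed because of the timeout
        if index > 0:
            mark += "C"   # this record continues an earlier record
        marks.append(mark)
    return marks
```

For a session split into four records, this yields T for the first record, TC for the second and third, and C for the last.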
application,29
guess as to the content of the flow. Some software that generates flow records from packet data,
such as yaf, will inspect the contents of the packets that make up a flow and use traffic signatures
to label the content of the flow. SiLK calls this label the application; yaf refers to it as the
appLabel. The application is the port number that is traditionally used for that type of traffic
(see the /etc/services file on most UNIX systems). For example, traffic that the flow generator
recognizes as FTP will have a value of 21, even if that traffic is being routed through the standard
HTTP/web port (80).
The following fields provide a way to label the IPs or ports on a record. These fields require external
files to provide the mapping from the IP or port to the label:
sType,16
categorize the source IP address as non-routable, internal, or external and group based on the
category. Uses the mapping file specified by the SILK_ADDRESS_TYPES environment variable,
or the address_types.pmap mapping file, as described in addrtype(3).
dType,17
as sType for the destination IP address
scc,18
the country code of the source IP address. Uses the mapping file specified by the
SILK_COUNTRY_CODES environment variable, or the country_codes.pmap mapping file, as described in ccfilter(3).
dcc,19
as scc for the destination IP
src-MAPNAME
value determined by passing the source IP or the protocol/source-port to the user-defined mapping
defined in the prefix map associated with MAPNAME. See the description of the --pmap-file
switch below and the pmapfilter(3) manual page.
dst-MAPNAME
as src-MAPNAME for the destination IP or protocol/destination-port.
sval
dval
These are deprecated field names created by pmapfilter that correspond to src-MAPNAME
and dst-MAPNAME , respectively. These fields are available when a prefix map is used that is
not associated with a MAPNAME.
Finally, the list of built-in fields may be augmented by the run-time loading of PySiLK code or plug-ins
written in C (also called shared object files or dynamic libraries), as described by the --python-file
and --plugin switches.
--delta-field=FIELD
Specify a single field that can differ by a specified delta-value among the SiLK records that make up
a group. The FIELD identifiers include most of those specified for --id-fields. The exceptions are
that plug-in fields are not supported, nor are fields that do not have numeric values (e.g., class, type,
flags). The most common value for this switch is stime, which allows records that are identical in
the id-fields but temporally far apart to be in different groups. The switch takes a single argument;
multiple delta fields cannot be specified. When this switch is specified, the --delta-value switch is
required.
--delta-value=DELTA_VALUE
Specify the acceptable difference between the values of the --delta-field. The --delta-value switch
is required when the --delta-field switch is provided. For fields other than those holding IPs, when
two consecutive records have values that differ by no more than DELTA_VALUE, the records are considered
members of the same group. When the delta-field refers to an IP field, DELTA_VALUE is the number
of least significant bits of the IPs to remove before comparing them. For example, when --delta-field=sIP --delta-value=8 is specified, two records are in the same group if their source IPv4 addresses
belong to the same /24 or if their source IPv6 addresses belong to the same /120. The --objective
switch affects the meaning of this switch.
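The IP case can be illustrated with Python's standard ipaddress module. This is a sketch of the comparison described above, not rwgroup's code; the function name is hypothetical, and records that mix IPv4 and IPv6 addresses are simply treated as not matching.

```python
import ipaddress


def same_ip_group(ip_a, ip_b, delta_value):
    """Compare two IPs after dropping the `delta_value` least significant bits."""
    a = ipaddress.ip_address(ip_a)
    b = ipaddress.ip_address(ip_b)
    if a.version != b.version:
        return False  # mixed IPv4/IPv6 handling is not modeled in this sketch
    return (int(a) >> delta_value) == (int(b) >> delta_value)
```

With a delta value of 8, 10.1.2.3 and 10.1.2.250 compare as equal because both fall in 10.1.2.0/24, while 10.1.2.3 and 10.1.3.3 do not.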
--objective
Change the behavior of the --delta-value switch so that a record is considered part of a group if the
value of its --delta-field is within the DELTA_VALUE of the first record in the group. (When this
switch is not specified, consecutive records are compared.)
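The difference between the two comparison modes is easiest to see in a small model. The Python sketch below assigns group identifiers to a list of already sorted records represented as dictionaries; the function and key names are hypothetical, and the code handles only numeric delta fields, so it is a simplified illustration rather than rwgroup's implementation.

```python
def assign_groups(records, id_fields, delta_field, delta_value, objective=False):
    """Return a list of group identifiers, one per record.

    `records` is a list of dicts already sorted on `id_fields` followed
    by `delta_field`, as rwgroup requires of its input.
    """
    group_ids = []
    current = -1
    reference = None   # the record that the next record is compared against
    for record in records:
        same_group = (
            reference is not None
            and all(record[f] == reference[f] for f in id_fields)
            and abs(record[delta_field] - reference[delta_field]) <= delta_value
        )
        if not same_group:
            current += 1
            reference = record      # first record of a new group
        elif not objective:
            reference = record      # default: compare consecutive records
        group_ids.append(current)
    return group_ids
```

For start times 0, 8, 16, and 30 with a delta value of 10, the default mode places the first three records in one group because each is within 10 of its predecessor, whereas the objective mode starts a new group at 16 because that record is more than 10 from the first record of its group.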
--summarize
Cause rwgroup to print (typically) a single record for each group. By default, all records in each
group having at least --rec-threshold members are printed. When --summarize is active, the record
that is written for the group is the first record in the group with the following modifications:
• The packets and bytes values are the sum of the packets and bytes values, respectively, for all
records in the group.
• The start-time value is the earliest start time for the records in the group.
• The end-time value is the latest end time for the records in the group.
• The flags and session-flags values are the bitwise-OR of all flags and session-flags values, respectively, for the records in the group.
Note that multiple records for a group may be printed if the bytes, packets, or elapsed time values are
too large to be stored in a SiLK flow record.
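The modifications listed above can be sketched in Python as follows. Records are modeled as dictionaries with hypothetical key names, and the sketch does not model the case where a group must be split because a value is too large for a SiLK flow record.

```python
def summarize_group(records):
    """Merge the records of one group into a single summary record."""
    summary = dict(records[0])   # start from the first record in the group
    summary["packets"] = sum(r["packets"] for r in records)
    summary["bytes"] = sum(r["bytes"] for r in records)
    summary["stime"] = min(r["stime"] for r in records)   # earliest start
    summary["etime"] = max(r["etime"] for r in records)   # latest end
    flags = 0
    session_flags = 0
    for r in records:
        flags |= r["flags"]                   # bitwise OR of all flags
        session_flags |= r["session_flags"]   # bitwise OR of all session flags
    summary["flags"] = flags
    summary["session_flags"] = session_flags
    return summary
```

All other fields in the summary keep the values of the first record in the group.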
--plugin=PLUGIN
Augment the list of fields by using run-time loading of the plug-in (shared object) whose path is
PLUGIN. The switch may be repeated to load multiple plug-ins. The creation of plug-ins is described
in the silk-plugin(3) manual page. When PLUGIN does not contain a slash (/), rwgroup will
attempt to find a file named PLUGIN in the directories listed in the FILES section. If rwgroup finds
the file, it uses that path. If PLUGIN contains a slash or if rwgroup does not find the file, rwgroup
relies on your operating system’s dlopen(3) call to find the file. When the SILK_PLUGIN_DEBUG
environment variable is non-empty, rwgroup prints status messages to the standard error as it attempts
to find and open each of its plug-ins.
--rec-threshold=THRESHOLD
Specify the minimum number of SiLK records a group must contain before the records in the group
are written to the output stream. The default is 1; i.e., write all records. The maximum threshold is
65535.
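The effect of the threshold can be modeled as a filter on group sizes. In this Python sketch the function name is hypothetical, and the input is a list holding the group identifier of each record in order.

```python
from collections import Counter


def apply_threshold(group_ids, threshold=1):
    """Return the positions of records whose group has at least `threshold` members."""
    sizes = Counter(group_ids)
    return [i for i, gid in enumerate(group_ids) if sizes[gid] >= threshold]
```

With the default threshold of 1 every record is kept; with a threshold of 2, records that form a group by themselves are dropped.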
--group-offset=IP
Specify the value to write into the Next Hop IP for the records that comprise the first group. The
value IP may be an integer, or an IPv4 or IPv6 address in the canonical presentation form. If not
specified, counting begins at 0. The value for each subsequent group is incremented by 1.
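The arithmetic behind the group identifiers can be sketched with Python's ipaddress module. The function name is hypothetical, and the sketch shows only how an offset given as an integer or as an address advances by one per group; how a given tool displays the Next Hop IP value is not modeled.

```python
import ipaddress


def group_identifier(offset, group_number):
    """Return the Next Hop IP value for the group with the given 0-based number."""
    try:
        base = ipaddress.ip_address(int(offset))   # offset given as an integer
    except ValueError:
        base = ipaddress.ip_address(offset)        # offset given as an IP address
    return base + group_number
```

For example, an offset of 10.0.0.254 gives the third group (group number 2) the value 10.0.1.0.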
--note-add=TEXT
Add the specified TEXT to the header of the output file as an annotation. This switch may be repeated
to add multiple annotations to a file. To view the annotations, use the rwfileinfo(1) tool.
--note-file-add=FILENAME
Open FILENAME and add the contents of that file to the header of the output file as an annotation.
This switch may be repeated to add multiple annotations. Currently the application makes no effort
to ensure that FILENAME contains text; be careful that you do not attempt to add a SiLK data file
as an annotation.
--copy-input=PATH
Copy all binary input to the specified file or named pipe. PATH can be stdout to print flows to the
standard output as long as the --output-path switch has been used to redirect rwgroup’s output.
--output-path=PATH
Determines where the output of rwgroup is written. If this option is not given, output is written to
the standard output.
--compression-method=COMP_METHOD
Specify how to compress the output. When this switch is not given, output to the standard output or to
named pipes is not compressed, and output to files is compressed using the default chosen when SiLK
was compiled. The valid values for COMP_METHOD are determined by which external libraries were
found when SiLK was compiled. To see the available compression methods and the default method,
use the --help or --version switch. SiLK can support the following COMP_METHOD values when
the required libraries are available.
none
Do not compress the output using an external library.
zlib
Use the zlib(3) library for compressing the output, and always compress the output regardless
of the destination. Using zlib produces the smallest output files at the cost of speed.
lzo1x
Use the lzo1x algorithm from the LZO real time compression library for compression, and always
compress the output regardless of the destination. This compression provides good compression
with less memory and CPU overhead.
best
Use lzo1x if available, otherwise use zlib. Only compress the output when writing to a file.
--site-config-file=FILENAME
Read the SiLK site configuration from the named file FILENAME. When this switch is not provided,
rwgroup searches for the site configuration file in the locations specified in the FILES section.
--help
Print the available options and exit. Specifying switches that add new fields or additional switches
before --help will allow the output to include descriptions of those fields or switches.
--help-fields
Print the description and alias(es) of each field and exit. Specifying switches that add new fields before
--help-fields will allow the output to include descriptions of those fields.
--version
Print the version number and information about how SiLK was configured, then exit the application.
--pmap-file=MAPNAME:PATH
--pmap-file=PATH
Instruct rwgroup to load the mapping file located at PATH and create the src-MAPNAME and
dst-MAPNAME fields. When MAPNAME is provided explicitly, it will be used to refer to the fields
specific to that prefix map. If MAPNAME is not provided, rwgroup will check the prefix map file
to see if a map-name was specified when the file was created. If no map-name is available, rwgroup
creates the fields sval and dval. Multiple --pmap-file switches are supported as long as each uses
a unique value for map-name. The --pmap-file switch(es) must precede the --id-fields switch. For
more information, see pmapfilter(3).
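The naming rules for the prefix map fields can be summarized in a short Python sketch. The function name and the name_in_file parameter are hypothetical; the parameter stands in for the map-name that may have been stored in the prefix map file when it was created. A PATH that itself contains a colon is not handled.

```python
def pmap_field_names(switch_value, name_in_file=None):
    """Return the (source, destination) field names for one --pmap-file value.

    `name_in_file` is the map-name stored in the prefix map file, or None
    when the file was created without one.
    """
    mapname, sep, _path = switch_value.partition(":")
    if not sep:
        mapname = name_in_file      # no MAPNAME: part given on the command line
    if mapname:
        return ("src-" + mapname, "dst-" + mapname)
    return ("sval", "dval")         # deprecated names used when no map-name exists
```

For instance, the value service:/tmp/ports.pmap (an illustrative path) yields the fields src-service and dst-service.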
--python-file=PATH
When the SiLK Python plug-in is used, rwgroup reads the Python code from the file PATH to define
additional fields that can be used as part of the group key. This file should call register_field() for
each field it wishes to define. For details and examples, see the silkpython(3) and pysilk(3) manual
pages.
LIMITATIONS
rwgroup requires sorted data. The application works by comparing records in the order that the records
are received (similar to the UNIX uniq(1) command); odd orders will produce odd groupings.
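The comparison with uniq(1) can be demonstrated with Python's itertools.groupby, which likewise only merges equal items that are adjacent. This is an analogy for why sorting matters, not a model of rwgroup itself, and the function name is hypothetical.

```python
from itertools import groupby


def count_groups(keys):
    """Count runs of equal adjacent keys, the way uniq(1) sees its input."""
    return sum(1 for _key, _run in groupby(keys))
```

The unsorted keys a, b, a, b form four runs, while the same keys in sorted order form only two, which is why unsorted input to rwgroup splits what should be a single group.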
EXAMPLES
In the following example, the dollar sign ($) represents the shell prompt. The text after the dollar sign
represents the command line. Lines have been wrapped for improved readability, and the backslash (\) is
used to indicate a wrapped line.
As a rule of thumb, the --id-fields and --delta-field parameters should match rwsort(1)’s call, with --delta-field being the last parameter. A call to group all web traffic by queries from the same addresses
(field=2) within 10 seconds (field=9) of the first query from that address will be:
$ rwfilter --proto=6 --dport=80 --pass=stdout \
| rwsort --field=2,9 \
| rwgroup --id-field=2 --delta-field=9 --delta-value=10 \
--objective
ENVIRONMENT
PYTHONPATH
This environment variable is used by Python to locate modules. When --python-file is specified,
rwgroup loads Python, which in turn loads the PySiLK module, which consists of several files
(silk/pysilk_nl.so, silk/__init__.py, etc). If this silk/ directory is located outside Python’s normal search
path (for example, in the SiLK installation tree), it may be necessary to set or modify the PYTHONPATH environment variable to include the parent directory of silk/ so that Python can find the PySiLK
module.
SILK_PYTHON_TRACEBACK
When set, Python plug-ins will output traceback information on Python errors to the standard error.
SILK_COUNTRY_CODES
This environment variable allows the user to specify the country code mapping file that rwgroup uses
when computing the scc and dcc fields. The value may be a complete path or a file relative to the
SILK_PATH. See the FILES section for standard locations of this file.
SILK_ADDRESS_TYPES
This environment variable allows the user to specify the address type mapping file that rwgroup uses
when computing the sType and dType fields. The value may be a complete path or a file relative to
the SILK_PATH. See the FILES section for standard locations of this file.
SILK_CLOBBER
The SiLK tools normally refuse to overwrite existing files. Setting SILK_CLOBBER to a non-empty
value removes this restriction.
SILK_CONFIG_FILE
This environment variable is used as the value for the --site-config-file when that switch is not
provided.
SILK_DATA_ROOTDIR
This environment variable specifies the root directory of the data repository. As described in the FILES
section, rwgroup may use this environment variable when searching for the SiLK site configuration
file.
SILK_PATH
This environment variable gives the root of the install tree. When searching for configuration files and
plug-ins, rwgroup may use this environment variable. See the FILES section for details.
SILK_PLUGIN_DEBUG
When set to 1, rwgroup prints status messages to the standard error as it attempts to find and open
each of its plug-ins. In addition, when an attempt to register a field fails, rwgroup prints a message
specifying the additional function(s) that must be defined to register the field in rwgroup. Be aware
that the output can be rather verbose.
FILES
${SILK_ADDRESS_TYPES}
${SILK_PATH}/share/silk/address_types.pmap
${SILK_PATH}/share/address_types.pmap
/usr/local/share/silk/address_types.pmap
/usr/local/share/address_types.pmap
Possible locations for the address types mapping file required by the sType and dType fields.
${SILK_CONFIG_FILE}
${SILK_DATA_ROOTDIR}/silk.conf
/data/silk.conf
${SILK_PATH}/share/silk/silk.conf
${SILK_PATH}/share/silk.conf
/usr/local/share/silk/silk.conf
/usr/local/share/silk.conf
Possible locations for the SiLK site configuration file which are checked when the --site-config-file
switch is not provided.
${SILK_COUNTRY_CODES}
${SILK_PATH}/share/silk/country_codes.pmap
${SILK_PATH}/share/country_codes.pmap
/usr/local/share/silk/country_codes.pmap
/usr/local/share/country_codes.pmap
Possible locations for the country code mapping file required by the scc and dcc fields.
${SILK_PATH}/lib64/silk/
${SILK_PATH}/lib64/
${SILK_PATH}/lib/silk/
${SILK_PATH}/lib/
/usr/local/lib64/silk/
/usr/local/lib64/
/usr/local/lib/silk/
/usr/local/lib/
Directories that rwgroup checks when attempting to load a plug-in.
SEE ALSO
rwfilter(1), rwfileinfo(1), rwsort(1), rwuniq(1), addrtype(3), ccfilter(3), pmapfilter(3), pysilk(3),
silkpython(3), silk-plugin(3), sensor.conf(5), uniq(1), silk(7), yaf(1), dlopen(3), zlib(3)
rwidsquery
Invoke rwfilter to find flows matching Snort signatures
SYNOPSIS
rwidsquery --intype=INPUT_TYPE
[--output-file=OUTPUT_FILE]
[--start-date=YYYY/MM/DD[:HH] [--end-date=YYYY/MM/DD[:HH]]]
[--year=YEAR] [--tolerance=SECONDS]
[--config-file=CONFIG_FILE]
[--mask=PREDICATE_LIST]
[--verbose] [--dry-run]
[INPUT_FILE | -]
[-- EXTRA_RWFILTER_ARGS...]
rwidsquery --help
rwidsquery --version
DESCRIPTION
rwidsquery facilitates selection of SiLK flow records that correspond to Snort IDS alerts and signatures.
rwidsquery takes as input either a snort(8) alert log or rule file, analyzes the alert or rule contents, and
invokes rwfilter(1) with the appropriate arguments to retrieve flow records that match attributes of the
input file. rwidsquery will process the Snort rules or alerts from a single file named on the command
line; if no file name is given, rwidsquery will attempt to read the Snort rules or alerts from the standard
input, unless the standard input is connected to a terminal. An input file name of - or stdin will force
rwidsquery to read from the standard input, even when the standard input is a terminal.
OPTIONS
In addition to the options listed below, you can pass extra options through to rwfilter(1) on the rwidsquery
command line. The syntax for doing so is to place a double-hyphen (--) sequence after all valid rwidsquery
options, and before all of the options you wish to pass through to rwfilter.
--intype=INPUT_TYPE
Specify the type of input contained in the input file. This switch is required. Two alert formats and
one rule format are currently supported. Valid values for this option are:
fast
Input is a Snort "fast" log file entry. Alerts are written in this format when Snort is configured
with the snort_fast output module enabled. snort_fast alerts resemble the following:
Jan  1 01:23:45 hostname snort[1976]: [1:1416:11] ...
full
Input is a Snort "full" log file entry. Alerts are written in this format when Snort is configured
with the snort_full output module enabled. snort_full alerts look like the following example:
[**] [116:151:1] (snort decoder) Bad Traffic ...
rule
Input is a Snort rule (signature). For example:
alert tcp $EXTERNAL_NET any -> $HOME_NET any ...
--output-file=OUTPUT_FILE
Specify the output file that flows will be written to. If not specified, the default is to write to stdout.
The argument to this option becomes the argument to rwfilter’s --pass-destination switch.
--start-date=YYYY/MM/DD[:HH]
--end-date=YYYY/MM/DD[:HH]
Used in conjunction with rule file input only. The date predicates indicate the times at which the search
starts and ends. See the rwfilter(1) manual page for details of the date format.
--year=YEAR
Used in conjunction with alert file input only. Timestamps in Snort alert files do not contain year
information. By default, the current calendar year is used, but this option can be used to override this
default behavior.
--tolerance=SECONDS
Used in conjunction with alert file input only. This option is provided to compensate for timing
differences between the timestamps in Snort alerts and the start/end time of the corresponding flows.
The default --tolerance value is 3600 seconds, which means that flow records +/- one hour from the
alert timestamp will be searched.
--config-file=CONFIG_FILE
Used in conjunction with rule file input only. Snort requires a configuration file which, among other
things, contains variables that can be used in Snort rule definitions. This option allows you to specify
the location of this configuration file so that IP addresses, port numbers, and other information from
the Snort configuration file can be used to find matching flows.
--mask=PREDICATE_LIST
Exclude the rwfilter predicates named in PREDICATE_LIST from the selection criteria. This option
is provided to widen the scope of queries by making them more general than the Snort rule or alert
provided. For instance, --mask=dport will return flows with any destination port, not just those which
match the input Snort alert or rule.
--verbose
Print the resulting rwfilter(1) command to the standard error prior to executing it.
--dry-run
Print the resulting rwfilter(1) command to the standard error but do not execute it.
--help
Print the available options and exit.
--version
Print the version number and information about how SiLK was configured, then exit the application.
EXAMPLES
In the following examples, the dollar sign ($) represents the shell prompt. The text after the dollar sign
represents the command line. Lines have been wrapped for improved readability, and the back slash (\) is
used to indicate a wrapped line.
To find SiLK flows matching a Snort alert in snort_fast format:
$ rwidsquery --intype fast --year 2007 --tolerance 300 alert.fast.txt
For the following Snort alert:
Nov 15 00:00:58 hostname snort[5214]: [1:1416:11]
SNMP broadcast trap [Classification: Attempted Information Leak]
[Priority: 2]: {TCP}
192.168.0.1:4161 -> 127.0.0.1:139
The resulting rwfilter(1) command would look similar to:
$ rwfilter --start-date=2007/11/14:23 --end-date=2007/11/15:00   \
    --stime=2007/11/14:23:55:58-2007/11/15:00:05:58               \
    --saddress=192.168.0.1 --sport=4161 --daddress=127.0.0.1      \
    --dport=139 --protocol=6 --pass=stdout
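The --stime window in the command above follows from the alert timestamp, the --year value, and the --tolerance value. A minimal sketch of that arithmetic is shown here; it is an illustration of the calculation, not code taken from rwidsquery, and the function name and month table are hypothetical.

```python
from datetime import datetime, timedelta

# Month abbreviations as they appear in Snort alert timestamps.
MONTHS = {name: number for number, name in enumerate(
    "Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec".split(), start=1)}


def stime_window(alert_time, year, tolerance=3600):
    """Return an rwfilter-style --stime range around an alert timestamp.

    alert_time is the "Mon DD HH:MM:SS" prefix of a Snort alert, which
    carries no year; tolerance is in seconds (3600 is the documented default).
    """
    month, day, clock = alert_time.split()
    hour, minute, second = (int(part) for part in clock.split(":"))
    center = datetime(year, MONTHS[month], int(day), hour, minute, second)
    delta = timedelta(seconds=tolerance)
    fmt = "%Y/%m/%d:%H:%M:%S"
    return f"{(center - delta).strftime(fmt)}-{(center + delta).strftime(fmt)}"


print(stime_window("Nov 15 00:00:58", 2007, 300))
# prints 2007/11/14:23:55:58-2007/11/15:00:05:58
```

Because the window can cross midnight, the hourly --start-date and --end-date values have to cover both days, as they do in the command above.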
If you want to find flows matching the same criteria, except you want UDP flows instead of TCP flows, use
the following syntax:
$ rwidsquery --intype fast --year 2007 --tolerance 300  \
    --mask protocol alert.fast.txt -- --protocol=17
which would yield the following rwfilter command line:
$ rwfilter --start-date=2007/11/14:23 --end-date=2007/11/15:00   \
    --stime=2007/11/14:23:55:58-2007/11/15:00:05:58               \
    --saddress=192.168.0.1 --sport=4161 --daddress=127.0.0.1      \
    --dport=139 --protocol=17 --pass=stdout
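The interaction between --mask and the pass-through arguments in the example above can be modeled as a simple filter over the derived predicates. This sketch is only an illustration under stated assumptions: the dictionary of predicates, the function name, and the argument order are hypothetical, and rwidsquery's real command construction may differ.

```python
def build_rwfilter_command(predicates, mask=(), extra_args=()):
    """Build an rwfilter argument list from alert-derived predicates.

    Predicates named in `mask` are dropped; `extra_args` stands for the
    arguments given after the double-hyphen and is appended unchanged.
    """
    masked = set(mask)
    switches = [f"--{name}={value}"
                for name, value in predicates.items()
                if name not in masked]
    return ["rwfilter", *switches, *extra_args, "--pass=stdout"]


# Predicates taken from the sample alert in this section.
alert = {"saddress": "192.168.0.1", "sport": "4161",
         "daddress": "127.0.0.1", "dport": "139", "protocol": "6"}
command = build_rwfilter_command(alert, mask=["protocol"],
                                 extra_args=["--protocol=17"])
print(" ".join(command))
```

Masking a predicate without supplying a replacement simply widens the query, which is the case the --mask=dport example in the OPTIONS section describes.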
To find SiLK flows matching a Snort rule:
$ rwidsquery --intype rule --start 2008/02/20:00 --end 2008/02/20:02 \
--config /opt/local/etc/snort/snort.conf --verbose rule.txt
For the following Snort rule:
alert icmp $EXTERNAL_NET any -> $HOME_NET any
(msg:"ICMP Parameter Problem Bad Length"; icode:2; itype:12;
classtype:misc-activity; sid:425; rev:6;)
The resulting rwfilter(1) command would look similar to:
$ rwfilter --start-date=2008/02/20:00 --end-date=2008/02/20:02   \
    --stime=2008/02/20:00-2008/02/20:02                           \
    --sipset=/tmp/tmpeKIPn2.set --icmp-code=2 --icmp-type=12      \
    --pass=stdout
ENVIRONMENT
SILK_CLOBBER
The SiLK tools normally refuse to overwrite existing files. Setting SILK_CLOBBER to a non-empty
value removes this restriction.
SILK_CONFIG_FILE
This environment variable is used as the location for the site configuration file, silk.conf. When this
environment variable is not set, rwfilter searches for the site configuration file in the locations specified
in the FILES section.
SILK_DATA_ROOTDIR
This environment variable specifies the root directory of data repository for rwfilter. This value
overrides the compiled-in value. In addition, rwfilter may use this value when searching for the SiLK
site configuration files. See the FILES section for details.
SILK_RWFILTER_THREADS
The number of threads rwfilter uses when reading files from the data store.
SILK_PATH
This environment variable gives the root of the install tree. When searching for the site configuration
file, rwfilter may use this environment variable. See the FILES section for details.
RWFILTER
Complete path to the rwfilter program. If not set, rwidsquery attempts to find rwfilter on your
PATH.
FILES
${SILK_CONFIG_FILE}
${SILK_DATA_ROOTDIR}/silk.conf
/data/silk.conf
${SILK_PATH}/share/silk/silk.conf
${SILK_PATH}/share/silk.conf
/usr/local/share/silk/silk.conf
/usr/local/share/silk.conf
Possible locations for the SiLK site configuration file, for report types that use rwfilter.
SEE ALSO
rwfilter(1), silk.conf(5), silk(7), snort(8)
rwip2cc
Maps IP addresses to country codes
SYNOPSIS
rwip2cc { --address=IP_ADDRESS | --input-file=FILE }
[--map-file=PMAP_FILE] [--print-ips={0,1}]
[{--integer-ips | --zero-pad-ips}] [--no-columns]
[--column-separator=CHAR] [--no-final-delimiter]
[{--delimited | --delimited=CHAR}]
[--output-path=PATH] [--pager=PAGER_PROG]
rwip2cc --help
rwip2cc --version
DESCRIPTION
As of SiLK 3.0, rwip2cc is deprecated, and it will be removed in the SiLK 4.0 release. Use rwpmaplookup(1) instead; the EXAMPLES section shows how to use rwpmaplookup to get output similar
to that produced by rwip2cc.
rwip2cc maps from (textual) IP address to two letter country code. Either the --address or --input-file
switch is required.
The --address switch looks up the country code of a single IP address and prints the country code to the
standard output.
The --input-file switch reads data from the specified file (use stdin or - to read from the standard input)
and prints, to the standard output, the country code for each IP it sees. Blank lines in the input are ignored;
comments, which begin at the # character and extend to the end of line, are also ignored. Each line that is
not a blank or a comment should contain an IP address or a CIDR block; rwip2cc will complain if the line
cannot be parsed. Note that for CIDR blocks, the CIDR block is exploded into its constituent IP addresses
and the country code for each IP address is printed.
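The input handling described above (blank lines and # comments skipped, CIDR blocks expanded into individual addresses) can be sketched with Python's standard ipaddress module. This is an illustration of the described behavior, not rwip2cc's parser; the function name is hypothetical, and accepting CIDR blocks with host bits set (strict=False) is an assumption.

```python
import ipaddress


def explode_input(lines):
    """Yield one address string per IP named in `lines`.

    Text from '#' to the end of a line is treated as a comment, blank
    lines are skipped, and CIDR blocks are expanded address by address.
    A line that cannot be parsed raises ValueError.
    """
    for raw in lines:
        text = raw.split("#", 1)[0].strip()
        if not text:
            continue
        # A bare address parses as a single-address network.
        network = ipaddress.ip_network(text, strict=False)
        for address in network:
            yield str(address)


sample = ["# addresses of interest", "", "10.0.0.0/31  # a pair", "128.2.1.1"]
print(list(explode_input(sample)))
# prints ['10.0.0.0', '10.0.0.1', '128.2.1.1']
```

Expanding a large block this way produces one output row per address, so a short input file can yield a very long listing.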
The --print-ips switch controls whether the IP is printed with its country code. When --print-ips=1 is
specified, the output contains two columns: the IP and the country-code. When --print-ips=0 is specified,
only the country code is given. The default behavior is to print the IP whenever the --input-file switch is
provided, and not print the IP when --address is given.
You can tell rwip2cc to use a specific country code prefix map file by giving the location of that file to the --map-file switch. The country code prefix map file is created with the rwgeoip2ccmap(1) command. When
--map-file is not specified, rwip2cc attempts to use the default country code mapping file, as specified in
the FILES section below.
OPTIONS
Option names may be abbreviated if the abbreviation is unique or is an exact match for an option. A
parameter to an option may be specified as --arg=param or --arg param, though the first form is required
for options that take optional parameters.
--address=IP_ADDRESS
Print to the standard output the country code for the single IP_ADDRESS.
--input-file=FILE
Print the IP and country code for each IP address in FILE; use stdin to read from the standard input.
--map-file=PMAP_FILE
Use the designated country code prefix mapping file instead of the default.
--print-ips={0|1}
Controls whether the IP is printed. When the value is 1, the output contains two columns: the IP
and the country-code. When the value is 0, only the country code is given. When this switch is
not specified, the default behavior is to print the IPs only when input comes from a file (i.e., when
--input-file is specified).
--integer-ips
Enable printing of IPs and print the IPs as integers. By default, IP addresses are printed in their
canonical form.
--zero-pad-ips
Enable printing of IPs and print the IP addresses in their canonical form, but add zeros to the IP address
so it fully fills the width of the column. For IPv4, use three digits per octet, e.g., 127.000.000.001.
--no-columns
Disable fixed-width columnar output.
--column-separator=C
Use specified character between columns and after the final column. When this switch is not specified,
the default of ’|’ is used.
--no-final-delimiter
Do not print the column separator after the final column. Normally a delimiter is printed.
--delimited
--delimited=C
Run as if --no-columns --no-final-delimiter --column-sep=C had been specified. That is, disable
fixed-width columnar output; if character C is provided, it is used as the delimiter between columns
instead of the default ’|’.
--output-path=PATH
Determines where the output of rwip2cc (ASCII text) is written. If this option is not given, output is
written to the standard output.
--pager=PAGER_PROG
When the --input-file switch is specified and output is to a terminal, invoke the program
PAGER_PROG to view the output one screen full at a time. This switch overrides the SILK_PAGER
environment variable, which in turn overrides the PAGER variable. If the value of the pager is determined to be the empty string, no paging will be performed and all output will be printed to the
terminal.
--help
Print the available options and exit.
--version
Print the version number and information about how SiLK was configured, then exit the application.
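The two alternative IPv4 renderings selected by --integer-ips and --zero-pad-ips in the options above can be reproduced with a few lines of Python. This is a sketch of the output formats only, for IPv4, using the standard ipaddress module; the function names are hypothetical.

```python
import ipaddress


def integer_ip(text):
    """Render an IPv4 address as a single integer."""
    return int(ipaddress.IPv4Address(text))


def zero_padded_ip(text):
    """Render an IPv4 address with three digits per octet."""
    octets = str(ipaddress.IPv4Address(text)).split(".")
    return ".".join(f"{int(octet):03d}" for octet in octets)


print(integer_ip("10.0.0.0"))       # prints 167772160
print(zero_padded_ip("127.0.0.1"))  # prints 127.000.000.001
```

Zero-padded addresses sort correctly as plain text, which is one reason to prefer them when the output is passed to a textual sort.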
EXAMPLES
The following examples demonstrate the use of rwip2cc. In addition, each example shows how to get similar
output using rwpmaplookup(1).
In the following examples, the dollar sign ($) represents the shell prompt. The text after the dollar sign
represents the command line. Lines have been wrapped for improved readability, and the back slash (\) is
used to indicate a wrapped line.
Single address specified on the command line
Print the country code for a single address using the default country code map. By default, only the value
is printed when the address is specified on the command line.
$ rwip2cc --address=10.0.0.0
--
Use the --print-ips switch to print the address and the country.
$ rwip2cc --print-ip=1 --address=10.0.0.0
10.0.0.0|--|
rwpmaplookup expects the input to come from a file, so use the --no-files switch to tell rwpmaplookup
that the command line arguments are the addresses to print. By default, rwpmaplookup prints a title line,
and each row contains the key and the value.
$ rwpmaplookup --country-code --no-files 10.0.0.0
     key|value|
10.0.0.0|   --|
Use rwpmaplookup’s command line switches to exactly mimic the default output from rwip2cc:
$ rwpmaplookup --country-code --fields=value --delimited --no-title \
--no-files 10.0.0.0
--
Single address using a different country code file
Print the country code for a single address specified on the command line using an older version of the
country code mapping file.
$ rwip2cc --map-file=old-addresses.pmap --address=128.2.0.0
us
$ rwpmaplookup --country-code=old-address-map.pmap --no-files 128.2.0.0
      key|value|
128.2.0.0|   us|
Addresses read from the standard input
Using the default country code map, print the country code for multiple addresses read from the standard
input. When the --input-file switch is given, the default output includes the address.
$ echo '10.0.0.0/31' | rwip2cc --input-file=stdin
10.0.0.0|--|
10.0.0.1|--|
You can use the --print-ips switch to suppress the IPs.
$ echo '10.0.0.0/31' | rwip2cc --print-ips=0 --input-file=stdin
--
--
Unlike rwip2cc, rwpmaplookup does not accept CIDR blocks as input. Use the IPset tools rwsetbuild(1)
to parse the CIDR block list and rwsetcat(1) to print the list.
$ echo '10.0.0.0/31' | rwsetbuild | rwsetcat --cidr=0 \
    | rwpmaplookup --country-code
     key|value|
10.0.0.0|   --|
10.0.0.1|   --|
Addresses read from a file
Using an older version of the country code map, print the country code for multiple addresses read from a
file.
$ export SILK_COUNTRY_CODES=old-addresses.pmap
$ cat file.txt
128.2.1.1
128.2.2.2
$ rwip2cc --input-file=file.txt
128.2.1.1|us|
128.2.2.2|us|
$ rwpmaplookup --no-title --country-code file.txt
128.2.1.1|   us|
128.2.2.2|   us|
ENVIRONMENT
SILK_COUNTRY_CODES
This environment variable allows the user to specify the country code mapping file that rwip2cc will
use. The value may be a complete path or a file relative to SILK_PATH. If the variable is not specified,
the code looks for a file named country_codes.pmap as specified in the FILES section below.
SILK_PATH
This environment variable gives the root of the install tree. As part of its search for the Country Code
mapping file, rwip2cc checks the directories $SILK_PATH/share/silk and $SILK_PATH/share for a
file named country_codes.pmap.
SILK_CLOBBER
The SiLK tools normally refuse to overwrite existing files. Setting SILK_CLOBBER to a non-empty
value removes this restriction.
SILK_PAGER
When set to a non-empty string, rwip2cc automatically invokes this program to display its output a
screen at a time. If set to an empty string, rwip2cc does not automatically page its output.
PAGER
When set and SILK_PAGER is not set, rwip2cc automatically invokes this program to display its
output a screen at a time.
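The precedence among the --pager switch, SILK_PAGER, and PAGER described in this manual page can be summarized in a short function. This is a sketch of the documented precedence only; it does not model the additional conditions that --input-file is in use and that output goes to a terminal, and the function name is hypothetical.

```python
def choose_pager(pager_switch=None, environ=None):
    """Return the pager program to run, or None for no paging.

    pager_switch is the value of --pager (None when the switch is absent).
    An empty string at the winning level disables paging.
    """
    environ = {} if environ is None else environ
    if pager_switch is not None:
        return pager_switch or None
    if "SILK_PAGER" in environ:
        return environ["SILK_PAGER"] or None
    return environ.get("PAGER") or None


print(choose_pager(None, {"SILK_PAGER": "", "PAGER": "less"}))  # prints None
print(choose_pager(None, {"PAGER": "less"}))                    # prints less
```

Setting SILK_PAGER to the empty string is therefore a way to switch paging off for the SiLK tools while leaving PAGER in place for other programs.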
FILES
rwip2cc will look for the prefix map file that maps IPs to country codes in the following locations.
($SILK_COUNTRY_CODES is the value of the SILK_COUNTRY_CODES environment variable, if it is
set. $SILK_PATH is the value of the SILK_PATH environment variable, if it is set. The use of /usr/local/
assumes the application is installed in the /usr/local/bin/ directory.)
$SILK_COUNTRY_CODES
$SILK_PATH/share/silk/country_codes.pmap
$SILK_PATH/share/country_codes.pmap
/usr/local/share/silk/country_codes.pmap
/usr/local/share/country_codes.pmap
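The search order listed above can be expressed as an ordered candidate list from which the first existing file wins. The sketch below is an approximation under stated assumptions: it treats SILK_COUNTRY_CODES as a complete path (the "relative to SILK_PATH" case is not modeled), it assumes the search continues to later locations when an earlier one is missing, and it hard-codes /usr/local as the installation prefix. The function names are hypothetical.

```python
import os
import posixpath


def country_code_candidates(environ):
    """Return candidate paths for country_codes.pmap, in search order."""
    candidates = []
    if environ.get("SILK_COUNTRY_CODES"):
        candidates.append(environ["SILK_COUNTRY_CODES"])
    if environ.get("SILK_PATH"):
        root = environ["SILK_PATH"]
        candidates.append(
            posixpath.join(root, "share", "silk", "country_codes.pmap"))
        candidates.append(posixpath.join(root, "share", "country_codes.pmap"))
    candidates.append("/usr/local/share/silk/country_codes.pmap")
    candidates.append("/usr/local/share/country_codes.pmap")
    return candidates


def find_country_code_map(environ, exists=os.path.isfile):
    """Return the first candidate for which `exists` is true, else None."""
    for path in country_code_candidates(environ):
        if exists(path):
            return path
    return None


for path in country_code_candidates({"SILK_PATH": "/opt/silk"}):
    print(path)
```

Passing `exists` as a parameter keeps the function testable without touching the real file system.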
SEE ALSO
rwpmaplookup(1), rwgeoip2ccmap(1), rwsetbuild(1), rwsetcat(1), silk(7)
rwipaexport
Export IPA datasets to SiLK binary data files
SYNOPSIS
rwipaexport --catalog=CATALOG [--time=TIME] [--prefix-map-name=NAME]
[--note-add=TEXT] [--note-file-add=FILE]
[--compression-method=COMP_METHOD] OUTPUT_FILE
rwipaexport --help
rwipaexport --version
DESCRIPTION
rwipaexport exports data from an IPA (IP Association, http://tools.netsa.cert.org/ipa/) data store to a
SiLK IPset, Bag, or prefix map file, depending on the type of the stored IPA catalog. For catalogs with time
information (e.g. time period at which the stored data is considered valid) data can be selected for a specific
time of interest.
OPTIONS
Option names may be abbreviated if the abbreviation is unique or is an exact match for an option. A
parameter to an option may be specified as --arg=param or --arg param, though the first form is required
for options that take optional parameters.
--catalog=CATALOG_NAME
Specifies the name of the IPA catalog to export from.
--time=TIME
This argument allows you to export a dataset that was active at TIME. The expected format of this
option is YYYY/MM/DD[:HH[:MM[:SS]]]. A dataset will only be returned if TIME falls between the
start and end time for the dataset. If this option is not specified, the current time will be used. See the
TIME RANGES section of ipaimport(1) for more information about how time ranges are used in
IPA.
--prefix-map-name=NAME
When creating a prefix map file, add NAME to the header of the file as the map-name. When this
switch is not specified, no map-name is written to the file. If the output is not a prefix map file, the
--prefix-map-name switch is ignored.
--note-add=TEXT
Add the specified TEXT to the header of the output file as an annotation. This switch may be repeated
to add multiple annotations to a file. To view the annotations, use the rwfileinfo(1) tool.
--note-file-add=FILENAME
Open FILENAME and add the contents of that file to the header of the output file as an annotation.
This switch may be repeated to add multiple annotations. Currently the application makes no effort
to ensure that FILENAME contains text; be careful that you do not attempt to add a SiLK data file
as an annotation.
--compression-method=COMP_METHOD
Specify how to compress the output. When this switch is not given, output to the standard output or to
named pipes is not compressed, and output to files is compressed using the default chosen when SiLK
was compiled. The valid values for COMP_METHOD are determined by which external libraries were
found when SiLK was compiled. To see the available compression methods and the default method,
use the --help or --version switch. SiLK can support the following COMP_METHOD values when
the required libraries are available.
none
Do not compress the output using an external library.
zlib
Use the zlib(3) library for compressing the output, and always compress the output regardless
of the destination. Using zlib produces the smallest output files at the cost of speed.
lzo1x
Use the lzo1x algorithm from the LZO real time compression library for compression, and always
compress the output regardless of the destination. This compression provides good compression
with less memory and CPU overhead.
best
Use lzo1x if available, otherwise use zlib. Only compress the output when writing to a file.
--help
Print the available options and exit.
--version
Print the version number and information about how SiLK was configured, then exit the application.
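The rules given under --compression-method above depend on three things: the requested method, whether the destination is a regular file, and which libraries SiLK was built with. The sketch below restates those rules as a function; it is not SiLK code, the function and parameter names are hypothetical, and it does not model a build in which neither zlib nor LZO is available.

```python
def effective_compression(method, writing_to_file, available, compiled_default):
    """Return the compression actually applied to the output.

    method is the --compression-method value, or None when the switch is
    not given.  available is the set of library-backed methods found at
    build time; compiled_default is the build's default method.
    """
    if method is None:
        # Standard output and named pipes are left uncompressed.
        return compiled_default if writing_to_file else "none"
    if method == "best":
        if not writing_to_file:
            return "none"
        return "lzo1x" if "lzo1x" in available else "zlib"
    if method in ("zlib", "lzo1x") and method not in available:
        raise ValueError(f"compression method {method!r} is not available")
    return method  # none, zlib, or lzo1x apply regardless of destination


print(effective_compression("best", True, {"zlib"}, "none"))           # prints zlib
print(effective_compression("best", False, {"zlib", "lzo1x"}, "none"))  # prints none
```

The difference between "best" and naming zlib or lzo1x directly is that the explicit methods compress even when writing to the standard output.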
EXAMPLES
In the following examples, the dollar sign ($) represents the shell prompt. The text after the dollar sign
represents the command line. Lines have been wrapped for improved readability, and the back slash (\) is
used to indicate a wrapped line.
To export the badhosts IPset from an IPA set catalog into the file badhosts.set where there is no time
information:
$ rwipaexport --catalog=badhosts badhosts.set
To export the flowcount Bag from an IPA bag catalog into the file flowcount-20070415.bag where there is
time information:
$ rwipaexport --catalog=flowcount --time=2007/04/15  \
    flowcount-20070415.bag
ENVIRONMENT
SILK_CLOBBER
The SiLK tools normally refuse to overwrite existing files. Setting SILK_CLOBBER to a non-empty
value removes this restriction.
FILES
$SILK_PATH/share/silk/silk-ipa.conf
$SILK_PATH/share/silk-ipa.conf
/usr/local/share/silk/silk-ipa.conf
/usr/local/share/silk-ipa.conf
Possible locations for the IPA configuration file. This file contains the URI for connecting to the IPA
database. If the configuration file does not exist, rwipaexport will exit with an error. The format of
this URI is driver://user:pass-word@hostname/database. For example:
postgresql://ipauser:[email protected]/ipa
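To check a connection URI of the documented form before placing it in silk-ipa.conf, its components can be split with Python's standard urllib.parse module. This is a convenience sketch, not part of SiLK or IPA; the function name is hypothetical, and the password and host name in the sample URI are placeholders chosen for illustration rather than values from this guide.

```python
from urllib.parse import urlsplit


def parse_ipa_uri(uri):
    """Split a driver://user:password@hostname/database URI into its parts."""
    parts = urlsplit(uri)
    return {
        "driver": parts.scheme,
        "user": parts.username,
        "password": parts.password,
        "hostname": parts.hostname,
        "database": parts.path.lstrip("/"),
    }


# Placeholder URI: "secret" and "db.example.com" are illustrative only.
print(parse_ipa_uri("postgresql://ipauser:secret@db.example.com/ipa"))
```

A missing component shows up as None (or an empty string for the database), which makes malformed entries easy to spot.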
SEE ALSO
rwipaimport(1), rwfileinfo(1), ipafilter(3), silk(7), ipaimport(1), ipaexport(1), ipaquery(1),
zlib(3)
rwipaimport
Import SiLK IP collections into an IPA catalog
SYNOPSIS
rwipaimport --catalog=CATALOG [--description=DESCRIPTION]
[--start-time=START_TIME] [--end-time=END_TIME] INPUT_FILE
rwipaimport --help
rwipaimport --version
DESCRIPTION
rwipaimport reads a SiLK IPset, Bag, or Prefix Map file and imports its contents into an IPA (IP Association, http://tools.netsa.cert.org/ipa/) catalog. An IPA catalog is a collection of sets, bags, and prefix
maps which can have an optional time period associated with them defining when that particular collection
of data is considered valid.
OPTIONS
Option names may be abbreviated if the abbreviation is unique or is an exact match for an option. A
parameter to an option may be specified as --arg=param or --arg param, though the first form is required
for options that take optional parameters.
--catalog=CATALOG_NAME
Specifies the name of the IPA catalog to import into. If the catalog does not already exist in the IPA
data store, it will be created. This option is required.
--description=DESCRIPTION
An optional text description of the catalog’s contents. This description will be stored in the database
and will be visible when querying available catalogs with the ipaquery tool. The description will only
be added to new catalogs; if you import a dataset into an existing catalog, this option is ignored.
--start-time=START_TIME
Specifies the beginning of the time range for which the imported data is valid. The expected format
of this option is either a timestamp in YYYY/MM/DD[:HH[:MM[:SS]]] format, or ... (three dots)
to indicate the time range is left-unbounded. For more information about this argument, refer to the
TIME RANGES section of ipaimport(1).
--end-time=END_TIME
Specifies the end of the time range for which the imported data is valid. The expected format of
this option is either a timestamp in YYYY/MM/DD[:HH[:MM[:SS]]] format, or ... (three dots) to
indicate the time range is right-unbounded. For more information about this argument, refer to the
TIME RANGES section of ipaimport(1).
--help
Print the available options and exit.
--version
Print the version number and information about how SiLK was configured, then exit the application.
EXAMPLES
In the following examples, the dollar sign ($) represents the shell prompt. The text after the dollar sign
represents the command line. Lines have been wrapped for improved readability, and the back slash (\) is
used to indicate a wrapped line.
To import the IPset file test-april.set into a new catalog with the name testset and a short description,
with data valid for only the month of April, 2007:
$ rwipaimport --catalog=testset --desc="Test set catalog"  \
    --start=2007/04/01 --end=2007/05/01                     \
    test-april.set
To import the Bag file test.bag into a new catalog named testbag with data valid for all dates and times
(the ... literally means the characters ...):
$ rwipaimport --catalog=testbag --start=... --end=... test.bag
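The timestamp syntax accepted by --start-time and --end-time, including the literal ... for an unbounded side, can be parsed as sketched below. This is an illustration rather than IPA's own parser: the function names are hypothetical, and treating the end of the range as exclusive is an assumption suggested by the April example above (start 2007/04/01, end 2007/05/01), not something this guide states.

```python
from datetime import datetime


def parse_time(text):
    """Parse YYYY/MM/DD[:HH[:MM[:SS]]]; return None for the literal '...'."""
    if text == "...":
        return None
    date_part, *clock = text.split(":")
    if len(clock) > 3:
        raise ValueError(f"too many time fields in {text!r}")
    year, month, day = (int(part) for part in date_part.split("/"))
    hour, minute, second = [int(part) for part in clock] + [0] * (3 - len(clock))
    return datetime(year, month, day, hour, minute, second)


def in_range(when, start, end):
    """True when `when` lies in [start, end); None means unbounded."""
    return (start is None or start <= when) and (end is None or when < end)


april = (parse_time("2007/04/01"), parse_time("2007/05/01"))
print(in_range(parse_time("2007/04/15"), *april))                  # prints True
print(in_range(parse_time("2007/05/01"), *april))                  # prints False
print(in_range(parse_time("2007/05/01"), parse_time("..."), None))  # prints True
```

The same check corresponds to what rwipaexport's --time switch describes: a dataset is returned only when the requested time falls within the dataset's range.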
FILES
$SILK_PATH/share/silk/silk-ipa.conf
$SILK_PATH/share/silk-ipa.conf
/usr/local/share/silk/silk-ipa.conf
/usr/local/share/silk-ipa.conf
Possible locations for the IPA configuration file. This file contains the URI for connecting to the IPA
database. If the configuration file does not exist, rwipaimport will exit with an error. The format of
this URI is driver://user:pass-word@hostname/database. For example:
postgresql://ipauser:[email protected]/ipa
SEE ALSO
rwipaexport(1), ipafilter(3), silk(7), ipaimport(1), ipaexport(1), ipaquery(1)
rwipfix2silk
Convert IPFIX records to SiLK Flow records
SYNOPSIS
rwipfix2silk [--silk-output=FILE] [--print-statistics]
[--interface-values={snmp | vlan}]
[--log-destination={stdout | stderr | none | PATH}]
[--note-add=TEXT] [--note-file-add=FILE]
[--compression-method=COMP_METHOD]
{[--xargs] | [--xargs=FILENAME] | [IPFIXFILE [IPFIXFILE...]]}
rwipfix2silk --help
rwipfix2silk --version
DESCRIPTION
rwipfix2silk reads IPFIX (Internet Protocol Flow Information eXport) records from files or from the standard input, converts the records to the SiLK Flow format, and writes the SiLK records to the path specified
by --silk-output or to the standard output when stdout is not the terminal and --silk-output is not
provided.
rwipfix2silk reads IPFIX records from the files named on the command line or from the standard input
when no file names are specified and --xargs is not present. To read the standard input in addition to the
named files, use - or stdin as a file name. When the --xargs switch is provided, rwipfix2silk will read the
names of the files to process from the named text file, or from the standard input if no file name argument
is provided to the switch. The input to --xargs must contain one file name per line.
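The rules above for deciding which inputs are read can be summarized in a few lines. The sketch below only models the selection of input names, not the reading of IPFIX data; the function name is hypothetical, "-" is used to stand for the standard input, and skipping blank lines in the --xargs input is an assumption.

```python
import io


def input_names(file_args, xargs_lines=None):
    """Return the list of inputs to process.

    file_args holds the file names from the command line.  xargs_lines is
    an iterable of lines (one file name per line) when --xargs is given,
    otherwise None.
    """
    if xargs_lines is not None:
        return [line.strip() for line in xargs_lines if line.strip()]
    if not file_args:
        return ["-"]  # no names and no --xargs: read the standard input
    return list(file_args)


listing = io.StringIO("first.ipfix\nsecond.ipfix\n")
print(input_names([], listing))       # prints ['first.ipfix', 'second.ipfix']
print(input_names([]))                # prints ['-']
print(input_names(["a.ipfix", "-"]))  # prints ['a.ipfix', '-']
```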
OPTIONS
Option names may be abbreviated if the abbreviation is unique or is an exact match for an option. A
parameter to an option may be specified as --arg=param or --arg param, though the first form is required
for options that take optional parameters.
--silk-output=FILE
Write the SiLK records to FILE, which must not exist. If the switch is not provided or if FILE has
the value stdout or -, the SiLK flows are written to the standard output.
--print-statistics
Print, to the standard error, the number of records that were written to the SiLK output file. See also
--log-destination.
--interface-values={snmp | vlan}
Specify which IPFIX fields should be stored in the input and output fields of the generated SiLK
Flow records. If this switch is not specified, the default is snmp. The choices are:
snmp
Store the indexes of the network interface cards where the flows entered and left the router. That
is, store the ingressInterface in input and the egressInterface in output.
vlan
Store the VLAN identifiers for the source and destination networks. That is, store vlanId in
input and postVlanId in output. If only one VLAN ID is available, input is set to that value
and output is set to 0.
--log-destination={none | stdout | stderr | PATH}
Write more detailed information to the specified destination. The default destination is none which
suppresses messages. Use stdout or stderr to send messages to the standard output or standard
error, respectively. Any other value is treated as a file name in which to write the messages. When an
existing file is specified, rwipfix2silk appends any messages to the file. Information that is written
includes the following:
• For each input stream, the number of forward and reverse IPFIX records read and number of
records ignored.
• Messages about invalid records.
• When the SILK_IPFIX_PRINT_TEMPLATES environment variable is set to 1, the IPFIX templates that were read.
--note-add=TEXT
Add the specified TEXT to the header of the output file as an annotation. This switch may be repeated
to add multiple annotations to a file. To view the annotations, use the rwfileinfo(1) tool.
--note-file-add=FILENAME
Open FILENAME and add the contents of that file to the header of the output file as an annotation.
This switch may be repeated to add multiple annotations. Currently the application makes no effort
to ensure that FILENAME contains text; be careful that you do not attempt to add a SiLK data file
as an annotation.
--compression-method=COMP_METHOD
Specify how to compress the output. When this switch is not given, output to the standard output or to
named pipes is not compressed, and output to files is compressed using the default chosen when SiLK
was compiled. The valid values for COMP_METHOD are determined by which external libraries were
found when SiLK was compiled. To see the available compression methods and the default method,
use the --help or --version switch. SiLK can support the following COMP_METHOD values when
the required libraries are available.
none
Do not compress the output using an external library.
zlib
Use the zlib(3) library for compressing the output, and always compress the output regardless
of the destination. Using zlib produces the smallest output files at the cost of speed.
lzo1x
Use the lzo1x algorithm from the LZO real time compression library for compression, and always
compress the output regardless of the destination. This compression provides good compression
with less memory and CPU overhead.
best
Use lzo1x if available, otherwise use zlib. Only compress the output when writing to a file.
--xargs
--xargs=FILENAME
Causes rwipfix2silk to read file names from FILENAME or from the standard input if FILENAME
is not provided. The input should have one file name per line. rwipfix2silk will open each file in turn
and read records from it, as if the files had been listed on the command line.
--help
Print the available options and exit.
--version
Print the version number and information about how SiLK was configured, then exit the application.
EXAMPLES
In the following examples, the dollar sign ($) represents the shell prompt. The text after the dollar sign
represents the command line. Lines have been wrapped for improved readability, and the backslash (\) is
used to indicate a wrapped line.
To convert a packet capture (pcap(3)) file, packets.pcap, such as that produced by tcpdump(1), to the
SiLK format, use the yaf(1) tool (see http://tools.netsa.cert.org/yaf/) to convert the capture data to IPFIX
and rwipfix2silk to convert the IPFIX data to the SiLK format, storing the records in silk.rw :
$ yaf --silk --in packets.pcap --out - \
    | rwipfix2silk --silk-output=silk.rw
Note that you can produce the same result using the rwp2yaf2silk(1) wrapper script:
$ rwp2yaf2silk --in packets.pcap --out silk.rw
You can use rwsilk2ipfix(1) to convert the SiLK file back to an IPFIX format, storing the result in ipfix.dat:
$ rwsilk2ipfix --ipfix-output=ipfix.dat silk.rw
If you want to create flow records that contain a single packet (similar to the output of rwptoflow(1)),
specify --idle-timeout=0 on the yaf command line:
$ yaf --silk --in packets.pcap --out - --idle-timeout=0 \
    | rwipfix2silk --silk-output=silk.rw
To have yaf decode VLAN identifiers for 802.1Q packets and to have rwipfix2silk store the VLAN IDs in
the input and output fields of the SiLK Flow records, use:
$ yaf --silk --in packets.pcap --out - \
    | rwipfix2silk --silk-output=silk.rw --interface-values=vlan
Note: yaf releases prior to 1.3 would only export the VLAN identifiers when the --mac switch was provided
on the command line.
ENVIRONMENT
SILK_IPFIX_PRINT_TEMPLATES
When set to 1, rwipfix2silk writes messages to the log file describing each IPFIX template it reads.
(Use --log-destination to change the destination from its default of none.) The first message includes
the domain, the template identifier, the number of information elements in the template, and the name
of this environment variable. Next, a message is printed for each information element in the template
where the message contains the domain id, the template id, and the element’s position in the template,
length in octets, numeric information element identifier, and name. For elements defined by a private
enterprise, the IE number has two parts: the private enterprise number and the information element
number, separated by a slash (/). (Requires libfixbuf 1.4.0 or later.) Since SiLK 3.8.2.
SILK_LIBFIXBUF_SUPPRESS_WARNINGS
When set to 1, rwipfix2silk disables all warning messages generated by libfixbuf. These warning
messages include out-of-sequence packets, data records not having a corresponding template, record
count discrepancies, and issues decoding list elements. Since SiLK 3.10.0.
SILK_CLOBBER
The SiLK tools normally refuse to overwrite existing files. Setting SILK_CLOBBER to a non-empty
value removes this restriction.
SEE ALSO
rwsilk2ipfix(1), rwfileinfo(1), rwp2yaf2silk(1), rwptoflow(1), silk(7), yaf(1), tcpdump(1),
pcap(3), zlib(3)
rwmatch
Match SiLK records from two streams into a common stream
SYNOPSIS
rwmatch --relate=FIELD_PAIR [--relate=FIELD_PAIR ...]
[--time-delta=DELTA] [--symmetric-delta]
[{ --absolute-delta | --relative-delta | --infinite-delta }]
[--unmatched={q|r|b}]
[--note-add=TEXT] [--note-file-add=FILE]
[--ipv6-policy={ignore,asv4,mix,force,only}]
[--compression-method=COMP_METHOD]
[--site-config-file=FILENAME]
QUERY_FILE RESPONSE_FILE OUTPUT_FILE
rwmatch --help
rwmatch --version
DESCRIPTION
rwmatch provides a facility for relating (or matching) SiLK Flow records contained in two sorted input
files, labeling those flow records, and writing the records to an output file.
The two input files are called QUERY_FILE and RESPONSE_FILE, respectively. The purpose of rwmatch
is to find a record in QUERY_FILE that represents some network stimulus that caused a reply which
is represented by a record in RESPONSE_FILE. When rwmatch discovers this relationship, it assigns a
numeric ID to the match, searches both input files for additional records that are part of the same event,
stores the numeric ID in each matching record’s next hop IP field, and writes all records that are part of
that event to OUTPUT_FILE.
When the --symmetric-delta switch is specified, rwmatch also checks for a stimulus in RESPONSE_FILE
that triggered a reply in QUERY_FILE. This is useful when matching flows where either side may have
initiated the conversation.
The input files must be sorted as described in Sorting the input below. To use the standard input in place
of one of the input streams, specify stdin or - in its place.
The criteria for defining a match are given by one or more uses of the --relate switch and by the timestamps
on the flow records:
• Each use of --relate on the command line takes two comma-separated SiLK Flow record fields as its
argument. These two fields form a FIELD_PAIR in the form QUERY_FIELD,RESPONSE_FIELD. For
a match to exist, the value of QUERY_FIELD on a record read from QUERY_FILE must be identical
to the value of RESPONSE_FIELD on a record read from RESPONSE_FILE, and that must be true
for all FIELD_PAIRs.
• By default, the start-time of the record from the RESPONSE_FILE must begin within a time window
determined by the start- and end-times of the record read from the QUERY_FILE. The end-time is
extended by specifying the DELTA number of seconds as the argument to the --time-delta switch.
Thus
query_rec.sTime <= response_rec.sTime <= query_rec.eTime + DELTA
When the --symmetric-delta switch is provided, records also match if the start-time of the query
record begins within the time window determined by the start- and end-times of the response record,
plus any value specified by --time-delta. That is:
response_rec.sTime <= query_rec.sTime <= response_rec.eTime + DELTA
The --time-delta switch allows for a delay in the response. Although responses usually occur within
a second of the query, delays of several seconds are not uncommon due to combinations of host and
network processing delays. The DELTA value can also compensate for timing errors between multiple
sensors.
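The two timing rules above can be sketched in Python. This is only an illustration, not rwmatch's implementation: the Flow class and the function names are hypothetical, times are plain seconds, and the --relate field comparison is deliberately left out.

```python
from dataclasses import dataclass


@dataclass
class Flow:
    """Hypothetical stand-in for a SiLK Flow record; only the times are kept."""
    s_time: float  # start time, in seconds
    e_time: float  # end time, in seconds


def in_window(first: Flow, second: Flow, delta: float = 0.0) -> bool:
    """True when second starts between first's start and first's end + delta."""
    return first.s_time <= second.s_time <= first.e_time + delta


def times_match(query: Flow, response: Flow,
                delta: float = 0.0, symmetric: bool = False) -> bool:
    """Timing test for an initial match; FIELD_PAIR checks are not modeled."""
    if in_window(query, response, delta):
        return True
    # --symmetric-delta: also accept a response that started the exchange
    return symmetric and in_window(response, query, delta)
```

For a query lasting from 10.0 to 12.0 and a response starting at 13.5, times_match() is false with a DELTA of 0 and true with a DELTA of 2.0.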
Once rwmatch establishes a match between records in the two input files, it searches for additional records
from both input files to add to the match.
To do this, rwmatch designates one of the records in the initial match pair as the base record. When
possible, the base record is the record with the earlier start time. In the case of a tie between TCP or UDP
records, the base is determined by ports: when one port is above 1024 and the other is below 1024, the
base is the one with the lower port. If that also fails, the base record is the record read from QUERY_FILE.
With millisecond time resolution, ties should be rare.
To determine whether a match exists between the base record and a candidate record, rwmatch uses the
FIELD_PAIRs specified by --relate. When the base record and the candidate record were read from the
same file, only one side of each FIELD_PAIR is used.
In addition to the records having identical values for each field in FIELD_PAIRs, the candidate record must
be within a time window determined by the --time-delta switch and the --absolute-delta, --relative-delta, and --infinite-delta switches.
• When --infinite-delta is specified, there is no time window and only the values specified by the
FIELD_PAIRs are checked.
• Specifying --absolute-delta requires each candidate record to start within the time window set by
the start- and end-times of the base record (plus any DELTA), similar to the rule used to establish the
match.
• If --relative-delta is specified, the end of the time window is initially set to DELTA seconds after
the end-time of the base record. As records from either input file are added to the match, the end of
the time window is set to DELTA seconds beyond the maximum end-time seen on any record in the
match.
• When none of the above are explicitly specified, rwmatch uses the rules of --absolute-delta.
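The difference between the three window rules can be illustrated with a short Python sketch. It is a simplification under stated assumptions, not the rwmatch algorithm: the Flow class and extend_match() are hypothetical names, candidates are assumed to already agree on every FIELD_PAIR and to arrive in start-time order, and the window is assumed to open at the base record's start time in both the absolute and relative modes.

```python
from dataclasses import dataclass


@dataclass
class Flow:
    """Hypothetical stand-in for a SiLK Flow record; only the times are kept."""
    s_time: float  # start time, in seconds
    e_time: float  # end time, in seconds


def extend_match(base, candidates, delta=0.0, mode="absolute"):
    """Return the candidates that would join a match under the given mode."""
    if mode not in ("absolute", "relative", "infinite"):
        raise ValueError("unknown mode: " + mode)
    window_end = base.e_time + delta
    accepted = []
    for rec in candidates:
        # infinite: no time check at all
        if mode != "infinite" and not (base.s_time <= rec.s_time <= window_end):
            continue
        accepted.append(rec)
        if mode == "relative":
            # the window grows to DELTA past the latest end time in the match
            window_end = max(window_end, rec.e_time + delta)
    return accepted
```

With a base record spanning 0 to 10 seconds, a DELTA of 5, and candidates starting at 5, 22, and 100 seconds, the absolute rule accepts only the first candidate, the relative rule accepts the first two because the first candidate ends at 20 and stretches the window to 25, and the infinite rule accepts all three.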
Because long-lived sessions are often broken into multiple flows, rwmatch may discard records that are part
of a long-lived session. The --relative-delta switch may compensate for this if the gap between flows is
less than the time specified in the --time-delta switch. The --infinite-delta switch will compensate for
arbitrarily long gaps, but it may add records to a match that are not part of a true session. DNS flows that
use port 53/udp as both a service and reply port are an example.
When rwmatch establishes a match, it increments the match ID, with the first match having a match ID
of 1. To label the records that comprise the match, rwmatch uses a 32-bit number where the lower 24 bits
hold the match ID and the upper 8 bits are set to 0 or 255 to indicate whether the record was read from
QUERY_FILE or RESPONSE_FILE, respectively. rwmatch stores this 32-bit number in the next hop IP
field of the records. If the record is IPv6, rwmatch maps the number into the ::ffff:0:0/96 netblock before
setting the next hop IP. Apart from the change to the next hop IP field, the query and response
records are not modified.
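The labeling scheme can be written out as a small Python sketch. The function names are hypothetical and the code only mirrors the bit layout described above; it is not taken from the rwmatch source.

```python
import ipaddress


def match_label(match_id: int, from_response: bool) -> int:
    """Pack a match ID (lower 24 bits) and a file marker (upper 8 bits)."""
    if not 0 <= match_id < (1 << 24):
        raise ValueError("match ID must fit in 24 bits")
    upper = 255 if from_response else 0
    return (upper << 24) | match_id


def label_as_next_hop(label: int, ipv6: bool = False):
    """Render a 32-bit label as the address stored in the next hop IP field."""
    if ipv6:
        # map the 32-bit value into the ::ffff:0:0/96 netblock
        return ipaddress.IPv6Address((0xFFFF << 32) | label)
    return ipaddress.IPv4Address(label)
```

For example, the first match gives 0.0.0.1 on the query record and 255.0.0.1 on the response record.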
By default, only matched records are written to the OUTPUT_FILE and any record that could not be
determined to be part of a match is discarded.
Specifying the --unmatched switch tells rwmatch to write unmatched query and/or response records to
OUTPUT_FILE. The required parameter is one of q, r, or b to write the query records, the response records,
or both to OUTPUT_FILE. Unmatched query records have their next hop IP set to 0.0.0.0, and unmatched
response records have their next hop IP set to 255.0.0.0.
Sorting the input
As rwmatch reads QUERY_FILE and RESPONSE_FILE, it expects the SiLK Flow records to appear in a
particular order that is best achieved by using rwsort(1). In particular:
• The records in QUERY_FILE must appear in ascending order where the key is the first value in each
of the --relate FIELD_PAIRs in the order in which the --relate switches appear and by the start time
of the flow.
• Likewise for the records in RESPONSE_FILE, except the second value in each FIELD_PAIR is used.
When rwmatch processes the following command
$ rwmatch --relate=1,2 --relate=2,1 --relate=5,5 Q.rw R.rw out.rw
it assumes that Q.rw and R.rw were created by
$ rwsort --fields=1,2,5,stime --output=Q.rw input1.rw ....
$ rwsort --fields=2,1,5,stime --output=R.rw input2.rw ....
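The sort keys follow mechanically from the --relate switches, which a short Python sketch can make explicit. The helper name sort_fields() is hypothetical; it only builds the --fields arguments for rwsort and does not run any SiLK tool.

```python
def sort_fields(relate_pairs):
    """Derive the rwsort --fields values for QUERY_FILE and RESPONSE_FILE.

    relate_pairs is a list of (query_field, response_field) tuples given in
    the same order as the --relate switches on the rwmatch command line.
    """
    query_keys = [q for q, _ in relate_pairs] + ["stime"]
    response_keys = [r for _, r in relate_pairs] + ["stime"]
    return ",".join(query_keys), ",".join(response_keys)


# The three --relate switches from the command above
query_fields, response_fields = sort_fields([("1", "2"), ("2", "1"), ("5", "5")])
print("rwsort --fields=" + query_fields)     # key for QUERY_FILE
print("rwsort --fields=" + response_fields)  # key for RESPONSE_FILE
```

The first value of each pair, followed by the start time, forms the key for the query file; the second value of each pair forms the key for the response file.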
If the files source_ips.s.rw and dest_ips.s.rw are created by the following commands:
$ rwsort --field=1,9 source_ips.rw > source_ips.s.rw
$ rwsort --field=2,9 dest_ips.rw > dest_ips.s.rw
The following call to rwmatch works correctly:
$ rwmatch --relate=1,2 source_ips.s.rw dest_ips.s.rw matched.rw
Note that the following command produces very few matches since source_ips.s.rw was sorted on field 1 and
dest_ips.s.rw was sorted on field 2.
$ rwmatch --relate=2,1 source_ips.s.rw dest_ips.s.rw stdout
The recommended sort ordering for TCP and UDP is shown below. This correctly handles multiple flows
occurring during the same time interval which involve multiple ports:
$ rwsort --fields=1,4,2,3,5,stime incoming.rw > incoming-query.rw
$ rwsort --fields=2,3,1,4,5,stime outgoing.rw > outgoing-response.rw
The corresponding rwmatch command is:
$ rwmatch --relate=1,2 --relate=4,3 --relate=2,1 --relate=3,4 \
--relate=5,5 incoming-query.rw outgoing-response.rw matched.rw
OPTIONS
Option names may be abbreviated if the abbreviation is unique or is an exact match for an option. A
parameter to an option may be specified as --arg=param or --arg param, though the first form is required
for options that take optional parameters.
--relate=FIELD_PAIR
Specify a pair of fields where the value of these fields in two records must be identical for the records
to be considered part of a match. The first field is for records from QUERY_FILE and the second for
records from RESPONSE_FILE. At least one FIELD_PAIR must be provided; up to 128 FIELD_PAIRs
may be provided. The FIELD_PAIR must contain two field names or field IDs separated by a comma,
such as --relate=dip,sip or --relate=proto,proto.
Each FIELD_PAIR is unidirectional; specifying --relate=sip,dip matches records where the query
record’s source IP matches the response record’s destination IP, but does not imply any relationship
between the response’s source IP and query’s destination IP. To match symmetric flow records between
hosts, specify:
--relate=sip,dip --relate=dip,sip
When using a port-based protocol (e.g., TCP or UDP), refine the match further by specifying the
ports:
--relate=2,1 --relate=1,2 --relate=3,4 --relate=4,3
Matching becomes more specific as more fields are added. Since rwmatch discards unmatched records,
a highly specific match (such as the last one specified above) generates more matches (resulting in higher
match IDs), but may result in fewer total flows due to certain records being unmatched.
The available fields are listed here. For a better description of some of these fields, see the rwcut(1)
manual page.
sIP,1
source IP address
dIP,2
destination IP address
sPort,3
source port for TCP and UDP, or equivalent
dPort,4
destination port for TCP and UDP, or equivalent
protocol,5
IP protocol
packets,pkts,6
packet count
bytes,7
byte count
flags,8
bit-wise OR of TCP flags over all packets
sensor,12
name or ID of sensor at the collection point
class,20
class of sensor at the collection point
type,21
type of sensor at the collection point
iType
the ICMP type value for ICMP or ICMPv6 flows and empty for non-ICMP flows. This field was
introduced in SiLK 3.8.1.
iCode
the ICMP code value for ICMP or ICMPv6 flows and empty for non-ICMP flows. See note at
iType.
in,13
router SNMP input interface or vlanId if packing tools were configured to capture it (see sensor.conf(5))
out,14
router SNMP output interface or postVlanId
initialFlags,26
TCP flags on first packet in the flow
sessionFlags,27
bit-wise OR of TCP flags over all packets except the first in the flow
attributes,28
flow attributes set by the flow generator
application,29
guess as to the content of the flow
--time-delta=DELTA
Specify the number of seconds by which a response record may start after a query record has ended.
DELTA may contain fractional seconds to millisecond precision; for example, 0.500 represents a 500
millisecond delay. Responses match queries if
query.sTime <= response.sTime <= query.eTime + DELTA
When --time-delta is not specified, DELTA defaults to 0 and the response must begin before the
query ends.
--symmetric-delta
Allow matching of flows where the RESPONSE_FILE contains the initial flow. In this case, a query
record matches a response record when
response.sTime <= query.sTime <= response.eTime + DELTA
--absolute-delta
When adding additional records to an established match, only include candidate flows that start less
than DELTA seconds after the end of the initial flow. This is the default behavior. This switch is
incompatible with --relative-delta and --infinite-delta.
--relative-delta
When adding additional records to an established match, include candidate flows that start within
DELTA seconds of the greatest end time for all records in the current match. This switch is incompatible
with --absolute-delta and --infinite-delta.
--infinite-delta
When adding additional records to an established match, include candidate records based on the
FIELD_PAIRs alone, ignoring time. This switch is incompatible with --absolute-delta and --relative-delta.
--unmatched=q|r|b
Write unmatched query and/or response records to OUTPUT_FILE. The parameter determines
whether the query records, the response records, or both are written to OUTPUT_FILE. Unmatched
query records have their next hop IPv4 address set to 0.0.0.0, and unmatched response records have
their next hop IPv4 address set to 255.0.0.0. When the b value is used, OUTPUT_FILE contains a
complete merge of QUERY_FILE and RESPONSE_FILE.
--note-add=TEXT
Add the specified TEXT to the header of the output file as an annotation. This switch may be repeated
to add multiple annotations to a file. To view the annotations, use the rwfileinfo(1) tool.
--note-file-add=FILENAME
Open FILENAME and add the contents of that file to the header of the output file as an annotation.
This switch may be repeated to add multiple annotations. Currently the application makes no effort
to ensure that FILENAME contains text; be careful that you do not attempt to add a SiLK data file
as an annotation.
--ipv6-policy=POLICY
Determine how IPv4 and IPv6 flows are handled when SiLK has been compiled with IPv6 support.
When the switch is not provided, the SILK_IPV6_POLICY environment variable is checked for a policy.
If it is also unset or contains an invalid policy, the POLICY is mix. When SiLK has not been compiled
with IPv6 support, IPv6 flows are always ignored, regardless of the value passed to this switch or in
the SILK_IPV6_POLICY variable. The supported values for POLICY are:
ignore
Ignore any flow record marked as IPv6, regardless of the IP addresses it contains.
asv4
Convert IPv6 flow records that contain addresses in the ::ffff:0:0/96 prefix to IPv4 and ignore all
other IPv6 flow records.
mix
Process the input as a mixture of IPv4 and IPv6 flow records. Should rwmatch need to compare
an IPv4 and IPv6 address, it maps the IPv4 address into the ::ffff:0:0/96 prefix.
force
Convert IPv4 flow records to IPv6, mapping the IPv4 addresses into the ::ffff:0:0/96 prefix.
only
Process only flow records that are marked as IPv6 and ignore IPv4 flow records in the input.
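As a rough illustration of the five policies, the following Python sketch applies a policy to a single address. This is a simplification under stated assumptions: real flow records carry several addresses plus an IPv6 marker, the function name apply_ipv6_policy() is hypothetical, and a return value of None stands for a record that is ignored.

```python
import ipaddress


def apply_ipv6_policy(address: str, policy: str):
    """Return the address as it would be processed, or None if ignored."""
    ip = ipaddress.ip_address(address)
    if policy == "mix":
        return ip  # IPv4 and IPv6 are both processed as read
    if policy == "ignore":
        return ip if ip.version == 4 else None
    if policy == "only":
        return ip if ip.version == 6 else None
    if policy == "asv4":
        if ip.version == 4:
            return ip
        # ipv4_mapped is None for addresses outside ::ffff:0:0/96
        return ip.ipv4_mapped
    if policy == "force":
        if ip.version == 6:
            return ip
        # map the IPv4 address into the ::ffff:0:0/96 prefix
        return ipaddress.IPv6Address((0xFFFF << 32) | int(ip))
    raise ValueError("unknown policy: " + policy)
```

Under asv4, for instance, ::ffff:10.1.2.3 becomes 10.1.2.3 while 2001:db8::1 is ignored.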
--compression-method=COMP_METHOD
Specify how to compress the output. When this switch is not given, output to the standard output or to
named pipes is not compressed, and output to files is compressed using the default chosen when SiLK
was compiled. The valid values for COMP_METHOD are determined by which external libraries were
found when SiLK was compiled. To see the available compression methods and the default method,
use the --help or --version switch. SiLK can support the following COMP_METHOD values when
the required libraries are available.
none
Do not compress the output using an external library.
zlib
Use the zlib(3) library for compressing the output, and always compress the output regardless
of the destination. Using zlib produces the smallest output files at the cost of speed.
lzo1x
Use the lzo1x algorithm from the LZO real time compression library for compression, and always
compress the output regardless of the destination. This compression provides good compression
with less memory and CPU overhead.
best
Use lzo1x if available, otherwise use zlib. Only compress the output when writing to a file.
--site-config-file=FILENAME
Read the SiLK site configuration from the named file FILENAME. When this switch is not provided,
rwmatch searches for the site configuration file in the locations specified in the FILES section.
--help
Print the available options and exit.
--version
Print the version number and information about how SiLK was configured, then exit the application.
EXAMPLES
In the following examples, the dollar sign ($) represents the shell prompt. The text after the dollar sign
represents the command line. Lines have been wrapped for improved readability, and the backslash (\) is
used to indicate a wrapped line.
Matching TCP Flows
rwmatch is a generalized matching tool; the most basic function provided by rwmatch is the ability to
match both sides of a TCP connection. Given incoming and outgoing web traffic in two files web_in.rw and
web_out.rw, the following sequence of commands will generate a file, web-sessions.rw, consisting of matched
sessions for every complete web session in web_in.rw and web_out.rw:
$ rwsort --field=1,2,3,4,stime web_in.rw > web_in-s.rw
$ rwsort --field=2,1,4,3,stime web_out.rw > web_out-s.rw
$ rwmatch --relate=1,2 --relate=2,1 --relate=3,4 --relate=4,3 \
      web_in-s.rw web_out-s.rw web-sessions.rw
Finding Responses to a Scan
Because rwmatch can match fields arbitrarily, you can also match records across different protocols. Suppose
there are two SiLK Flow files, indata.rw and outdata.rw, that contain the incoming and outgoing data,
respectively, for a particular time period.
To trace responses to a scan attempt, we start by identifying a specific horizontal scan. In this example, we
use an SMTP scan on TCP port 25. Assume that we have an IPset file, smtp-scanners.set, that contains
the external IP addresses that scanned us on port 25. (Perhaps this file was obtained by using rwscan(1)
and rwscanquery(1).)
First, use rwfilter(1) to find the flow records matching these scan attempts in the incoming data file. Sort
the output of rwfilter by source IP, source port, destination IP, destination port, and time, and store the
results in smtp-scans.rw :
$ rwfilter --proto=6 --sip-set=smtp-scanners.set --dport=25 \
      --pass=- indata.rw \
    | rwsort --field=sip,sport,dip,dport,stime > smtp-scans.rw
We can identify hosts that responded to the scan (we consider accepting the TCP connection as a response)
by finding potential replies in the outgoing data file, sorting them, and storing the results in scan-response.rw.
For this command on the outgoing data, note that we must swap source and destination from the values
used for the incoming data:
$ rwfilter --proto=6 --dip-set=smtp-scanners.set --sport=25 \
      --pass=- outdata.rw \
    | rwsort --field=dip,dport,sip,sport,stime > scan-response.rw
We can now match the flow records to produce the file matched-scans.rw :
$ rwmatch --relate=1,2 --relate=3,4 --relate=2,1 --relate=4,3 \
      smtp-scans.rw scan-response.rw matched-scans.rw
The results file, matched-scans.rw, will contain all the exchanges between the scanning hosts and the responders on port 25. Examination of these flows may show evidence of buffer overflows, data exfiltration, or
similar attacks.
Next, we want to identify responses to the scan that were produced by our routers, such as ICMP destination
unreachable messages.
Use rwfilter to find the ICMP messages going to the scanning hosts, sort the flow records, and store the
results in icmp.rw :
$ rwfilter --proto=1 --icmp-type=3 --pass=stdout outdata.rw \
    | rwsort --field=dip,stime > icmp.rw
Run rwmatch and match exclusively on the IP address.
$ rwmatch --relate=2,1 icmp.rw smtp-scans.rw result.rw
The resulting file, result.rw, will consist of single packet flows (from smtp-scans.rw) with an ICMP response
(from icmp.rw).
Similar queries can be used to identify other multiple-protocol phenomena, such as the results of a traceroute.
Displaying the Results
These examples assume matched.rw is an output file produced by rwmatch.
When using rwcut(1) to display the records in matched.rw, you may specify the next hop IP field (nhIP)
to see the match identifier:
$ rwcut --num-rec=8 --fields=sip,sport,dip,dport,type,nhip matched.rw
            sIP|sPort|            dIP|dPort|   type|           nhIP|
    10.4.52.235|29631|192.168.233.171|   80|  inweb|        0.0.0.1|
192.168.233.171|   80|    10.4.52.235|29631| outweb|      255.0.0.1|
    10.9.77.117|29906| 192.168.184.65|   80|  inweb|        0.0.0.2|
 192.168.184.65|   80|    10.9.77.117|29906| outweb|      255.0.0.2|
  10.14.110.214|29989| 192.168.249.96|   80|  inweb|        0.0.0.3|
 192.168.249.96|   80|  10.14.110.214|29989| outweb|      255.0.0.3|
    10.18.66.79|29660| 192.168.254.69|   80|  inweb|        0.0.0.4|
 192.168.254.69|   80|    10.18.66.79|29660| outweb|      255.0.0.4|
The first record is a query from the external host 10.4.52.235 to the web server on the internal host
192.168.233.171, and the second record is the web server’s response. The third and fourth records represent another query/response pair.
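When post-processing this output outside of SiLK, the nhIP value can be split back into a direction and a match ID. The Python sketch below assumes the label layout described earlier in this manual page (upper 8 bits of 0 for a query or 255 for a response, lower 24 bits for the match ID, with an ID of 0 marking a record written by --unmatched); the function name decode_next_hop() is hypothetical.

```python
import ipaddress


def decode_next_hop(nhip: str):
    """Split an rwmatch next hop IP into (direction, match_id).

    direction is "->" for a query record and "<-" for a response record.
    A match_id of 0 denotes an unmatched record.
    """
    ip = ipaddress.ip_address(nhip)
    if ip.version == 6:
        # IPv6 records carry the label inside the ::ffff:0:0/96 netblock
        mapped = ip.ipv4_mapped
        if mapped is None:
            raise ValueError("not an rwmatch label: " + nhip)
        ip = mapped
    value = int(ip)
    upper, match_id = value >> 24, value & 0xFFFFFF
    if upper == 0:
        return "->", match_id
    if upper == 255:
        return "<-", match_id
    raise ValueError("not an rwmatch label: " + nhip)
```

Applied to the first two rows above, 0.0.0.1 decodes to a query in match 1 and 255.0.0.1 to the response in the same match.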
The cutmatch.so plug-in is an alternate way to display the match parameter that rwmatch writes into the
next hop IP field. The cutmatch.so plug-in defines a match field that displays the direction of the flow (->
represents a query and <- a response) and the match ID. To use the plug-in, you must explicitly load it into
rwcut by specifying the --plugin switch. You can then add match to the list of --fields to print:
$ rwcut --plugin=cutmatch.so --num-rec=8 \
      --fields=sip,sport,match,dip,dport,type matched.rw
            sIP|sPort| <->Match#|            dIP|dPort|   type|
    10.4.52.235|29631|->       1|192.168.233.171|   80|  inweb|
192.168.233.171|   80|<-       1|    10.4.52.235|29631| outweb|
    10.9.77.117|29906|->       2| 192.168.184.65|   80|  inweb|
 192.168.184.65|   80|<-       2|    10.9.77.117|29906| outweb|
  10.14.110.214|29989|->       3| 192.168.249.96|   80|  inweb|
 192.168.249.96|   80|<-       3|  10.14.110.214|29989| outweb|
    10.18.66.79|29660|->       4| 192.168.254.69|   80|  inweb|
 192.168.254.69|   80|<-       4|    10.18.66.79|29660| outweb|
Using the sIP and dIP fields is confusing when the file you are examining contains both incoming and
outgoing flow records. To make the output from rwmatch clearer, use the int-ext-fields(3) plug-in as
well. That plug-in allows you to display the external IPs in one column and the internal IPs in another
column. See its manual page for additional information.
$ export INCOMING_FLOWTYPES=all/in,all/inweb
$ export OUTGOING_FLOWTYPES=all/out,all/outweb
$ rwcut --plugin=cutmatch.so --plugin=int-ext-fields.so --num-rec=8 \
      --fields=ext-ip,ext-port,match,int-ip,int-port,type matched.rw
         ext-ip|ext-p| <->Match#|         int-ip|int-p|   type|
    10.4.52.235|29631|->       1|192.168.233.171|   80|  inweb|
    10.4.52.235|29631|<-       1|192.168.233.171|   80| outweb|
    10.9.77.117|29906|->       2| 192.168.184.65|   80|  inweb|
    10.9.77.117|29906|<-       2| 192.168.184.65|   80| outweb|
  10.14.110.214|29989|->       3| 192.168.249.96|   80|  inweb|
  10.14.110.214|29989|<-       3| 192.168.249.96|   80| outweb|
    10.18.66.79|29660|->       4| 192.168.254.69|   80|  inweb|
    10.18.66.79|29660|<-       4| 192.168.254.69|   80| outweb|
ENVIRONMENT
SILK_IPV6_POLICY
This environment variable is used as the value for the --ipv6-policy when that switch is not provided.
SILK_CLOBBER
The SiLK tools normally refuse to overwrite existing files. Setting SILK_CLOBBER to a non-empty
value removes this restriction.
SILK_CONFIG_FILE
This environment variable is used as the value for the --site-config-file when that switch is not
provided.
SILK_DATA_ROOTDIR
This environment variable specifies the root directory of the data repository. As described in the FILES
section, rwmatch may use this environment variable when searching for the SiLK site configuration
file.
SILK_PATH
This environment variable gives the root of the install tree. When searching for configuration files,
rwmatch may use this environment variable. See the FILES section for details.
FILES
${SILK_CONFIG_FILE}
${SILK_DATA_ROOTDIR}/silk.conf
/data/silk.conf
${SILK_PATH}/share/silk/silk.conf
${SILK_PATH}/share/silk.conf
/usr/local/share/silk/silk.conf
/usr/local/share/silk.conf
Possible locations for the SiLK site configuration file which are checked when the --site-config-file
switch is not provided.
SEE ALSO
rwfilter(1), rwsort(1), rwcut(1), rwfileinfo(1), rwscan(1), rwscanquery(1), sensor.conf(5),
silk(7), zlib(3)
NOTES
SiLK 3.9.0 expanded the set of fields accepted by the --relate switch and added support for IPv6 flow
records.
rwnetmask
Zero out lower bits of IP addresses in SiLK Flow records
SYNOPSIS
rwnetmask [--4sip-prefix-length=N] [--6sip-prefix-length=N]
[--4dip-prefix-length=N] [--6dip-prefix-length=N]
[--4nhip-prefix-length=N] [--6nhip-prefix-length=N]
[--sip-prefix-length=N] [--dip-prefix-length=N]
[--nhip-prefix-length=N] [--output-path=PATH]
[--print-filenames] [--ipv6-policy=POLICY]
[--note-add=TEXT] [--note-file-add=FILE]
[--compression-method=COMP_METHOD]
[--site-config-file=FILENAME]
{[--xargs] | [--xargs=FILENAME] | [FILE [FILE ...]]}
rwnetmask --help
rwnetmask --version
DESCRIPTION
rwnetmask reads SiLK Flow records, sets the prefix of the source IP, destination IP, and/or next hop IP to
the specified value(s) by masking the least significant bits of the address(es), and writes the modified SiLK
Flow records to the specified output path. Modifying the IP addresses allows one to group IPs into arbitrary
CIDR blocks. Multiple prefix-lengths may be specified; at least one must be specified.
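The effect of a prefix length on a single address can be reproduced with Python's standard ipaddress module. This sketch only illustrates the masking arithmetic; the function name apply_prefix() is hypothetical and the code does not read or write SiLK Flow files.

```python
import ipaddress


def apply_prefix(address: str, prefix_len: int) -> str:
    """Keep the prefix_len most significant bits and zero the remaining bits."""
    # strict=False lets ip_network() accept an address with host bits set
    network = ipaddress.ip_network(f"{address}/{prefix_len}", strict=False)
    return str(network.network_address)


print(apply_prefix("10.4.52.235", 24))      # IPv4 address masked to a /24
print(apply_prefix("2001:db8:1:2::3", 32))  # IPv6 address masked to a /32
```

Masking 10.4.52.235 with a prefix length of 24 gives 10.4.52.0, so every host in that /24 collapses to the same value and can be grouped together by downstream tools.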
When SiLK is compiled with IPv6 support, a separate mask can be specified for IPv4 and IPv6 addresses.
Records are processed using the IP-version in which they are read. The --ipv6-policy switch can be used
to force the records into a particular IP-version or to ignore records of a particular IP-version.
rwnetmask reads SiLK Flow records from the files named on the command line or from the standard input
when no file names are specified and --xargs is not present. To read the standard input in addition to the
named files, use - or stdin as a file name. If an input file name ends in .gz, the file will be uncompressed
as it is read. When the --xargs switch is provided, rwnetmask will read the names of the files to process
from the named text file, or from the standard input if no file name argument is provided to the switch. The
input to --xargs must contain one file name per line.
When no output path is specified and the standard output is not connected to a terminal, rwnetmask
writes the records to the standard output.
OPTIONS
Option names may be abbreviated if the abbreviation is unique or is an exact match for an option. A
parameter to an option may be specified as --arg=param or --arg param, though the first form is required
for options that take optional parameters.
One of these switches must be provided:
--4sip-prefix-length=N
--sip-prefix-length=N
For IPv4 addresses, specify the number of most significant bits of the source address to keep. The
default is to not mask off any bits (i.e., N=32).
--4dip-prefix-length=N
--dip-prefix-length=N
For IPv4 addresses, specify the number of most significant bits of the destination address to keep. The
default is to not mask off any bits (i.e., N=32).
--4nhip-prefix-length=N
--nhip-prefix-length=N
For IPv4 addresses, specify the number of most significant bits of the next-hop address to keep. The
default is to not mask off any bits (i.e., N=32).
--6sip-prefix-length=N
For IPv6 addresses, specify the number of most significant bits of the source address to keep. The
default is to not mask off any bits (i.e., N=128).
--6dip-prefix-length=N
For IPv6 addresses, specify the number of most significant bits of the destination address to keep. The
default is to not mask off any bits (i.e., N=128).
--6nhip-prefix-length=N
For IPv6 addresses, specify the number of most significant bits of the next-hop address to keep. The
default is to not mask off any bits (i.e., N=128).
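The effect of keeping the N most significant bits can be illustrated outside of SiLK. The following is a minimal Python sketch, not rwnetmask's implementation; the function name keep_prefix is hypothetical, and the sketch handles one address at a time rather than a flow record.

```python
import ipaddress


def keep_prefix(address: str, prefix_length: int) -> str:
    """Return ADDRESS with all but its PREFIX_LENGTH most significant bits zeroed.

    Hypothetical helper that mirrors the idea behind the *-prefix-length
    switches; it is not part of SiLK.
    """
    ip = ipaddress.ip_address(address)
    if not 0 <= prefix_length <= ip.max_prefixlen:
        raise ValueError("prefix length out of range for this IP version")
    # strict=False lets ip_network() zero the host bits for us.
    network = ipaddress.ip_network(f"{address}/{prefix_length}", strict=False)
    return str(network.network_address)


if __name__ == "__main__":
    print(keep_prefix("10.10.35.77", 24))       # IPv4, keep 24 bits
    print(keep_prefix("2001:db8:1:2::1", 64))   # IPv6, keep 64 bits
```

With N equal to 32 for IPv4 (or 128 for IPv6) the address is returned unchanged, which matches the documented default of not masking any bits.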
These switches are optional:
--output-path=PATH
Write the output to the named PATH. PATH may be a file, named pipe, or the symbols stdout or - to
write to the standard output. When not specified, output will be written to the standard output.
rwnetmask will exit with an error if the output path is the standard output and the standard output
is connected to a terminal.
--print-filenames
Print to the standard error the names of the input files as the files are opened.
--ipv6-policy=POLICY
Determine how IPv4 and IPv6 flows are handled when SiLK has been compiled with IPv6 support.
When the switch is not provided, the SILK_IPV6_POLICY environment variable is checked for a policy.
If it is also unset or contains an invalid policy, the POLICY is mix. When SiLK has not been compiled
with IPv6 support, IPv6 flows are always ignored, regardless of the value passed to this switch or in
the SILK_IPV6_POLICY variable. The supported values for POLICY are:
ignore
Ignore any flow record marked as IPv6, regardless of the IP addresses it contains. Only records
marked as IPv4 will be processed.
asv4
Convert IPv6 flow records that contain addresses in the ::ffff:0:0/96 prefix to IPv4 and ignore all
other IPv6 flow records.
mix
Process the input as a mixture of IPv4 and IPv6 flows.
force
Convert IPv4 flow records to IPv6, mapping the IPv4 addresses into the ::ffff:0:0/96 prefix.
only
Process only flow records that are marked as IPv6 and ignore IPv4 flow records in the input.
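The five policies can be sketched in Python for a single address. This is an illustration only: SiLK applies the policy to whole flow records, and the function name apply_policy and the use of None to mean "record ignored" are assumptions made for this example.

```python
import ipaddress

# The IPv4-mapped IPv6 prefix named in the asv4 and force descriptions.
V4_MAPPED = ipaddress.ip_network("::ffff:0:0/96")


def apply_policy(address: str, policy: str = "mix"):
    """Return the address as the policy would present it, or None if ignored.

    Hypothetical single-address model of --ipv6-policy; not SiLK code.
    """
    ip = ipaddress.ip_address(address)
    is_v6 = ip.version == 6
    if policy == "mix":
        return ip
    if policy == "ignore":
        return None if is_v6 else ip
    if policy == "only":
        return ip if is_v6 else None
    if policy == "asv4":
        # ipv4_mapped is None unless the address lies in ::ffff:0:0/96.
        return ip.ipv4_mapped if is_v6 else ip
    if policy == "force":
        if is_v6:
            return ip
        return ipaddress.IPv6Address(int(V4_MAPPED.network_address) | int(ip))
    raise ValueError(f"unknown policy: {policy}")
```

For example, apply_policy("10.1.2.3", "force") yields the IPv6 address ::ffff:10.1.2.3, and apply_policy("2001:db8::1", "asv4") yields None because that address is outside the mapped prefix.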
--note-add=TEXT
Add the specified TEXT to the header of the output file as an annotation. This switch may be repeated
to add multiple annotations to a file. To view the annotations, use the rwfileinfo(1) tool.
--note-file-add=FILENAME
Open FILENAME and add the contents of that file to the header of the output file as an annotation.
This switch may be repeated to add multiple annotations. Currently the application makes no effort
to ensure that FILENAME contains text; be careful that you do not attempt to add a SiLK data file
as an annotation.
--compression-method=COMP_METHOD
Specify how to compress the output. When this switch is not given, output to the standard output or to
named pipes is not compressed, and output to files is compressed using the default chosen when SiLK
was compiled. The valid values for COMP_METHOD are determined by which external libraries were
found when SiLK was compiled. To see the available compression methods and the default method,
use the --help or --version switch. SiLK can support the following COMP_METHOD values when
the required libraries are available.
none
Do not compress the output using an external library.
zlib
Use the zlib(3) library for compressing the output, and always compress the output regardless
of the destination. Using zlib produces the smallest output files at the cost of speed.
lzo1x
Use the lzo1x algorithm from the LZO real time compression library for compression, and always
compress the output regardless of the destination. This compression provides good compression
with less memory and CPU overhead.
best
Use lzo1x if available, otherwise use zlib. Only compress the output when writing to a file.
--site-config-file=FILENAME
Read the SiLK site configuration from the named file FILENAME. When this switch is not provided,
rwnetmask searches for the site configuration file in the locations specified in the FILES section.
--xargs
--xargs=FILENAME
Causes rwnetmask to read file names from FILENAME or from the standard input if FILENAME
is not provided. The input should have one file name per line. rwnetmask will open each file in turn
and read records from it, as if the files had been listed on the command line.
--help
Print the available options and exit.
--version
Print the version number and information about how SiLK was configured, then exit the application.
EXAMPLES
In the following example, the dollar sign ($) represents the shell prompt. The text after the dollar sign
represents the command line. Lines have been wrapped for improved readability, and the back slash (\) is
used to indicate a wrapped line.
To summarize the TCP traffic from your network to each /24 on the Internet, use:
$ rwfilter --type=out,outweb --proto=6 --pass=stdout   \
    | rwnetmask --dip-prefix-length 24                 \
    | rwaddrcount --use-dest --sort --print-rec
 IP Address| Bytes|Packets|Records|         Start Time|...
 10.10.35.0|  2345|     52|      6|01/15/2003 19:30:31|
  10.23.3.0|   118|      2|      1|01/16/2003 19:38:40|
  10.23.4.0| 20858|    263|     16|01/16/2003 16:54:25|
 10.31.49.0|266920|   3885|   1092|01/11/2003 02:04:11|
 10.126.7.0| 36912|    260|      9|01/16/2003 17:03:28|
....
ENVIRONMENT
SILK_IPV6_POLICY
This environment variable is used as the value for the --ipv6-policy switch when that switch is not provided.
SILK_CLOBBER
The SiLK tools normally refuse to overwrite existing files. Setting SILK_CLOBBER to a non-empty
value removes this restriction.
SILK_CONFIG_FILE
This environment variable is used as the value for the --site-config-file switch when that switch is not
provided.
SILK_DATA_ROOTDIR
This environment variable specifies the root directory of the data repository. As described in the FILES
section, rwnetmask may use this environment variable when searching for the SiLK site configuration
file.
SILK_PATH
This environment variable gives the root of the install tree. When searching for configuration files,
rwnetmask may use this environment variable. See the FILES section for details.
FILES
${SILK_CONFIG_FILE}
${SILK_DATA_ROOTDIR}/silk.conf
/data/silk.conf
${SILK_PATH}/share/silk/silk.conf
${SILK_PATH}/share/silk.conf
/usr/local/share/silk/silk.conf
/usr/local/share/silk.conf
Possible locations for the SiLK site configuration file which are checked when the --site-config-file
switch is not provided.
SEE ALSO
rwfileinfo(1), silk(7), zlib(3)
rwp2yaf2silk(1)
rwp2yaf2silk
Convert PCAP data to SiLK Flow Records with YAF
SYNOPSIS
rwp2yaf2silk --in=INPUT_SPEC --out=FILE [--dry-run]
[--yaf-program=YAF] [--yaf-args='ARG1 ARG2']
[--rwipfix2silk-program=RWIPFIX2SILK] [--rwipfix2silk-args='ARG1 ARG2']
rwp2yaf2silk --help
rwp2yaf2silk --man
rwp2yaf2silk --version
DESCRIPTION
rwp2yaf2silk is a script to convert a pcap(3) file, such as that produced by tcpdump(1), to a single file
of SiLK Flow records. The script assumes that the yaf(1) and rwipfix2silk(1) commands are available on
your system.
The --in and --out switches are required. Note that the --in switch is processed by yaf, and the --out
switch is processed by rwipfix2silk.
For information on reading live pcap data and using rwflowpack(8) to store that data in hourly files, see
the SiLK Installation Handbook.
Normally yaf groups multiple packets into flow records. You can almost force yaf to create a flow record
for every packet so that its output is similar to that of rwptoflow(1): When you give yaf the
--idle-timeout=0 switch, yaf creates a flow record for every complete packet and for each packet that it is able
to completely reassemble from packet fragments. Any fragmented packets that yaf cannot reassemble are
dropped.
OPTIONS
Option names may be abbreviated if the abbreviation is unique or is an exact match for an option. A
parameter to an option may be specified as --arg=param or --arg param, though the first form is required
for options that take optional parameters.
--in=INPUT_SPEC
Read the pcap records from INPUT_SPEC. Often INPUT_SPEC is the name of the pcap file to read
or the string - or stdin to read from standard input. To process multiple pcap files, create
a text file that lists the names of the pcap files. Specify the text file as INPUT_SPEC and use
--yaf-args=--caplist to tell yaf the INPUT_SPEC contains the names of pcap files.
--out=FILE
Write the SiLK Flow records to FILE. The string stdout or - may be used for the standard output,
as long as it is not connected to a terminal.
--dry-run
Do not invoke any commands, just print the commands that would be invoked.
--yaf-program=YAF
Use YAF as the location of the yaf program. When not specified, rwp2yaf2silk assumes there is a
program yaf on your $PATH.
--yaf-args=ARGS
Pass the additional ARGS to the yaf program.
--rwipfix2silk-program=RWIPFIX2SILK
Use RWIPFIX2SILK as the location of the rwipfix2silk program. When not specified, rwp2yaf2silk
assumes there is a program rwipfix2silk on your $PATH.
--rwipfix2silk-args=ARGS
Pass the additional ARGS to the rwipfix2silk program.
--help
Display a brief usage message and exit.
--man
Display full documentation for rwp2yaf2silk and exit.
--version
Print the version number and exit the application.
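When rwp2yaf2silk is driven from another program, it can help to assemble the argument list in one place. The Python sketch below only builds the argument vector from the switches documented above; the function name and its keyword parameters are hypothetical, and the yaf switches --caplist and --idle-timeout=0 are passed through --yaf-args on the assumption that the installed yaf accepts them under those spellings.

```python
import shlex
from typing import List


def build_rwp2yaf2silk_argv(in_spec: str, out_file: str, *,
                            caplist: bool = False,
                            per_packet: bool = False,
                            dry_run: bool = False) -> List[str]:
    """Assemble an rwp2yaf2silk command line (hypothetical helper)."""
    argv = ["rwp2yaf2silk", f"--in={in_spec}", f"--out={out_file}"]
    yaf_args = []
    if caplist:
        # in_spec names a text file that lists pcap files (assumed yaf switch)
        yaf_args.append("--caplist")
    if per_packet:
        # ask yaf for roughly one flow record per packet (assumed yaf switch)
        yaf_args.append("--idle-timeout=0")
    if yaf_args:
        # all yaf arguments travel in a single --yaf-args value
        argv.append("--yaf-args=" + " ".join(yaf_args))
    if dry_run:
        argv.append("--dry-run")
    return argv


if __name__ == "__main__":
    argv = build_rwp2yaf2silk_argv("pcaps.txt", "flows.rw",
                                   caplist=True, dry_run=True)
    print(shlex.join(argv))  # show the command instead of running it
```

Adding --dry-run first is a cheap way to see which yaf and rwipfix2silk commands the script would invoke before anything is executed.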
SEE ALSO
yaf(1), rwipfix2silk(1), rwflowpack(8), rwptoflow(1), silk(7), tcpdump(1), pcap(3), SiLK Installation Handbook
rwpcut(1)
rwpcut
Outputs a tcpdump dump file as ASCII
SYNOPSIS
rwpcut [--columnar]
[--delimiter=DELIMITER]
[--epoch-time]
[--fields=PRINT_FIELDS]
[--integer-ips]
[--zero-pad-ips]
FILE...
DESCRIPTION
rwpcut prints the contents of tcpdump files in an easy-to-parse textual format. It supports a user-defined
list of fields to output and a user-defined delimiter between columns.
OPTIONS
Option names may be abbreviated if the abbreviation is unique or is an exact match for an option.
OUTPUT SWITCHES
--columnar
Pad each field with whitespace so that it always takes up the same number of columns. The two
payload printing fields, payhex and payascii, never pad with whitespace.
--delimiter=DELIMITER
DELIMITER is used as the delimiter between columns instead of the default '|'.
--epoch-time
Display the timestamp as epoch time seconds instead of a formatted timestamp.
--fields=PRINT_FIELDS
PRINT_FIELDS is a comma-separated list of fields to include in the output. The available fields are:
timestamp - packet timestamp
sip - source IP address
dip - destination IP address
sport - source port
dport - destination port
proto - IP protocol
payhex - payload printed as a hex stream
payascii - payload printed as an ASCII stream. Non-printing characters are represented with periods.
--integer-ips
Display IP addresses as integers instead of in dotted quad notation.
--zero-pad-ips
Pad dotted quad notation IP addresses so that each quad occupies three columns.
EXAMPLES
In the following examples, the dollar sign ($) represents the shell prompt. The text after the dollar sign
represents the command line.
$ rwpcut --fields=sip,dip,sport,dport,proto --columnar data.dmp
            sip|            dip|sport|dport|proto|
220.245.221.126|  192.168.1.100|21776| 6882|    6|
220.245.221.126|  192.168.1.100|21776| 6882|    6|
$ rwpcut --fields=timestamp,payhex data.dmp
(Carriage returns mid-payload added for legibility)
timestamp|payhex|
2005-04-20 04:28:59.091470|4500003cd85840003206f3e2dcf5dd7
ec0a8016455101ae2811b6bce00000000a002ffff59990000020405ac0
10303000101080a524dc5cc00000000|
2005-04-20 04:29:02.057390|4500003cd88c40003206f3aedcf5dd7
ec0a8016455101ae2811b6bce00000000a002ffff59930000020405ac0
10303000101080a524dc5d200000000|
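Output produced with --fields=timestamp,payhex can be post-processed with a few lines of Python. The sketch below is not part of SiLK. It assumes that the payhex value begins at the IPv4 header, which is what the sample output above suggests, and that the default delimiter is in use; the names SAMPLE_LINE and parse_payhex_line are hypothetical.

```python
import ipaddress
import struct

# First record of the example above, with the added line breaks removed.
SAMPLE_LINE = (
    "2005-04-20 04:28:59.091470|"
    "4500003cd85840003206f3e2dcf5dd7"
    "ec0a8016455101ae2811b6bce00000000a002ffff59990000020405ac0"
    "10303000101080a524dc5cc00000000|"
)


def parse_payhex_line(line: str, delimiter: str = "|") -> dict:
    """Decode one 'timestamp|payhex|' line into a few IPv4 header fields."""
    timestamp, payhex = line.rstrip("\n").split(delimiter)[:2]
    packet = bytes.fromhex(payhex)
    header_len = (packet[0] & 0x0F) * 4      # IHL is counted in 32-bit words
    record = {
        "timestamp": timestamp,
        "proto": packet[9],
        "sip": ipaddress.IPv4Address(packet[12:16]),
        "dip": ipaddress.IPv4Address(packet[16:20]),
    }
    if record["proto"] in (6, 17):           # TCP and UDP carry port numbers
        sport, dport = struct.unpack_from(">HH", packet, header_len)
        record["sport"], record["dport"] = sport, dport
    return record


if __name__ == "__main__":
    print(parse_payhex_line(SAMPLE_LINE))
```

The decoded addresses and ports agree with the sip, dip, sport, dport, and proto columns shown in the first example.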
SEE ALSO
rwptoflow(1), silk(7)
BUGS
The payhex and payascii fields are not padded with whitespace when --columnar is used.
The payascii field does not escape the delimiter character in any way, so care should be taken when parsing
it.
rwpdedupe(1)
rwpdedupe
Eliminate duplicate packets collected by several sensors
SYNOPSIS
rwpdedupe { --first-duplicate | --random-duplicate[=SCALAR] }
[--threshold=MILLISECONDS] FILE... > OUTPUT-FILE
rwpdedupe --help
rwpdedupe --version
DESCRIPTION
rwpdedupe detects and eliminates duplicate records from tcpdump(1) capture files. Duplicate records are
defined as having timestamps within a user-configurable time of each other. In addition, their Ethernet
(OSI layer 2) headers must match. If they are not IP packets, then their entire Ethernet payload must match.
If they are IP packets, then their source and destination addresses, protocol, and IP payload must match.
OPTIONS
Option names may be abbreviated if the abbreviation is unique or is an exact match for an option. A
parameter to an option may be specified as --arg=param or --arg param, though the first form is required
for options that take optional parameters.
--threshold=MILLISECONDS
Set the maximum number of milliseconds which may elapse between two packets and still have those
packets be detected as duplicates. The default is 0 (exact timestamp match). The value must be between
0 and 1,000,000 milliseconds.
One and only one of the following switches is required:
--first-duplicate
When selecting between multiple duplicate packets, always choose the packet with the earliest timestamp. Not compatible with --random-duplicate.
--random-duplicate
--random-duplicate=SCALAR
Select a random packet from the list of duplicate packets. SCALAR is a random number seed, so that
multiple runs can produce identical results.
--help
Print the available options and exit.
--version
Print the version number and information about how SiLK was configured, then exit the application.
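The duplicate rule and the two selection switches can be modeled in a few lines of Python. This is a simplified sketch, not rwpdedupe's algorithm: the Packet class and dedupe function are hypothetical, the bytes that must match are collapsed into a single content value, and a packet is assumed to join a group when it falls within the threshold of that group's earliest packet.

```python
import random
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Packet:
    timestamp_ms: int   # capture time in milliseconds
    sensor: str         # capture file the packet came from (illustrative)
    content: bytes      # stands in for the headers and payload that must match


def dedupe(packets: List[Packet], threshold_ms: int = 0,
           pick_random: bool = False, seed: Optional[int] = None) -> List[Packet]:
    """Keep one packet from every group of duplicates."""
    rng = random.Random(seed)       # the same seed reproduces the same choices
    groups: List[List[Packet]] = []
    latest = {}                     # content -> most recent group with that content
    for pkt in sorted(packets, key=lambda p: p.timestamp_ms):
        group = latest.get(pkt.content)
        if group is not None and pkt.timestamp_ms - group[0].timestamp_ms <= threshold_ms:
            group.append(pkt)       # duplicate of an earlier packet
        else:
            group = [pkt]           # first sighting, or too far apart in time
            groups.append(group)
            latest[pkt.content] = group
    if pick_random:                 # models --random-duplicate[=SCALAR]
        return [rng.choice(g) for g in groups]
    return [g[0] for g in groups]   # models --first-duplicate
```

Passing the same seed twice returns the same selection, which is the reproducibility that the SCALAR argument of --random-duplicate is described as providing.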
EXAMPLES
In the following example, the dollar sign ($) represents the shell prompt. The text after the dollar sign
represents the command line. Lines have been wrapped for improved readability, and the back slash (\) is
used to indicate a wrapped line.
Given tcpdump files data1.tcp and data2.tcp, detect and eliminate duplicate packets which occur within
one second of each other. When choosing which of the duplicates to output, pick one randomly. Store the
result in out.tcp.
$ rwpdedupe --threshold=1000 --random-duplicate \
data1.tcp data2.tcp > out.tcp
SEE ALSO
mergecap(1), tcpdump(1), pcap(3)
NOTES
mergecap(1) can be used to merge two tcpdump capture files without eliminating duplicate packets.
rwpdu2silk(1)
rwpdu2silk
Convert NetFlow v5 records to SiLK Flow records
SYNOPSIS
rwpdu2silk [--silk-output=FILE] [--print-statistics]
[--log-destination={stdout | stderr | none | PATH}]
[--log-flags={none | { {all | bad | default | missing
| record-timestamps} ...} } ]
[--note-add=TEXT] [--note-file-add=FILE]
[--compression-method=COMP_METHOD]
{--xargs | --xargs=FILENAME | PDUFILE [PDUFILE...]}
rwpdu2silk --help
rwpdu2silk --version
DESCRIPTION
rwpdu2silk reads NetFlow v5 PDU (Protocol Data Units) records from one or more files, converts the
records to the SiLK Flow format, and writes the SiLK records to the path specified by --silk-output or
to the standard output when --silk-output is not provided. Note that rwpdu2silk cannot read from the
standard input.
rwpdu2silk expects its input files to be in the format created by Cisco's NetFlow Collector: The file's size
must be an integer multiple of 1464, where each 1464-octet chunk contains a 24-octet NetFlow v5 header
and space for thirty 48-octet NetFlow v5 records. The number of valid records per chunk is specified in the
header.
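The layout arithmetic (24 + 30 x 48 = 1464) can be checked before handing a file to rwpdu2silk. The Python sketch below is a standalone illustration, not SiLK code. It relies on the standard NetFlow v5 header layout, in which the first two big-endian 16-bit fields are the version and the record count, and it skips chunks with a bad version or a count outside 1 to 30; the function name is hypothetical.

```python
import struct

HEADER_LEN = 24                     # octets in a NetFlow v5 header
RECORD_LEN = 48                     # octets in a NetFlow v5 record
RECORDS_PER_CHUNK = 30              # record slots per chunk
CHUNK_LEN = HEADER_LEN + RECORDS_PER_CHUNK * RECORD_LEN   # 1464


def count_valid_records(data: bytes) -> int:
    """Sum the header record counts over every well-formed 1464-octet chunk."""
    if len(data) % CHUNK_LEN != 0:
        raise ValueError("size is not a multiple of %d octets" % CHUNK_LEN)
    total = 0
    for offset in range(0, len(data), CHUNK_LEN):
        # version and count are the first two 16-bit big-endian header fields
        version, count = struct.unpack_from(">HH", data, offset)
        if version != 5 or not 1 <= count <= RECORDS_PER_CHUNK:
            continue                # skip a chunk that would be rejected
        total += count
    return total
```

To examine a real file, read it in binary mode and pass the bytes to count_valid_records; a file whose name ends in .gz would have to be decompressed first.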
rwpdu2silk reads NetFlow v5 records from the files named on the command line when --xargs is not
present. If an input file name ends in .gz, the file will be uncompressed as it is read. When the --xargs
switch is provided, rwpdu2silk will read the names of the files to process from the named text file, or from
the standard input if no file name argument is provided to the switch. The input to --xargs must contain
one file name per line.
OPTIONS
Option names may be abbreviated if the abbreviation is unique or is an exact match for an option. A
parameter to an option may be specified as --arg=param or --arg param, though the first form is required
for options that take optional parameters.
--silk-output=FILE
Write the SiLK records to FILE, which must not exist. If the switch is not provided or if FILE has
the value stdout or -, the SiLK flows are written to the standard output.
--print-statistics
Print, to the standard error, the number of records that were written to the SiLK output file. See also
--log-destination.
--log-destination={none | stdout | stderr | PATH }
Write more detailed information to the specified destination. The default destination is none which
suppresses messages. Use stdout or stderr to send messages to the standard output or standard
error, respectively. Any other value is treated as a file name in which to write the messages. When
an existing file is specified, rwpdu2silk appends any messages to the file. Information that is written
includes the following:
• For each input stream, the number of PDU records read, number of SiLK records generated,
number of missing records (based on the NetFlow v5 sequence number), and number of invalid
records.
• Messages about each NetFlow v5 packet that was rejected due to a bad version number or having a
record count of 0 or more than 30.
• Additional messages enabled by the --log-flags switch.
--log-flags=FLAGS
Write additional messages regarding the NetFlow v5 data to the --log-destination, where FLAGS is a
comma-separated list of names specifying the types of messages to write. When this switch is not specified,
the default value for FLAGS is none. This switch takes the same values as the log-flags setting in the
sensor.conf(5) file. This manual page documents the values that are relevant for NetFlow v5 data.
Since SiLK 3.10.0.
all
Log everything.
bad
Write messages about an individual NetFlow v5 record where the packet or octet count is zero,
the packet count is larger than the octet count, or the duration of the flow is larger than 45 days.
default
Enable the default set of log-flags used by sensor.conf: bad, missing. Despite the name, this is
not the default setting for this switch; none is.
missing
Examine the sequence numbers of NetFlow v5 packets and write messages about missing and
out-of-sequence packets. (Currently it is not possible to suppress messages regarding out-of-sequence
NetFlow v9 or IPFIX packets.)
none
Log nothing. It is an error to combine this log-flag name with any other. This is the default
setting for --log-flags.
record-timestamps
Log the timestamps that appear on each record. This produces a lot of output, and it is primarily
used for debugging.
--note-add=TEXT
Add the specified TEXT to the header of the output file as an annotation. This switch may be repeated
to add multiple annotations to a file. To view the annotations, use the rwfileinfo(1) tool.
--note-file-add=FILENAME
Open FILENAME and add the contents of that file to the header of the output file as an annotation.
This switch may be repeated to add multiple annotations. Currently the application makes no effort
to ensure that FILENAME contains text; be careful that you do not attempt to add a SiLK data file
as an annotation.
--compression-method=COMP_METHOD
Specify how to compress the output. When this switch is not given, output to the standard output or to
named pipes is not compressed, and output to files is compressed using the default chosen when SiLK
was compiled. The valid values for COMP_METHOD are determined by which external libraries were
found when SiLK was compiled. To see the available compression methods and the default method,
use the --help or --version switch. SiLK can support the following COMP_METHOD values when
the required libraries are available.
none
Do not compress the output using an external library.
zlib
Use the zlib(3) library for compressing the output, and always compress the output regardless
of the destination. Using zlib produces the smallest output files at the cost of speed.
lzo1x
Use the lzo1x algorithm from the LZO real time compression library for compression, and always
compress the output regardless of the destination. This compression provides good compression
with less memory and CPU overhead.
best
Use lzo1x if available, otherwise use zlib. Only compress the output when writing to a file.
--xargs
--xargs=FILENAME
Causes rwpdu2silk to read file names from FILENAME or from the standard input if FILENAME
is not provided. The input should have one file name per line. rwpdu2silk will open each file in turn
and read records from it, as if the files had been listed on the command line.
--help
Print the available options and exit.
--version
Print the version number and information about how SiLK was configured, then exit the application.
ENVIRONMENT
SILK_CLOBBER
The SiLK tools normally refuse to overwrite existing files. Setting SILK_CLOBBER to a non-empty
value removes this restriction.
SEE ALSO
rwfileinfo(1), rwflowpack(8), silk(7), zlib(3)
BUGS
rwpdu2silk cannot read from the standard input.
rwpmapbuild(1)
rwpmapbuild
Create a binary prefix map from a text file
SYNOPSIS
rwpmapbuild [--input-file=FILENAME] [--output-file=FILENAME]
[--mode={ipv4|ipv6|proto-port}] [--dry-run] [--ignore-errors]
[--note-add=TEXT] [--note-file-add=FILENAME]
rwpmapbuild --help
rwpmapbuild --version
DESCRIPTION
Prefix maps provide a way to map field values (specifically either IP addresses or protocol-port pairs) to
string labels based on a user-defined map file. rwpmapbuild reads textual input to create a binary prefix
map file. The syntax of this input is described in the INPUT FILE FORMAT section below.
As described in pmapfilter(3), you can partition, count, sort and display SiLK flow records based on the
string labels defined in the prefix map. To view the contents of a prefix map file, use rwpmapcat(1). To
query the contents of a prefix map, use rwpmaplookup(1).
The textual input is read from the specified input file, or from the standard input when the --input-file
switch is not provided. The binary output is written to the named output file, or to the standard output
when the --output-file switch is not provided and the standard output is not connected to a terminal.
OPTIONS
Option names may be abbreviated if the abbreviation is unique or is an exact match for an option. A
parameter to an option may be specified as --arg=param or --arg param, though the first form is required
for options that take optional parameters.
--input-file=FILENAME
Read the textual input from FILENAME. You may use stdin or - to represent the standard input.
When this switch is not provided, the input is read from the standard input. The input file format is
described below.
--output-file=FILENAME
Write the binary prefix map to FILENAME. You may use stdout or - to represent the standard
output. When this switch is not provided, the prefix map is written to the standard output unless the
standard output is connected to a terminal.
--mode={ipv4|ipv6|proto-port}
Specify the type of the input, as if a mode statement appeared in the input stream. The value specified
by this switch must not conflict with an explicit mode statement appearing in the input.
--dry-run
Do not write the output file. Simply check the syntax of the input file.
--ignore-errors
Write the output file regardless of any errors encountered while parsing the input file.
--note-add=TEXT
Add the specified TEXT to the header of the output file as an annotation. This switch may be repeated
to add multiple annotations to a file. To view the annotations, use the rwfileinfo(1) tool.
--note-file-add=FILENAME
Open FILENAME and add the contents of that file to the header of the output file as an annotation.
This switch may be repeated to add multiple annotations. Currently the application makes no effort
to ensure that FILENAME contains text; be careful that you do not attempt to add a SiLK data file
as an annotation.
--help
Print the available options and exit.
--version
Print the version number and information about how SiLK was configured, then exit the application.
INPUT FILE FORMAT
The input file format consists of any number of input lines of the forms described below. Note that there is
not a form that accepts a single IP address and a label; therefore, to provide a label for a single IP address
you must append /32 to a single IPv4 address (or /128 to a single IPv6 address).
Blank lines in the input file are ignored, as are comments. Comments begin with the first # character on a
line and extend to the end of the line.
rwpmapbuild maps ranges to string labels. These string labels may be created either explicitly via the
label statement or implicitly by specifying text after a range, but a single input file must use only one
method to create labels. When the label statement is used, all labels must be pre-declared in label
statements prior to their use in the default statement or any range statements.
In the following, the label-value represents either a numerical label identifier that was created with the label
statement or label-text.
NOTE: Unlike many SiLK input files, there is no explicit delimiter between the range and the string label.
The range and string label are separated by whitespace. The first non-whitespace character after the range
begins the label.
label-text is a textual string that begins at the first non-whitespace character and extends to the final
non-whitespace character on that line that does not appear in a comment. The label-text may include embedded
whitespace and non-alphanumeric characters. While a comma (,) is legal in the label-text, using a comma
prevents the label from being used by the --pmap-src and --pmap-dest switches in rwfilter(1).
The following statements are supported:
map-name simple-string
Creates a name for the data in this prefix map file. The simple-string cannot contain whitespace, a
comma, or a colon. When the prefix map file is used by rwfilter(1), the simple-string is used to
generate the filtering switch names. When the prefix map file is used by rwcut(1), rwgroup(1),
rwsort(1), rwstats(1), or rwuniq(1), the simple-string is used to generate the field names. See
pmapfilter(3) for details.
label num label-text
Associate the numeric identifier num with the given label text label-text. It is an error if num or
label-text appear in any other label statement. The maximum allowed value for num is 2147483647,
but note that rwpmapbuild creates an empty label for all the unassigned numeric identifiers that
are less than the maximum identifier used in the input file. The label statement must appear before
the default statement and before range definitions. When a label statement appears in the input,
rwpmapbuild will complain if you attempt to use a label-value that was not previously defined in a
label statement.
default label-value
Make the given label identifier or label text the default value for any ranges not explicitly mentioned
in this input file. The default statement must appear before any ranges are specified. If the default
statement does not appear in the input, the label UNKNOWN is automatically defined and used as the
default.
mode { ipv4 | ipv6 | proto-port | ip }
Specify how to process the file. The mode statement must appear before any ranges are specified.
The mode can also be set using the --mode command line switch. When both the mode statement
and the --mode switch are given, their values must match. When neither the mode statement nor the
--mode switch is provided, rwpmapbuild processes the input in IPv4 address mode. The ip mode
is deprecated; it is an alias for ipv4.
Address Mode
When rwpmapbuild is in IPv4 address mode, any IPv6 address in the input file will raise an error.
cidr-block label-value
Associate the given label identifier or label text with this CIDR block. The CIDR block is composed
of an IP address in canonical notation (e.g., dotted-decimal for IPv4), a slash (/), and the number of
significant bits.
low-ip high-ip label-value
Associate the given label identifier or label text with this IP range, where low-ip and high-ip are in
canonical notation.
low-int high-int label-value
Treat low-int and high-int as 32-bit values, convert the values to IPv4 addresses, and associate the
given label identifier or label text with the IPv4 range.
Protocol/Port Mode
proto/port proto/port label-value
Associate the given label identifier or label text with all protocols and port numbers between these two
values inclusive. Note that while port is not meaningful for all protocols (specifically, it is meaningful
for TCP and UDP and may contain type/code information for ICMP), this file allows port numbers
to be given for any protocol.
proto proto label-value
Associate the given label identifier or label text with all protocols between these two values.
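The address-mode range forms above can be read with a short parser. The Python sketch below is an illustration of the line syntax only, not rwpmapbuild's parser: the function names are hypothetical, statements such as label, default, mode, and map-name are not handled, and a CIDR block with host bits set is silently normalized, which is an assumption rather than documented rwpmapbuild behavior.

```python
import ipaddress


def _to_ip(token: str):
    """Accept an address in canonical notation or as a plain integer."""
    if token.isdigit():
        return ipaddress.IPv4Address(int(token))   # the low-int high-int form
    return ipaddress.ip_address(token)


def parse_range_line(line: str):
    """Return (first_ip, last_ip, label) for a range line, or None if blank."""
    text = line.split("#", 1)[0].strip()   # a comment starts at the first '#'
    if not text:
        return None
    first, rest = text.split(None, 1)
    if "/" in first:                       # cidr-block label-value
        net = ipaddress.ip_network(first, strict=False)
        return net[0], net[-1], rest.strip()
    high, label = rest.split(None, 1)      # low high label-value
    return _to_ip(first), _to_ip(high), label.strip()
```

Because the label is whatever follows the range, embedded whitespace survives: parse_range_line("10.1.2.3/32 my favorite host") keeps "my favorite host" as a single label.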
NOTES
The IP Address input file can contain nested CIDR blocks. They should be ordered with the more general
blocks first, and the more specific blocks last. That is, use:
10.0.0.0/8       My-network
10.1.0.0/16      Special-Subnet-1
10.1.2.0/24      Special-Subnet-2
Likewise, the protocol/port data can be nested:
6 6              TCP
6/0 6/1024       TCP/Generic reserved
6/22 6/22        TCP/SSH
6/25 6/25        TCP/SMTP
6/80 6/80        TCP/HTTP
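One way to see why the general blocks should come first is to model the input as a list in which a later, more specific entry replaces the label assigned by an earlier one. The Python sketch below uses that model, which is an assumption about why the ordering matters rather than a description of rwpmapbuild's internals; the names ENTRIES and lookup are hypothetical, and UNKNOWN is the default label described for the default statement.

```python
import ipaddress

# The nested CIDR blocks from the note above, general blocks first.
ENTRIES = [
    ("10.0.0.0/8", "My-network"),
    ("10.1.0.0/16", "Special-Subnet-1"),
    ("10.1.2.0/24", "Special-Subnet-2"),
]


def lookup(address: str, entries=ENTRIES, default: str = "UNKNOWN") -> str:
    """Return the label of the last entry whose block contains ADDRESS."""
    ip = ipaddress.ip_address(address)
    label = default
    for cidr, name in entries:
        if ip in ipaddress.ip_network(cidr):
            label = name    # a later, more specific block replaces the label
    return label
```

Under this model, reversing the list would label every address in 10.0.0.0/8 as My-network, hiding the two special subnets.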
EXAMPLE
In the following examples, the dollar sign ($) represents the shell prompt. The text after the dollar sign
represents the command line. Lines have been wrapped for improved readability, and the back slash (\) is
used to indicate a wrapped line.
Reading from and writing to files:
$ echo "10.1.2.3/32 my favorite host" > fav.txt
$ rwpmapbuild --input-file=fav.txt --output-file=fav.pmap
Reading from the standard input and writing to the standard output:
$ echo "10.9.8.128/27 suspicious subnet" \
    | rwpmapbuild --input-file=stdin --output-file=stdout > suspicious.pmap
Complex IP File
# Numerical mappings of labels
label 0   non-routable
label 1   internal
label 2   external

# Default to "external" for all un-defined ranges.
default   external

# Force IP-mode
mode      ip

# Create a name
# This will add --pmap-src-network and --pmap-dst-network
# switches to rwfilter, and src-network and dst-network
# fields to rwcut, rwgroup, rwsort, rwstats, and rwuniq
map-name  network

## Reserved and non-routable blocks ###########################

# Addresses in this block refer to source hosts on "this"
# network. Address 0.0.0.0/32 may be used as a source
# address for this host on this network; other addresses
# within 0.0.0.0/8 may be used to refer to specified hosts
# on this network [RFC1700, page 4].
0.0.0.0/8           non-routable

# This block is set aside for use in private networks. Its
# intended use is documented in [RFC1918]. Addresses within
# this block should not appear on the public Internet.
10.0.0.0/8          non-routable

# This block is assigned for use as the Internet host
# loopback address. A datagram sent by a higher level
# protocol to an address anywhere within this block should
# loop back inside the host. This is ordinarily
# implemented using only 127.0.0.1/32 for loopback, but no
# addresses within this block should ever appear on any
# network anywhere [RFC1700, page 5].
127.0.0.0/8         non-routable

# This is the "link local" block. It is allocated for
# communication between hosts on a single link. Hosts
# obtain these addresses by auto-configuration, such as when
# a DHCP server may not be found.
169.254.0.0/16      non-routable

# This block is set aside for use in private networks. Its
# intended use is documented in [RFC1918]. Addresses within
# this block should not appear on the public Internet.
172.16.0.0/12       non-routable

# This block is assigned as "TEST-NET" for use in
# documentation and example code. It is often used in
# conjunction with domain names example.com or example.net
# in vendor and protocol documentation. Addresses within
# this block should not appear on the public Internet.
192.0.2.0/24        non-routable

# This block is set aside for use in private networks.
# Its intended use is documented in [RFC1918]. Addresses
# within this block should not appear on the public Internet.
192.168.0.0/16      non-routable

# 240.0.0.0/4 - This block, formerly known as the Class E
# address space, is reserved. The "limited broadcast"
# destination address 255.255.255.255 should never be
# forwarded outside the (sub-)net of the source. The
# remainder of this space is reserved for future use.
# [RFC1700, page 4]
255.255.255.255/32  non-routable

# -- Below this line, would add any mappings appropriate to
# -- the local network.
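The effect of the default line and the listed blocks can be illustrated outside of SiLK. The following Python sketch is only a model of the mapping described by the file above; the function name lookup and the linear search are assumptions made for illustration and say nothing about how rwpmapbuild stores or searches the binary prefix map. The blocks in this example do not overlap, so the sketch does not need a rule for overlapping ranges.

```python
import ipaddress

# Blocks and labels copied from the example input file; any address that is
# not inside one of these blocks receives the default label.
DEFAULT_LABEL = "external"
BLOCKS = {
    "0.0.0.0/8": "non-routable",
    "10.0.0.0/8": "non-routable",
    "127.0.0.0/8": "non-routable",
    "169.254.0.0/16": "non-routable",
    "172.16.0.0/12": "non-routable",
    "192.0.2.0/24": "non-routable",
    "192.168.0.0/16": "non-routable",
    "255.255.255.255/32": "non-routable",
}
NETWORKS = [(ipaddress.ip_network(cidr), label) for cidr, label in BLOCKS.items()]


def lookup(address: str) -> str:
    """Return the label of the block containing address, else the default.

    Hypothetical helper: a linear scan is enough for eight blocks.
    """
    ip = ipaddress.ip_address(address)
    for network, label in NETWORKS:
        if ip in network:
            return label
    return DEFAULT_LABEL


if __name__ == "__main__":
    for addr in ("10.1.2.3", "8.8.8.8", "172.31.255.255", "172.32.0.0"):
        print(addr, lookup(addr))
```

Addresses such as 8.8.8.8 and 172.32.0.0 fall outside every listed block and therefore receive the default label.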
Complex Protocol/Port File
# Default to a hyphen ("-") for all un-defined ranges.
default   -

# Force Protocol/Port-mode
# This MUST be present, since IP mode is the default.
mode      proto-port

# Protocol Overview
1       1       ICMP
6       6       TCP
17      17      UDP
50      50      ESP
58      58      ICMPv6

# TCP -- Specific Ports
6/0     6/1024  TCP/Generic Reserved
6/21    6/21    TCP/ftp
6/22    6/22    TCP/ssh
6/25    6/25    TCP/smtp
6/53    6/53    TCP/dns
6/80    6/80    TCP/http
6/6000  6/6063  TCP/X11

# UDP -- Specific Ports
17/53   17/53   UDP/dns

# ICMP -- Specific Type/Code
# To convert a type/code to a "port" value as stored in SiLK:
#     (type << 8) | code
#   OR
#     (type * 256) + code
# so 3/3 (Destination Unreachable/Port Unreachable) becomes:
1/771   1/771   ICMP/Destination Unreachable/Port Unreachable
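The type/code arithmetic in the comments above can be checked with a few lines of Python. This is a small sketch; the function names icmp_port and icmp_type_code are hypothetical helpers and are not part of SiLK.

```python
def icmp_port(icmp_type: int, icmp_code: int) -> int:
    """Encode an ICMP type/code pair as the "port" value: (type << 8) | code."""
    if not (0 <= icmp_type <= 255 and 0 <= icmp_code <= 255):
        raise ValueError("ICMP type and code must each fit in 8 bits")
    return (icmp_type << 8) | icmp_code


def icmp_type_code(port: int) -> tuple:
    """Split an encoded "port" value back into (type, code)."""
    return port >> 8, port & 0xFF


if __name__ == "__main__":
    # Destination Unreachable (3) / Port Unreachable (3)
    print(icmp_port(3, 3))        # 771, the value used in the 1/771 line
    print(icmp_type_code(771))    # (3, 3)
```

Because both values fit in 8 bits, the shift-and-or form and the multiply-and-add form always give the same result.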
ENVIRONMENT
SILK_CLOBBER
The SiLK tools normally refuse to overwrite existing files. Setting SILK_CLOBBER to a non-empty
value removes this restriction.
SEE ALSO
pmapfilter(3), rwfilter(1), rwfileinfo(1), rwpmapcat(1), rwpmaplookup(1), rwcut(1), rwgroup(1), rwsort(1), rwstats(1), rwuniq(1), silk(7)
rwpmapcat
Print each range and label present in a prefix map file
SYNOPSIS
rwpmapcat [--output-type={mapname | type | ranges | labels}]
[--ignore-label=LABEL] [--ip-label-to-ignore=IP_ADDRESS]
[--left-justify-labels] [--no-cidr-blocks]
[--ip-format=FORMAT] [--integer-ips] [--zero-pad-ips]
[--no-titles] [--no-columns] [--column-separator=C]
[--no-final-delimiter] [{--delimited | --delimited=C}]
[--pager=PAGER_PROG]
[ { --map-file=PMAP_FILE | PMAP_FILE
| --address-types | --address-types=MAP_FILE
| --country-codes | --country-codes=MAP_FILE } ]
rwpmapcat --help
rwpmapcat --version
DESCRIPTION
rwpmapcat reads a prefix map file created by rwpmapbuild(1) or rwgeoip2ccmap(1) and prints its
contents.
By default, rwpmapcat prints the range/label pairs that exist in the prefix map. Use the --output-type
switch to print additional information or information other than the range/label pairs.
When printing the range/label pairs of a prefix map file that contain IP address data, rwpmapcat defaults
to printing the range as an address block in CIDR notation and the label associated with that block. To
print the ranges as a starting address and ending address, specify the --no-cidr-blocks switch.
If the prefix map file contains protocol/port pairs, rwpmapcat prints three fields: the starting protocol and
port separated by a slash (/), the ending protocol and port, and the label.
The printing of ranges having a specific label may be suppressed with the --ignore-label switch. To have
rwpmapcat look up a label based on an IP address and then ignore all entries with that label, pass the
IP address to the --ip-label-to-ignore switch.
To print the contents of an arbitrary prefix map file, one may pipe the file to rwpmapcat's standard input,
name the file as the argument to the --map-file switch, or name the file on the command line.
To print the contents of the default country codes mapping file (see ccfilter(3)), specify the --country-codes
switch with no argument. To print the contents of a specific country codes mapping file, name that
file as the argument to the --country-codes switch.
For printing the address types mapping file (see addrtype(3)), use the --address-types switch which works
similarly to the --country-codes switch.
OPTIONS
Option names may be abbreviated if the abbreviation is unique or is an exact match for an option. A
parameter to an option may be specified as --arg=param or --arg param, though the first form is required
for options that take optional parameters.
Many options are ignored unless rwpmapcat is printing the range/label pairs present in the prefix map
file.
--map-file=PMAP_FILE
Specify the path of the prefix map file to print. If this switch is omitted and neither --country-codes
nor --address-types is specified, the name of the file to be read is taken as the first non-switch
command-line argument. If no argument is given, rwpmapcat attempts to read the map from the
standard input.
--address-types
Print the contents of the address types mapping file (addrtype(3)) specified by the
SILK_ADDRESS_TYPES environment variable, or in the default address types mapping file if that
environment variable is not set. This switch may not be combined with the --map-file or
--country-codes switches.
--address-types=ADDRTYPE_FILE
Print the contents of the address types mapping file specified by ADDRTYPE_FILE.
--country-codes
Print the contents of the country code mapping file (ccfilter(3)) specified by the
SILK_COUNTRY_CODES environment variable, or in the default country code mapping file if
that environment variable is not set. This switch may not be combined with the --map-file or
--address-types switches.
--country-codes=COUNTRY_CODE_FILE
Print the contents of the country code mapping file specified by COUNTRY_CODE_FILE.
--output-type={type | mapname | label | ranges}
Specify the type(s) of output to produce. When this switch is not provided, the default is to print
ranges. Specify multiple types as a comma separated list of names; regardless of the order in which
the types are given, the output will appear in the order shown below. Country-code prefix map files
only support the ranges output type. A type can be specified using the shortest unique prefix for the
type. The available types are:
type
Print the type of this prefix map file. The value will be one of IPv4-address, IPv6-address,
or proto-port. The type will be preceded by the string TYPE: and a space character unless
--no-titles is specified.
mapname
Print the name that is stored in the prefix map file. This mapname is used to generate switch
names and field names when this prefix map is used with rwfilter(1), rwcut(1), rwgroup(1),
rwsort(1), rwstats(1), and rwuniq(1). See pmapfilter(3) for details. The mapname will be
preceded by the string MAPNAME: and a space character unless --no-titles is specified.
label
Print the names of the labels that exist in the prefix map file. The labels are printed left-justified,
one per line, with no delimiter. The labels will be preceded by LABELS: on its own line unless
--no-titles is specified. If ranges is also specified, a blank line will separate the labels and the
range/label columns.
ranges
Print the range and label for each block in the prefix map file. If the prefix map contains
protocol/port pairs, the output will contain three columns (startPair, endPair, label), where startPair
and endPair contain protocol/port. If the prefix map contains IP addresses, the form of the output
will depend on whether --no-cidr-blocks is specified. When it is not specified, the output will
contain two columns (ipBlock, label), where ipBlock contains the IP range in CIDR notation. If
--no-cidr-blocks is specified, the output will contain three columns: startIP, endIP, label.
--ignore-label=LABEL
For the ranges output-type, do not print entries whose label is LABEL. By default, all entries in the
prefix map file are printed.
--ip-label-to-ignore=IP_ADDRESS
For the ranges output-type, find the label associated with the IP address IP_ADDRESS and ignore
all ranges that match that label. By default, all entries in the prefix map are printed.
--left-justify-labels
For the ranges output-type, left-justify the labels when columnar output is printed. Normally, the
labels are right-justified.
--no-cidr-blocks
Cause each IP address block to be printed as a starting and ending IP address. By default, IP addresses
are grouped into CIDR blocks. This switch is ignored for prefix map files containing protocol/port
pairs.
--ip-format=FORMAT
Specify how IP addresses will be printed. This switch is ignored for prefix map files containing protocol/port pairs. When this switch is not specified, IPs are printed in the canonical format. The
FORMAT is one of:
canonical
Print IP addresses in their canonical form: dotted quad for IPv4 (127.0.0.1) and hexadectet for
IPv6 (2001:db8::1). Note that IPv6 addresses in ::ffff:0:0/96 and some IPv6 addresses in ::/96
will be printed as a mixture of IPv6 and IPv4.
zero-padded
Print IP addresses in their canonical form, but add zeros to the output so it fully fills the width
of the column. The addresses 127.0.0.1 and 2001:db8::1 are printed as 127.000.000.001 and
2001:0db8:0000:0000:0000:0000:0000:0001, respectively.
decimal
Print IP addresses as integers in decimal format. The addresses 127.0.0.1 and 2001:db8::1 are
printed as 2130706433 and 42540766411282592856903984951653826561, respectively.
hexadecimal
Print IP addresses as integers in hexadecimal format. The addresses 127.0.0.1 and 2001:db8::1
are printed as 7f000001 and 20010db8000000000000000000000001, respectively.
force-ipv6
Print all IP addresses in the canonical form for IPv6 without using any IPv4 notation. Any IPv4
address is mapped into the ::ffff:0:0/96 netblock. The addresses 127.0.0.1 and 2001:db8::1 are
printed as ::ffff:7f00:1 and 2001:db8::1, respectively.
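The zero-padded, decimal, and hexadecimal renderings listed above can be reproduced with Python's standard ipaddress module. This is a sketch for checking the documented values, not SiLK's formatting code; the function names are hypothetical.

```python
import ipaddress


def zero_padded(ip) -> str:
    """Canonical form with every octet or hexadectet expanded to full width."""
    if ip.version == 4:
        return ".".join(f"{octet:03d}" for octet in ip.packed)
    return ip.exploded


def as_decimal(ip) -> str:
    """The address as a single decimal integer."""
    return str(int(ip))


def as_hexadecimal(ip) -> str:
    """The address as a single lowercase hexadecimal integer."""
    return format(int(ip), "x")


if __name__ == "__main__":
    for text in ("127.0.0.1", "2001:db8::1"):
        ip = ipaddress.ip_address(text)
        print(text, zero_padded(ip), as_decimal(ip), as_hexadecimal(ip))
```

The sketch does not pad the decimal or hexadecimal output to a column width; column alignment is a separate concern handled by the columnar output switches.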
--integer-ips
Print IP addresses as integers. This switch is equivalent to --ip-format=decimal; it is deprecated as
of SiLK 3.7.0, and it will be removed in the SiLK 4.0 release.
--zero-pad-ips
Print IP addresses as fully-expanded, zero-padded values in their canonical form. This switch is
equivalent to --ip-format=zero-padded, it is deprecated as of SiLK 3.7.0, and it will be removed in
the SiLK 4.0 release.
--no-titles
Turn off column titles. By default, titles are printed.
--no-columns
Disable fixed-width columnar output.
--column-separator=C
Use specified character between columns and after the final column. When this switch is not specified,
the default of ’|’ is used.
--no-final-delimiter
Do not print the column separator after the final column. Normally a delimiter is printed.
--delimited
--delimited=C
Run as if --no-columns --no-final-delimiter --column-sep=C had been specified. That is, disable
fixed-width columnar output; if character C is provided, it is used as the delimiter between columns
instead of the default ’|’.
--pager=PAGER_PROG
When output is to a terminal, invoke the program PAGER_PROG to view the output one screen full
at a time. This switch overrides the SILK_PAGER environment variable, which in turn overrides the
PAGER variable. If the value of the pager is determined to be the empty string, no paging will be
performed and all output will be printed to the terminal.
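The precedence among the --pager switch, SILK_PAGER, and PAGER can be summarized in a short Python sketch. It is an interpretation of the text above and of the ENVIRONMENT section, not SiLK's code; the function choose_pager is hypothetical, and the sketch ignores the separate requirement that output be going to a terminal.

```python
from typing import Mapping, Optional


def choose_pager(pager_switch: Optional[str],
                 env: Mapping[str, str]) -> Optional[str]:
    """Return the pager program to run, or None for no paging.

    pager_switch is the --pager argument, or None when the switch is absent.
    """
    if pager_switch is not None:
        candidate = pager_switch            # the switch overrides the environment
    elif "SILK_PAGER" in env:
        candidate = env["SILK_PAGER"]       # SILK_PAGER overrides PAGER, even when empty
    else:
        candidate = env.get("PAGER", "")
    return candidate or None                # an empty string means no paging


if __name__ == "__main__":
    print(choose_pager(None, {"PAGER": "less"}))
    print(choose_pager(None, {"SILK_PAGER": "", "PAGER": "less"}))
    print(choose_pager("more", {"SILK_PAGER": "less"}))
```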
--help
Print the available options and exit.
--version
Print the version number and information about how SiLK was configured, then exit the application.
EXAMPLES
In the following examples, the dollar sign ($) represents the shell prompt. The text after the dollar sign
represents the command line. Lines have been wrapped for improved readability, and the back slash (\) is
used to indicate a wrapped line.
rwpmapbuild(1) creates the prefix map file sample.pmap from the textual input.
$ cat sample.txt
mode      ip
map-name  addrtype
label     0  non-routable
label     1  internal
label     2  external
default   external
0.0.0.0/8           non-routable
10.0.0.0/8          non-routable
127.0.0.0/8         non-routable
169.254.0.0/16      non-routable
172.16.0.0/12       non-routable
192.0.2.0/24        non-routable
192.168.0.0/16      non-routable
255.255.255.255/32  non-routable
$ rwpmapbuild --input-file=sample.txt --output-file=sample.pmap
Invoking rwpmapcat with the name of the file as its only argument prints the range-to-label contents of
the prefix map file, and the contents are printed as CIDR blocks if the file contains IP addresses.
$ rwpmapcat sample.pmap | head -10
           ipBlock|       label|
         0.0.0.0/8|non-routable|
         1.0.0.0/8|    external|
         2.0.0.0/7|    external|
         4.0.0.0/6|    external|
         8.0.0.0/7|    external|
        10.0.0.0/8|non-routable|
        11.0.0.0/8|    external|
        12.0.0.0/6|    external|
        16.0.0.0/4|    external|
Use the --no-cidr-blocks switch to print the range as a pair of IPs. The --map-file switch may be used to
specify the name of the file.
$ rwpmapcat --map-file=sample.pmap --no-cidr-block
        startIP|          endIP|       label|
        0.0.0.0|  0.255.255.255|non-routable|
        1.0.0.0|  9.255.255.255|    external|
       10.0.0.0| 10.255.255.255|non-routable|
       11.0.0.0|126.255.255.255|    external|
      127.0.0.0|127.255.255.255|non-routable|
      128.0.0.0|169.253.255.255|    external|
    169.254.0.0|169.254.255.255|non-routable|
    169.255.0.0| 172.15.255.255|    external|
     172.16.0.0| 172.31.255.255|non-routable|
     172.32.0.0|    192.0.1.255|    external|
      192.0.2.0|    192.0.2.255|non-routable|
      192.0.3.0|192.167.255.255|    external|
    192.168.0.0|192.168.255.255|non-routable|
    192.169.0.0|255.255.255.254|    external|
255.255.255.255|255.255.255.255|non-routable|
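The relationship between the two output forms is that a single start/end range generally corresponds to several CIDR blocks. The Python sketch below uses the standard ipaddress module to split the range 1.0.0.0 to 9.255.255.255 from the listing above; it illustrates the arithmetic only and is not how rwpmapcat is implemented. The function name range_to_cidrs is hypothetical.

```python
import ipaddress


def range_to_cidrs(start: str, end: str) -> list:
    """Return the CIDR blocks that exactly cover the inclusive range start-end."""
    first = ipaddress.ip_address(start)
    last = ipaddress.ip_address(end)
    return [str(net) for net in ipaddress.summarize_address_range(first, last)]


if __name__ == "__main__":
    # One "external" range in the --no-cidr-blocks output ...
    for block in range_to_cidrs("1.0.0.0", "9.255.255.255"):
        # ... becomes several rows in the default CIDR output.
        print(block)
```

For this range the result is 1.0.0.0/8, 2.0.0.0/7, 4.0.0.0/6, and 8.0.0.0/7, which matches the rows between 0.0.0.0/8 and 10.0.0.0/8 in the CIDR listing.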
The --output-type switch determines what output is produced. Specifying an argument of label prints
the labels that were specified when the file was built.
$ rwpmapcat --map-file=sample.pmap --output-type=label
LABELS:
non-routable
internal
external
Multiple types of output may be requested.
$ rwpmapcat --map-file=sample.pmap --output-type=type,mapname
TYPE: IPv4-address
MAPNAME: addrtype
Sometimes the content of the prefix map is clearer if you eliminate the ranges that were assigned to the
default label. There are two ways to filter a label: either specify the label with the --ignore-label switch
or find an IP address that has that label and specify the IP address to the --ip-label-to-ignore switch:
$ cat sample.pmap | rwpmapcat --ignore-label=external
           ipBlock|       label|
         0.0.0.0/8|non-routable|
        10.0.0.0/8|non-routable|
       127.0.0.0/8|non-routable|
    169.254.0.0/16|non-routable|
     172.16.0.0/12|non-routable|
      192.0.2.0/24|non-routable|
    192.168.0.0/16|non-routable|
255.255.255.255/32|non-routable|
$ cat sample.pmap | rwpmapcat --ip-label-to-ignore=0.0.0.0 | head -7
           ipBlock|       label|
         1.0.0.0/8|    external|
         2.0.0.0/7|    external|
         4.0.0.0/6|    external|
         8.0.0.0/7|    external|
        11.0.0.0/8|    external|
        12.0.0.0/6|    external|
rwpmapcat also supports viewing the contents of prefix map files containing protocol/port pairs.
$ rwpmapcat proto.pmap
startPair|  endPair|      label|
...
      6/0|      6/0|        TCP|
      6/1|      6/1|     tcpmux|
      6/2|      6/3|compressnet|
      6/4|      6/4|        TCP|
      6/5|      6/5|        rje|
      6/6|      6/6|        TCP|
      6/7|      6/7|       echo|
      6/8|      6/8|        TCP|
...
As of SiLK 3.8.0, rwpmapcat supports printing the contents of the country code mapping file created by
rwgeoip2ccmap(1) (for use in the country code plug-in ccfilter(3)) when the --country-codes switch is
used.
$ rwpmapcat --no-cidr --country-codes=country_codes.pmap | head
        startIP|          endIP|label|
        0.0.0.0|     2.6.190.55|   --|
     2.6.190.56|     2.6.190.63|   gb|
     2.6.190.64|  2.255.255.255|   --|
        3.0.0.0|    4.17.135.31|   us|
    4.17.135.32|    4.17.135.63|   ca|
    4.17.135.64|   4.17.142.255|   us|
     4.17.143.0|    4.17.143.15|   ca|
    4.17.143.16|     4.18.32.71|   us|
     4.18.32.72|     4.18.32.79|   mx|
ENVIRONMENT
SILK_PAGER
When set to a non-empty string, rwpmapcat automatically invokes this program to display its output
a screen at a time. If set to an empty string, rwpmapcat does not automatically page its output.
PAGER
When set and SILK_PAGER is not set, rwpmapcat automatically invokes this program to display its
output a screen at a time.
FILES
${SILK_COUNTRY_CODES}
${SILK_PATH}/share/silk/country_codes.pmap
${SILK_PATH}/share/country_codes.pmap
/usr/local/share/silk/country_codes.pmap
/usr/local/share/country_codes.pmap
Possible locations for the country codes mapping file when the --country-codes switch is specified
without an argument.
${SILK_ADDRESS_TYPES}
${SILK_PATH}/share/silk/address_types.pmap
${SILK_PATH}/share/address_types.pmap
/usr/local/share/silk/address_types.pmap
/usr/local/share/address_types.pmap
Possible locations for the address types mapping file when the --address-types switch is specified
without an argument.
SEE ALSO
rwpmapbuild(1), rwgeoip2ccmap(1), pmapfilter(3), ccfilter(3), rwfilter(1), rwcut(1), rwgroup(1), rwsort(1), rwstats(1), rwuniq(1), silk(7)
NOTES
The --country-codes and --address-types switches were added in SiLK 3.8.0.
rwpmaplookup
Map keys to prefix map entries
SYNOPSIS
rwpmaplookup { --map-file=MAP_FILE | --address-types[=MAP_FILE]
| --country-codes[=MAP_FILE] }
[--fields=FIELDS] [--ipset-files] [--no-errors]
[--ip-format=FORMAT] [--integer-ips] [--zero-pad-ips]
[--no-titles] [--no-columns] [--column-separator=CHAR]
[--no-final-delimiter] [{--delimited | --delimited=CHAR}]
[{--output-path=PATH | --pager=PAGER_PROG}]
[--no-files ARG [ARGS...] | --xargs[=FILE] | FILE [FILES...]]
rwpmaplookup --help
rwpmaplookup --version
DESCRIPTION
rwpmaplookup finds keys in a binary prefix map file and prints the key and its value in a textual, bar (|)
delimited format.
By default, rwpmaplookup expects its arguments to be the names of text files containing keys, one key
per line. When the --ipset-files switch is given, rwpmaplookup takes IPset files as arguments and uses
the IPs as the keys. The --no-files switch causes rwpmaplookup to treat each command line argument
itself as a key to find in the prefix map.
When --no-files is not specified, rwpmaplookup reads the keys from the files named on the command line
or from the standard input when no file names are specified and neither --xargs nor --no-files is present.
To read the standard input in addition to the named files, use - or stdin as a file name. When the --xargs
switch is provided, rwpmaplookup will read the names of the files to process from the named text file,
or from the standard input if no file name argument is provided to the switch. The input to --xargs must
contain one file name per line.
You must tell rwpmaplookup the prefix map to use for look-ups using one of three switches:
• To use an arbitrary prefix map, use the --map-file switch.
• If you want to map IP addresses to country codes (see ccfilter(3)), use the --country-codes switch.
To use the default country code prefix map, do not provide an argument to the switch. To use a specific
country code mapping file, specify the file as the argument.
• If you want to map IP addresses to address types (see addrtype(3)), use the --address-types switch.
To use the default address types prefix map, do not provide an argument to the switch. To use a specific
address types mapping file, specify the file as the argument.
If the --map-file switch specifies a prefix map containing protocol/port pairs, each input file should contain
one protocol/port pair per line in the form PROTOCOL/PORT, where PROTOCOL is a number between
0 and 255 inclusive, and PORT is a number between 0 and 65535 inclusive. When the --ipset-files switch
is specified, it is an error if the --map-file switch specifies a prefix map containing protocol/port pairs.
When querying any other type of prefix map and the --ipset-files switch is not present, each textual input
file should contain one IP address per line, where the IP is a single IP address (not a CIDR block) in
canonical form or the integer representation of an IPv4 address.
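The two textual key formats described above can be validated before a file is handed to rwpmaplookup. The Python sketch below applies the stated rules (PROTOCOL between 0 and 255, PORT between 0 and 65535, a single IP address or an IPv4 integer, no CIDR blocks). The function names are hypothetical, and the sketch does not claim to accept or reject exactly the same strings as rwpmaplookup's own parser.

```python
import ipaddress


def parse_proto_port(text: str) -> tuple:
    """Parse a PROTOCOL/PORT key such as "6/80" into (protocol, port)."""
    proto_text, slash, port_text = text.strip().partition("/")
    digits_only = all(t.isascii() and t.isdigit() for t in (proto_text, port_text))
    if not slash or not digits_only:
        raise ValueError(f"not a PROTOCOL/PORT pair: {text!r}")
    proto, port = int(proto_text), int(port_text)
    if proto > 255 or port > 65535:
        raise ValueError(f"protocol or port out of range: {text!r}")
    return proto, port


def parse_ip_key(text: str):
    """Parse a single IP address in canonical form or an IPv4 integer."""
    text = text.strip()
    if text.isascii() and text.isdigit():
        return ipaddress.IPv4Address(int(text))   # integer form of an IPv4 address
    return ipaddress.ip_address(text)             # raises ValueError for CIDR blocks


if __name__ == "__main__":
    print(parse_proto_port("6/80"))
    print(parse_ip_key("151587081"))
    print(parse_ip_key("2001:db8::1"))
```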
The --fields switch allows you to specify which columns appear in the output. The default columns are the
key and the value, where the key is the IP address or protocol/port pair, and the value is the textual label
for that key.
If the prefix map contains IPv6 addresses, any IPv4 address in the input is mapped into the ::ffff:0:0/96
netblock when searching.
If the prefix map contains IPv4 addresses only, any IPv6 address in the ::ffff:0:0/96 netblock is converted
to IPv4 when searching. Any other IPv6 address is ignored, and it is not printed in the output unless the
input field is requested.
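The two conversions described above can be sketched with Python's ipaddress module. The function names key_for_ipv6_map and key_for_ipv4_map are hypothetical and the code is an illustration of the mapping rules, not of rwpmaplookup's internals.

```python
import ipaddress

# The ::ffff:0:0/96 netblock: 0xffff in bits 32-47, the IPv4 address in bits 0-31.
V4_MAPPED_PREFIX = 0xFFFF << 32


def key_for_ipv6_map(ip):
    """Key used against an IPv6 prefix map: IPv4 input is mapped into ::ffff:0:0/96."""
    if ip.version == 4:
        return ipaddress.IPv6Address(V4_MAPPED_PREFIX | int(ip))
    return ip


def key_for_ipv4_map(ip):
    """Key used against an IPv4-only prefix map.

    Returns None for an IPv6 address outside ::ffff:0:0/96, which corresponds
    to the address being ignored.
    """
    if ip.version == 4:
        return ip
    return ip.ipv4_mapped


if __name__ == "__main__":
    print(int(key_for_ipv6_map(ipaddress.ip_address("127.0.0.1"))) == 0xFFFF7F000001)
    print(key_for_ipv4_map(ipaddress.ip_address("::ffff:7f00:1")))
    print(key_for_ipv4_map(ipaddress.ip_address("2001:db8::1")))
```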
Prefix map files are created by the rwpmapbuild(1) and rwgeoip2ccmap(1) utilities. IPset files are
created most often by rwset(1) and rwsetbuild(1).
OPTIONS
Option names may be abbreviated if the abbreviation is unique or is an exact match for an option. A
parameter to an option may be specified as --arg=param or --arg param, though the first form is required
for options that take optional parameters.
One of --map-file, --address-types, or --country-codes is required.
--map-file=PMAP_FILE
Find the IP addresses or protocol/port pairs in the prefix map file PMAP_FILE.
--address-types
Find the IP addresses in the address types (see addrtype(3)) mapping file specified by the
SILK_ADDRESS_TYPES environment variable, or in the default address types mapping file if that
environment variable is not set.
--address-types=ADDRTYPE_FILE
Find the IP addresses in the address types mapping file specified by ADDRTYPE_FILE.
--country-codes
Find the IP addresses in the country code (see ccfilter(3)) mapping file specified by the
SILK_COUNTRY_CODES environment variable, or in the default country code mapping file if that
environment variable is not set.
--country-codes=COUNTRY_CODE_FILE
Find the IP addresses in the country code mapping file specified by COUNTRY_CODE_FILE.
--fields=FIELDS
Specify the columns to include in the output. The columns will be displayed in the order the fields are
specified. FIELDS is a comma separated list of field-names. Field-names are case-insensitive. When
this switch is not provided, the default fields are key,value. The available fields are:
key
The key used to search the prefix map.
value
The label returned from the prefix map for the key.
block
The block in the prefix map that contains the key. For a prefix map file that contains IPv4
addresses, the result will be a CIDR block such as 10.18.26.32/27.
start-block
The value at the start of the block in the prefix map that contains the key.
end-block
The value at the end of the block in the prefix map that contains the key.
input
The text read from the input file that rwpmaplookup attempted to parse. Note that blank lines,
lines containing only whitespace and comments, and lines longer than 2048 characters will not be
printed. In addition, any comments appearing after the text are stripped. When --ipset-files is
specified, this field contains the IP address in its canonical form.
--no-files
Causes rwpmaplookup to treat the command line arguments as the text to be parsed. This allows
one to look up a handful of values without having to create a temporary file. Use of the --no-files
switch disables paging of the output. This switch may not be combined with --ipset-files.
--no-errors
Disables printing of errors when the input cannot be parsed as an IP address or a protocol/port pair.
This switch is ignored when --ipset-files is specified.
--ipset-files
Causes rwpmaplookup to treat the command line arguments as the names of IPset files to read
and use as keys into the prefix map. It is an error to use this switch when --map-file specifies a
protocol/port prefix map. When --ipset-files is active, the input column of --fields contains the IP
in its canonical form, regardless of the --ip-format switch. This switch may not be combined with
--no-files.
--ip-format=FORMAT
When printing the key of a prefix map containing IP addresses, specify how the IP addresses will be
printed. When this switch is not specified, IPs are printed in the canonical format. The FORMAT is
one of:
canonical
Print IP addresses in their canonical form: dotted quad for IPv4 (127.0.0.1) and hexadectet for
IPv6 (2001:db8::1). Note that IPv6 addresses in ::ffff:0:0/96 and some IPv6 addresses in ::/96
will be printed as a mixture of IPv6 and IPv4.
zero-padded
Print IP addresses in their canonical form, but add zeros to the output so it fully fills the width
of the column. The addresses 127.0.0.1 and 2001:db8::1 are printed as 127.000.000.001 and
2001:0db8:0000:0000:0000:0000:0000:0001, respectively.
decimal
Print IP addresses as integers in decimal format. The addresses 127.0.0.1 and 2001:db8::1 are
printed as 2130706433 and 42540766411282592856903984951653826561, respectively.
hexadecimal
Print IP addresses as integers in hexadecimal format. The addresses 127.0.0.1 and 2001:db8::1
are printed as 7f000001 and 20010db8000000000000000000000001, respectively.
force-ipv6
Print all IP addresses in the canonical form for IPv6 without using any IPv4 notation. Any IPv4
address is mapped into the ::ffff:0:0/96 netblock. The addresses 127.0.0.1 and 2001:db8::1 are
printed as ::ffff:7f00:1 and 2001:db8::1, respectively.
--integer-ips
Print IP addresses as integers. This switch is equivalent to --ip-format=decimal, it is deprecated as
of SiLK 3.7.0, and it will be removed in the SiLK 4.0 release.
--zero-pad-ips
Print IP addresses as fully-expanded, zero-padded values in their canonical form. This switch is
equivalent to --ip-format=zero-padded, it is deprecated as of SiLK 3.7.0, and it will be removed in
the SiLK 4.0 release.
--no-titles
Turn off column titles. By default, titles are printed.
--no-columns
Disable fixed-width columnar output.
--column-separator=C
Use specified character between columns and after the final column. When this switch is not specified,
the default of ’|’ is used.
--no-final-delimiter
Do not print the column separator after the final column. Normally a delimiter is printed.
--delimited
--delimited=C
Run as if --no-columns --no-final-delimiter --column-sep=C had been specified. That is, disable
fixed-width columnar output; if character C is provided, it is used as the delimiter between columns
instead of the default ’|’.
--output-path=PATH
Determines where the output of rwpmaplookup is written. If this option is not given, output is
written to the standard output.
--pager=PAGER_PROG
When the --no-files switch has not been specified and output is to a terminal, invoke the program
PAGER_PROG to view the output one screen full at a time. This switch overrides the SILK_PAGER
environment variable, which in turn overrides the PAGER variable. If the value of the pager is
determined to be the empty string, no paging will be performed and all output will be printed to the
terminal.
--xargs
--xargs=FILENAME
Causes rwpmaplookup to read file names from FILENAME or from the standard input if FILENAME
is not provided. The input should have one file name per line. rwpmaplookup will open each file in
turn and read the IPset, textual IP addresses, or textual protocol/port pairs from it, as if the files had
been listed on the command line.
--help
Print the available options and exit.
--version
Print the version number and information about how SiLK was configured, then exit the application.
EXAMPLES
In the following examples, the dollar sign ($) represents the shell prompt. The text after the dollar sign
represents the command line. Lines have been wrapped for improved readability, and the back slash (\) is
used to indicate a wrapped line.
Country code examples
Print the country code for a list of addresses read from the standard input.
$ cat my-addrs.txt
128.2.0.0
128.2.0.1
$ cat my-addrs.txt | rwpmaplookup --country-codes
            key|value|
      128.2.0.0|   us|
      128.2.0.1|   us|
Use --no-files to list the addresses on the command line.
$ rwpmaplookup --country-codes --no-files 128.2.0.0 128.2.0.1
            key|value|
      128.2.0.0|   us|
      128.2.0.1|   us|
Use --ipset-files to read the addresses from an IPset file.
$ rwsetbuild my-addrs.txt my-addrs.set
$ rwpmaplookup --country-codes --ipset-files my-addrs.set
            key|value|
      128.2.0.0|   us|
      128.2.0.1|   us|
Use the --fields switch to control which columns are printed.
$ rwpmaplookup --country-codes --fields=value my-addrs.txt
value|
us|
us|
Add the --delimited and --no-titles switches so the output only contains the value column. Print the
country code for a single address using the default country code prefix map.
$ rwpmaplookup --country-codes --fields=value --delimited \
--no-titles --no-files 128.2.0.0
us
Alternatively:
$ echo 128.2.0.0 \
  | rwpmaplookup --country-codes --fields=value --delim --no-title
us
To use a different country code mapping file, provide that file as the argument to the --country-codes
switch.
$ rwpmaplookup --country-code=old-address-map.pmap --no-files 128.2.0.0
            key|value|
      128.2.0.0|   us|
CIDR block input
Note that rwpmaplookup does not parse text that contains CIDR blocks.
$ echo '128.2.0.0/31' \
  | rwpmaplookup --country-codes
            key|value|
rwpmaplookup: Invalid IP '128.2.0.0/31' at -:1: Extra text follows value
For this case, use the IPset tool rwsetbuild(1) to parse the CIDR block list and create a binary IPset
stream, and pipe the IPset to rwpmaplookup.
$ echo '128.2.0.0/31' \
  | rwsetbuild \
  | rwpmaplookup --country-code --ipset-files
            key|value|
      128.2.0.0|   --|
      128.2.0.1|   --|
For versions of rwpmaplookup that do not have the --ipset-files switch, you can have rwsetcat(1) read
the binary IPset stream and print the IP addresses as text, and pipe that into rwpmaplookup. Be sure to
include the --cidr-blocks=0 switch to rwsetcat which forces individual IP addresses to be printed.
$ echo '128.2.0.0/31' \
  | rwsetbuild \
  | rwsetcat --cidr-blocks=0 \
  | rwpmaplookup --country-code
            key|value|
      128.2.0.0|   --|
      128.2.0.1|   --|
General prefix map usage
Consider a user-defined prefix map, assigned-slash-8s.pmap, that maps each /8 in the IPv4 address space to
its assignment.
$ rwpmapcat assigned-slash-8s.pmap | head -4
           ipBlock|                      label|
         0.0.0.0/8|IANA - Local Identification|
         1.0.0.0/8|                      APNIC|
         2.0.0.0/8|                   RIPE NCC|
Use the --map-file switch to map from IPs to labels using this prefix map.
$ cat my-addrs.txt
17.17.17.17
9.9.9.9
$ cat my-addrs.txt | rwpmaplookup --map-file=assigned-slash-8s.pmap
            key|               value|
    17.17.17.17| Apple Computer Inc.|
        9.9.9.9|                 IBM|
Use --ip-format=decimal to print the output as integers.
$ cat my-addrs.txt \
  | rwpmaplookup --ip-format=decimal --map-file=assigned-slash-8s.pmap
       key|               value|
 286331153| Apple Computer Inc.|
 151587081|                 IBM|
Add the input field to see the input as well.
$ cat my-addrs.txt \
  | rwpmaplookup --ip-format=decimal --fields=key,value,input \
      --map-file=assigned-slash-8s.pmap
       key|               value|          input|
 286331153| Apple Computer Inc.|    17.17.17.17|
 151587081|                 IBM|        9.9.9.9|
Combine the input field with the --no-errors switch to see a row for each key.
$ rwpmaplookup --fields=key,value,input --no-errors --no-files \
      --map-file=assigned-slash-8s.pmap 9.9.9.9 17.1717.17
            key|               value|          input|
        9.9.9.9|                 IBM|        9.9.9.9|
               |                    |     17.1717.17|
The input can contain integer values.
$ echo 151587081
\
| rwpmaplookup --fields=key,value,input --delimited=, \
--map-file=assigned-slash-8s.pmap
key,value,input
9.9.9.9,IBM,151587081
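The integer keys in the examples above are the dotted-quad addresses read as unsigned 32-bit numbers. The following is a minimal Python sketch of that conversion using the standard ipaddress module; it is not part of SiLK, and the function names ip_to_int and int_to_ip are chosen here for illustration only.

```python
import ipaddress


def ip_to_int(text):
    """Return the unsigned 32-bit integer for a dotted-quad IPv4 address."""
    return int(ipaddress.IPv4Address(text))


def int_to_ip(value):
    """Return the dotted-quad form of an unsigned 32-bit integer."""
    return str(ipaddress.IPv4Address(value))


if __name__ == "__main__":
    # The two addresses from my-addrs.txt in the examples above.
    for addr in ("17.17.17.17", "9.9.9.9"):
        print(addr, ip_to_int(addr))
```

This can help when checking by hand that a decimal key in the output refers to the address you expect.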
Block output
Specifying block in the --fields switch causes rwpmaplookup to print the CIDR block that contains the
address key.
$ cat my-addrs.txt
9.8.7.6
9.10.11.12
17.16.15.14
17.18.19.20
$ rwpmaplookup --map-file=assigned-slash-8s.pmap \
      --fields=key,value,block my-addrs.txt
            key|               value|             block|
        9.8.7.6|                 IBM|         9.0.0.0/8|
     9.10.11.12|                 IBM|         9.0.0.0/8|
    17.16.15.14| Apple Computer Inc.|        17.0.0.0/8|
    17.18.19.20| Apple Computer Inc.|        17.0.0.0/8|
To break the CIDR block into its starting and ending value, specify the start-block and end-block fields.
$ rwpmaplookup --map-file=assigned-slash-8s.pmap                \
      --fields=key,value,start-block,end-block my-addrs.txt
            key|               value|    start-block|      end-block|
        9.8.7.6|                 IBM|        9.0.0.0|  9.255.255.255|
     9.10.11.12|                 IBM|        9.0.0.0|  9.255.255.255|
    17.16.15.14| Apple Computer Inc.|       17.0.0.0| 17.255.255.255|
    17.18.19.20| Apple Computer Inc.|       17.0.0.0| 17.255.255.255|
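The relationship between a CIDR block and its starting and ending addresses can be checked outside of SiLK. The Python sketch below uses the standard ipaddress module; block_bounds is a hypothetical helper name, and the code only illustrates the arithmetic, not how rwpmaplookup computes the fields.

```python
import ipaddress


def block_bounds(cidr):
    """Return (first, last) addresses of an IPv4 CIDR block as strings."""
    # strict=True rejects input such as 9.8.7.6/8 that has host bits set.
    net = ipaddress.ip_network(cidr, strict=True)
    return str(net.network_address), str(net.broadcast_address)


if __name__ == "__main__":
    for block in ("9.0.0.0/8", "17.0.0.0/8"):
        first, last = block_bounds(block)
        print(block, first, last)
```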
To get a unique list of blocks for the input keys, do not output the key field and pipe the output of
rwpmaplookup to the uniq(1) command. (This works as long as the input data is sorted, because uniq only removes adjacent duplicate lines.)
$ cat my-addrs.txt                                              \
  | rwpmaplookup --map-file=assigned-slash-8s.pmap              \
        --fields=block,value                                    \
  | uniq
             block|               value|
         9.0.0.0/8|                 IBM|
        17.0.0.0/8| Apple Computer Inc.|
The values printed in the block column correspond to the CIDR blocks that were used when the prefix map
file was created.
$ rwpmaplookup --map=assigned-slash-8s.pmap --fields=block,value \
      --no-files 128.2.0.1 129.0.0.1
             block|               value|
       128.0.0.0/8|Administered by ARIN|
       129.0.0.0/8|Administered by ARIN|
In the output from rwpmapcat(1), those two blocks are combined into a larger range.
$ rwpmapcat --map=assigned-slash-8s.pmap | grep 128
128.0.0.0/6|Administered by ARIN|
Working with IPsets
Assume you have a binary IPset file, my-ips.set, that has the contents shown here, and you want to find the
list of unique assignments from the assigned-slash-8s.pmap file.
$ rwsetcat --cidr-blocks=1 my-ips.set
9.9.9.0/24
13.13.13.0/24
15.15.15.0/24
16.16.16.0/24
17.17.17.0/24
18.18.18.0/24
Since the blocks in the assigned-slash-8s.pmap file are /8, use the rwsettool(1) command to mask the IPs
in the IPset to the unique /8 that contains each of the IPs.
$ rwsettool --mask=8 my-ips.set                                 \
  | rwpmaplookup --map-file=assigned-slash-8s.pmap
            key|                        value|
        9.0.0.0|                          IBM|
       13.0.0.0|            Xerox Corporation|
       15.0.0.0|      Hewlett-Packard Company|
       16.0.0.0|Digital Equipment Corporation|
       17.0.0.0|          Apple Computer Inc.|
       18.0.0.0|                          MIT|
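The masking step above reduces every block in the IPset to the base address of the /8 that contains it. The Python sketch below reproduces that reduction on a list of CIDR strings with the standard ipaddress module. It is an illustration only: mask_ipset is a hypothetical name, and the code works on text rather than on binary IPset files.

```python
import ipaddress


def mask_ipset(cidrs, prefix_len=8):
    """Return the sorted base addresses of the /prefix_len blocks that
    contain at least one address from the given CIDR blocks."""
    bases = set()
    for cidr in cidrs:
        net = ipaddress.ip_network(cidr)
        if net.prefixlen >= prefix_len:
            # The block fits inside a single /prefix_len block.
            bases.add(net.supernet(new_prefix=prefix_len).network_address)
        else:
            # The block is wider and spans several /prefix_len blocks.
            bases.update(s.network_address
                         for s in net.subnets(new_prefix=prefix_len))
    return [str(addr) for addr in sorted(bases)]


if __name__ == "__main__":
    my_ips = ["9.9.9.0/24", "13.13.13.0/24", "15.15.15.0/24",
              "16.16.16.0/24", "17.17.17.0/24", "18.18.18.0/24"]
    for base in mask_ipset(my_ips):
        print(base)
```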
Protocol/port prefix maps
Assume the service.pmap prefix map file maps protocol/port pairs to the name of the service running on
the named port.
$ rwpmapcat service.pmap
startPair|  endPair|   label|
      0/0|  0/65535| unknown|
      1/0|  1/65535|    ICMP|
      2/0|  5/65535| unknown|
      6/0|     6/21|     TCP|
     6/22|     6/22| TCP/SSH|
...
     17/0|    17/52|     UDP|
    17/53|    17/53| UDP/DNS|
...
To query this prefix map, the input must contain two numbers separated by a slash.
$ rwpmaplookup --map-file=service.pmap --no-files 6/80
        key|    value|
       6/80| TCP/HTTP|
Specifying block, start-block, and end-block in the --fields switch also works for Protocol/port prefix
map files. The block column contains the same information as the start-block and end-block columns
separated by a single space.
$ rwpmaplookup --map-file=service.pmap --no-files               \
      --fields=key,value,start,end,block                        \
      6/80 6/6000 17/0 17/53 200/200
        key|     value|start-blo|end-block|            block|
       6/80|  TCP/HTTP|     6/80|     6/80|        6/80 6/80|
     6/6000|       TCP|   6/4096|   6/6143|    6/4096 6/6143|
       17/0|       UDP|     17/0|    17/31|       17/0 17/31|
      17/53|   UDP/DNS|    17/53|    17/53|      17/53 17/53|
    200/200|Unassigned|    192/0|223/65535|  192/0 223/65535|
Using the pmapfilter(3) plug-in to rwcut(1), you can print the label for the source port and destination
port in the SiLK Flow file data.rw.
$ rwcut --pmap-file=service.pmap --num-rec=5                    \
      --fields=proto,sport,src-service,dport,dst-service data.rw
pro|sPort|src-service|dPort|dst-service|
 17|29617|        UDP|   53|    UDP/DNS|
 17|   53|    UDP/DNS|29617|        UDP|
  6|29618|        TCP|   22|    TCP/SSH|
  6|   22|    TCP/SSH|29618|        TCP|
  1|    0|       ICMP|  771|       ICMP|
The pmapfilter plug-in does not provide a way to print the values based on the application field. You can
get that information by having rwcut print the protocol and application separated by a slash, and pipe the
result into rwpmaplookup.
$ rwcut --fields=proto,application --num-rec=5                  \
      --delimited=/ --no-title                                  \
  | rwpmaplookup --map-file=service.pmap
        key|   value|
      17/53| UDP/DNS|
      17/53| UDP/DNS|
       6/22| TCP/SSH|
       6/22| TCP/SSH|
        1/0|    ICMP|
ENVIRONMENT
SILK_PAGER
When set to a non-empty string, rwpmaplookup automatically invokes this program to display its
output a screen at a time unless the --no-files switch is given. If this variable is set to an empty string,
rwpmaplookup does not automatically page its output.
PAGER
When set and SILK_PAGER is not set, rwpmaplookup automatically invokes this program to display
its output a screen at a time.
SILK_COUNTRY_CODES
This environment variable allows the user to specify the country code mapping file to use when the
--country-codes switch is specified without an argument. The variable’s value may be a complete
path or a file relative to SILK_PATH. See the FILES section for standard locations of this file.
SILK_ADDRESS_TYPES
This environment variable allows the user to specify the address type mapping file to use when the
--address-types switch is specified without an argument. The variable’s value may be a complete
path or a file relative to SILK_PATH. See the FILES section for standard locations of this file.
SILK_CLOBBER
The SiLK tools normally refuse to overwrite existing files. Setting SILK_CLOBBER to a non-empty
value removes this restriction.
SILK_PATH
This environment variable gives the root of the install tree. When searching for configuration files,
rwpmaplookup may use this environment variable. See the FILES section for details.
FILES
${SILK_COUNTRY_CODES}
${SILK_PATH}/share/silk/country_codes.pmap
${SILK_PATH}/share/country_codes.pmap
/usr/local/share/silk/country_codes.pmap
/usr/local/share/country_codes.pmap
Possible locations for the country codes mapping file when the --country-codes switch is specified
without an argument.
${SILK_ADDRESS_TYPES}
${SILK_PATH}/share/silk/address_types.pmap
${SILK_PATH}/share/address_types.pmap
/usr/local/share/silk/address_types.pmap
/usr/local/share/address_types.pmap
Possible locations for the address types mapping file when the --address-types switch is specified
without an argument.
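The lists above can be read as a search path. The Python sketch below builds the candidate list for either mapping file and returns the first candidate that exists. It is an illustration under stated assumptions, not rwpmaplookup's code: it assumes the locations are tried in the order listed, it uses the environment variable's value as given (it does not resolve a value relative to SILK_PATH), and the names candidate_paths and find_map_file are hypothetical.

```python
import os


def candidate_paths(env_var, filename, environ=None):
    """Return the candidate locations for a mapping file, in listed order."""
    env = os.environ if environ is None else environ
    paths = []
    explicit = env.get(env_var)
    if explicit:
        # Assumption: the variable's value is used as given.
        paths.append(explicit)
    silk_path = env.get("SILK_PATH")
    if silk_path:
        root = silk_path.rstrip("/")
        paths.append(f"{root}/share/silk/{filename}")
        paths.append(f"{root}/share/{filename}")
    paths.append(f"/usr/local/share/silk/{filename}")
    paths.append(f"/usr/local/share/{filename}")
    return paths


def find_map_file(env_var, filename, environ=None):
    """Return the first candidate that is an existing file, or None."""
    for path in candidate_paths(env_var, filename, environ):
        if os.path.isfile(path):
            return path
    return None


if __name__ == "__main__":
    print(find_map_file("SILK_COUNTRY_CODES", "country_codes.pmap"))
```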
NOTES
rwpmaplookup was added in SiLK 3.0.
rwpmaplookup duplicates the functionality of rwip2cc(1). rwip2cc is deprecated, and it will be removed
in the SiLK 4.0 release. Examples of using rwpmaplookup in place of rwip2cc are provided in the latter’s
manual page.
SEE ALSO
rwpmapbuild(1), rwpmapcat(1), ccfilter(3), addrtype(3), pmapfilter(3), rwgeoip2ccmap(1), rwcut(1), rwset(1), rwsetbuild(1), rwsetcat(1), rwsettool(1), silk(7)
rwpmatch
Filter a tcpdump file using a SiLK Flow file
SYNOPSIS
rwpmatch --flow-file=FLOW_FILE [--msec-compare] [--ports-compare]
TCPDUMP_INPUT > TCPDUMP_OUTPUT
rwpmatch --help
rwpmatch --version
DESCRIPTION
rwpmatch reads each packet from the pcap(3) (tcpdump(1)) capture file TCPDUMP_INPUT and writes
the packet to the standard output if the specified FLOW_FILE contains a matching SiLK Flow record. It
is designed to reverse the conversion performed by rwptoflow(1).
rwpmatch will read the pcap capture data from its standard input if TCPDUMP_INPUT is specified as
stdin. The application will fail when attempting to read or write binary data from or to a terminal.
The SiLK Flow records in FLOW_FILE should appear in time-sorted order.
OPTIONS
Option names may be abbreviated if the abbreviation is unique or is an exact match for an option. A
parameter to an option may be specified as --arg=param or --arg param, though the first form is required
for options that take optional parameters.
--flow-file=FLOW_FILE
FLOW_FILE refers to a file, named pipe, or the string stdin. The flow file determines which packet
records should be output to the new packet file. This switch is required.
--msec-compare
Compare times down to the millisecond (rather than the default of second).
--ports-compare
For TCP and UDP data, compare the source and destination ports when matching.
--help
Print the available options and exit.
--version
Print the version number and information about how SiLK was configured, then exit the application.
EXAMPLES
In the following examples, the dollar sign ($) represents the shell prompt. The text after the dollar sign
represents the command line.
Given the pcap capture file data.pcap, convert it to a SiLK flow file:
$ rwptoflow data.pcap --packet-pass=good.pcap --flow-out=data.rw
Filter the SiLK flows, passing those records whose source IPs are found in the IPset file sip.set:
$ rwfilter --sipset=sip.set --pass=filtered.rw data.rw
Match the original pcap file against the filtered SiLK file, in effect generating a pcap file which has been
filtered by sip.set:
$ rwpmatch --flow-file=filtered.rw good.pcap > filtered.pcap
NOTES
For best results, the tcpdump input to rwpmatch should be the output from the --packet-pass-output
switch on rwptoflow. This ensures that only well-behaved packets are given to rwpmatch.
The flow file input to rwpmatch should contain single-packet flows originally derived from a tcpdump
file using rwptoflow. If a flow record is found which does not represent a corresponding tcpdump record,
rwpmatch will return an error.
Both the tcpdump and the SiLK file inputs must be time-ordered.
rwpmatch is an expensive I/O application since it reads the entire tcpdump capture file and the entire
SiLK Flow file. It may be worthwhile to optimize an analysis process to avoid using rwpmatch until payload
filtering is necessary. Saving the output from rwpmatch as a partial-results file, and matching against that
in the future (rather than the original tcpdump file) can also provide significant performance gains.
SiLK supports millisecond timestamps. When reading packets whose timestamps have finer precision, the
times are truncated at the millisecond position.
SEE ALSO
rwptoflow(1), rwfilter(1), silk(7), tcpdump(1), pcap(3)
rwptoflow
Generate SiLK Flow records from packet data
SYNOPSIS
rwptoflow [--plugin=PLUGIN [--plugin=PLUGIN ...]]
[--active-time=YYYY/MM/DD:hh:mm:ss.uuuuuu-YYYY/MM/DD:hh:mm:ss.uuuuuu]
[--flow-output=FLOW_PATH] [--packet-pass-output=PCKTS_PASS]
[--packet-reject-output=PCKTS_REJECT]
[--reject-all-fragments] [--reject-nonzero-fragments]
[--reject-incomplete] [--set-sensorid=SCALAR]
[--set-inputindex=SCALAR] [--set-outputindex=SCALAR]
[--set-nexthopip=IP_ADDRESS] [--print-statistics]
[--note-add=TEXT] [--note-file-add=FILE]
[--compression-method=COMP_METHOD] TCPDUMP_INPUT
rwptoflow [--plugin=PLUGIN ...] --help
rwptoflow --version
DESCRIPTION
rwptoflow attempts to generate a SiLK Flow record for every Ethernet IP IPv4 packet in the pcap(3)
(tcpdump(1)) capture file TCPDUMP_INPUT. TCPDUMP_INPUT must contain data captured from an
Ethernet datalink.
rwptoflow does not attempt to reassemble fragmented packets or to combine multiple packets into a single
flow record. rwptoflow is a simple program that creates one SiLK Flow record for every IPv4 packet in
TCPDUMP_INPUT. (For an alternate approach, consider using the rwp2yaf2silk(1) tool as described at
the end of this section.)
rwptoflow will read from its standard input if TCPDUMP_INPUT is specified as stdin. The SiLK Flow
records are written to the specified flow-output file or to the standard output. The application will fail
when attempting to read or write binary data from or to a terminal.
Packets outside of a user-specified active-time window can be ignored. Additional filtering on the
TCPDUMP_INPUT can be performed by using tcpdump with an expression filter and piping tcpdump’s
output into rwptoflow.
In addition to generating flow records, rwptoflow can write pcap files containing the packets that it used to
generate each flow, and/or the packets that were rejected. Note that packets falling outside the active-time
window are ignored and are not written to the packet-reject-output.
Statistics of the number of packets read, rejected, and written can be printed.
rwptoflow will reject any packet that is not an IPv4 Ethernet packet and any packet that is too short to
contain the Ethernet and IP headers. At the user’s request, packets may be rejected when
• they are fragmented, whether the initial (zero-offset) fragment or a subsequent fragment
• they have a non-zero fragment offset
• they are not fragmented or they are the zero-fragment but the capture file does not contain enough
information about the packet to set protocol-specific information, namely the ICMP type and code,
the UDP source and destination ports, or the TCP source and destination ports and flags
Since the input packet formats do not contain some fields normally found in NetFlow data, rwptoflow
provides a way to set those flow values in all packets. For example, it is possible to set the sensor-id
manually for a tcpdump source, so that flow data can be filtered or sorted by that value later.
Alternative to rwptoflow
As mentioned above, rwptoflow is a simple program for processing Ethernet IP IPv4 packets. rwptoflow
does not:
• reassemble fragmented packets
• support IPv6 packets
• combine multiple packets into a single flow record
• support any decoding of packets (e.g., 802.1q)
For these features (and others), you should use the yaf(1) application (http://tools.netsa.cert.org/yaf/) to
read the pcap file and generate an IPFIX stream, and pipe the IPFIX stream into rwipfix2silk(1) to convert
it to SiLK Flow records.
The rwp2yaf2silk(1) script makes this common usage more convenient by wrapping the invocation of yaf
and rwipfix2silk. You give rwp2yaf2silk a pcap file and it writes SiLK Flow records.
By default, rwptoflow creates a flow record for every packet, fragments and all. You can almost force yaf
to create a flow record for every packet: When you give yaf the --idle-timeout=0 switch, yaf creates
a flow record for every complete packet and for each packet that it is able to completely reassemble from
packet fragments. Any fragmented packets that yaf cannot reassemble are dropped.
OPTIONS
Option names may be abbreviated if the abbreviation is unique or is an exact match for an option. A
parameter to an option may be specified as --arg=param or --arg param, though the first form is required
for options that take optional parameters.
--plugin=PLUGIN
Use the specified plug-in to ignore or reject packets or to modify the flow record that is generated
from the packet. The switch may be repeated to load multiple plug-ins. See the PLUG-IN SUPPORT
section below for details.
--active-time=YYYY/MM/DD[:hh[:mm[:ss[.uuuuuu]]]]
--active-time=YYYY/MM/DD[:hh[:mm[:ss[.uuuuuu]]]]-YYYY/MM/DD[:hh[:mm[:ss[.uuuuuu]]]]
Ignore all packets whose time falls outside the specified range. The times must be specified to at least
day precision. The start time is required; when the end-time is not present, it is treated as infinite. The
end-time is rounded up to the instant before the next time unit; for example, an end-time of 2006/08/31:15 is
treated as 2006/08/31:15:59:59.999999.
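The end-time rounding described above can be sketched in Python. The code below is an illustration, not rwptoflow's parser: the names parse_time and parse_active_time are hypothetical, and the sketch assumes that a time given with fractional seconds is already at full precision and needs no rounding.

```python
from datetime import datetime, timedelta

# Each accepted precision, paired with the length of one unit at that precision.
_FORMATS = [
    ("%Y/%m/%d", timedelta(days=1)),
    ("%Y/%m/%d:%H", timedelta(hours=1)),
    ("%Y/%m/%d:%H:%M", timedelta(minutes=1)),
    ("%Y/%m/%d:%H:%M:%S", timedelta(seconds=1)),
    ("%Y/%m/%d:%H:%M:%S.%f", timedelta(microseconds=1)),
]


def parse_time(text):
    """Return (datetime, unit) for the first precision that matches text."""
    for fmt, unit in _FORMATS:
        try:
            return datetime.strptime(text, fmt), unit
        except ValueError:
            continue
    raise ValueError(f"unrecognized time: {text!r}")


def parse_active_time(arg):
    """Return (start, end) for an --active-time style argument.

    end is None when no end-time is given (treated as infinite); otherwise
    it is rounded up to the instant before the next time unit.
    """
    start_text, sep, end_text = arg.partition("-")
    start, _ = parse_time(start_text)
    if not sep:
        return start, None
    end, unit = parse_time(end_text)
    return start, end + unit - timedelta(microseconds=1)


if __name__ == "__main__":
    print(parse_active_time("2006/08/31:14-2006/08/31:15"))
```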
--flow-output=FLOW_PATH
Write the generated SiLK Flow records to the specified file at FLOW_PATH. When this switch is not
provided, the flows are written to the standard output.
--packet-pass-output=PCKTS_PASS
For each generated SiLK Flow record, write the packet that generated the flow to the pcap file specified
by PCKTS_PASS. Use stdout to write the packets to the standard output.
--packet-reject-output=PCKTS_REJECT
Write each packet that occurs within the active-time window but for which a SiLK Flow record was
not generated to the pcap file specified by PCKTS_REJECT. Use stdout to write the packets to the
standard output.
The packets that get written to this file may include packets that were shorter than that required to
get the IP header, non-IPv4 packets, and packets that get treated as reject packets by the following
switches.
--reject-all-fragments
Do not generate a SiLK Flow record for the packet when the packet is fragmented. This includes the
initial (zero-offset) fragment and all subsequent fragments. If --packet-reject-output is specified,
the packet will be written to that file.
--reject-nonzero-fragments
Do not generate a SiLK Flow record for the packet when the packet is fragmented unless this is the
initial fragment. That is, reject all packets that have a non-zero fragmentation offset. Normally flow
records are generated for these packets, but the ports and TCP flag information is set to zero. If
--packet-reject-output is specified, the packet will be written to that file.
--reject-incomplete
Do not generate a SiLK Flow record for the packet when the packet’s fragmentation-offset is zero yet
the packet does not contain enough information to completely specify an ICMP, UDP, or TCP record
(that is, the packet is too short to set the ICMP type and code, the UDP or TCP source or destination
port, or the TCP flags). Normally, flow records are generated for these packets but the ports and TCP
flag information is set to zero. This switch has no effect on packets where the protocol is not 1, 6, or
17.
This switch does not imply --reject-nonzero-fragments; to indicate that all generated flow records
must have valid port and TCP flag information, specify --reject-nonzero-fragments --reject-incomplete.
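How the three reject switches interact can be sketched as a single decision function. The Python code below is an illustration, not rwptoflow's implementation. The Packet class, the is_rejected name, and the byte thresholds are assumptions made here: the thresholds are the smallest transport-header lengths that hold the ICMP type and code (2 bytes), the UDP ports (4 bytes), and the TCP ports and flags (14 bytes) in the standard header layouts.

```python
from dataclasses import dataclass

# Assumed minimum transport-header bytes needed per protocol number.
MIN_TRANSPORT_BYTES = {1: 2, 6: 14, 17: 4}


@dataclass
class Packet:
    protocol: int         # IP protocol number
    frag_offset: int      # IPv4 fragment offset
    more_fragments: bool  # IPv4 "more fragments" flag
    transport_len: int    # captured bytes following the IP header


def is_rejected(pkt, reject_all_fragments=False,
                reject_nonzero_fragments=False, reject_incomplete=False):
    """Return True when the given switches would reject the packet."""
    fragmented = pkt.more_fragments or pkt.frag_offset != 0
    if reject_all_fragments and fragmented:
        return True
    if reject_nonzero_fragments and pkt.frag_offset != 0:
        return True
    if reject_incomplete and pkt.frag_offset == 0:
        needed = MIN_TRANSPORT_BYTES.get(pkt.protocol)
        # Protocols other than 1, 6, and 17 are never rejected here.
        if needed is not None and pkt.transport_len < needed:
            return True
    return False


if __name__ == "__main__":
    later_fragment = Packet(protocol=6, frag_offset=185,
                            more_fragments=False, transport_len=100)
    print(is_rejected(later_fragment, reject_nonzero_fragments=True))
```

The sketch shows why --reject-incomplete alone does not cover non-zero fragments: those packets fail neither of its two conditions, so both switches are needed to guarantee valid port and flag values.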
--set-sensorid=SCALAR
Set the sensor ID for all flows to SCALAR. SCALAR should be an integer value between 0 and 65534,
inclusive. When not specified, the sensor ID is set to 65535.
--set-inputindex=SCALAR
Set the input SNMP index value for all flows to SCALAR. SCALAR should be an integer value between
0 and 65535, inclusive. When not specified, the SNMP input is set to 0.
--set-outputindex=SCALAR
Set the output SNMP index value for all flows to SCALAR. SCALAR should be an integer value
between 0 and 65535, inclusive. When not specified, the SNMP output is set to 0.
--set-nexthopip=IP_ADDRESS
Set the next-hop IP address for all flows to IP_ADDRESS; IP_ADDRESS may be in its canonical form
or an integer. When not specified, the next-hop IP is set to 0.0.0.0.
--print-statistics
Print a summary of the packets that were processed. This summary includes
• the total number of packets read
• the number that fell outside the time-window
• the number that were too short to get the IP header
• the number that were not IPv4
• the number that were discarded by a plug-in
• the total number of fragmented packets
• the number of fragments where the offset was zero
• the number of zero-offset packets that were incomplete
• the number of flows written to the output
--note-add=TEXT
Add the specified TEXT to the header of the output file as an annotation. This switch may be repeated
to add multiple annotations to a file. To view the annotations, use the rwfileinfo(1) tool.
--note-file-add=FILENAME
Open FILENAME and add the contents of that file to the header of the output file as an annotation.
This switch may be repeated to add multiple annotations. Currently the application makes no effort
to ensure that FILENAME contains text; be careful that you do not attempt to add a SiLK data file
as an annotation.
--compression-method=COMP_METHOD
Specify how to compress the output. When this switch is not given, output to the standard output or to
named pipes is not compressed, and output to files is compressed using the default chosen when SiLK
was compiled. The valid values for COMP_METHOD are determined by which external libraries were
found when SiLK was compiled. To see the available compression methods and the default method,
use the --help or --version switch. SiLK can support the following COMP_METHOD values when
the required libraries are available.
none
Do not compress the output using an external library.
zlib
Use the zlib(3) library for compressing the output, and always compress the output regardless
of the destination. Using zlib produces the smallest output files at the cost of speed.
lzo1x
Use the lzo1x algorithm from the LZO real time compression library for compression, and always
compress the output regardless of the destination. This compression provides good compression
with less memory and CPU overhead.
best
Use lzo1x if available, otherwise use zlib. Only compress the output when writing to a file.
--help
Print the available options and exit. Options that add fields can be specified before --help so that the
new options appear in the output.
--version
Print the version number and information about how SiLK was configured, then exit the application.
PLUG-IN SUPPORT
rwptoflow allows the user to provide additional logic to ignore or reject packets, or to modify the flow
record that is generated from the packet. To do this, the user creates a plug-in that gets loaded at run-time
by giving rwptoflow the --plugin switch with the path to the plug-in as the parameter to the switch.
A plug-in is a shared object file (a.k.a. dynamic library) that is compiled from C source code. The plug-in
should have four subroutines defined:
setup()
is called when the object is first loaded. This is the place to initialize global variables to their default
values. If the plug-in provides switches of its own, they must be registered in this subroutine.
initialize()
gets called after all options have been processed but before any packets are read from the input. If this
subroutine does not return 0, the application will quit.
ptoflow()
will be called for every packet that rwptoflow is able to convert into a flow record just before the flow
record is written. This subroutine will not see packets that are short or that are not IPv4; it will also
not see fragmented packets if --reject-all-fragments is specified.
The ptoflow() function is called with two parameters:
• a pointer to the rwRec object that rwptoflow created from the packet. The subroutine may
modify the record as it sees fit.
• a void pointer that the function may cast to a pointer to the C structure:
typedef struct _sk_pktsrc_t {
    /* the source of the packets */
    pcap_t                     *pcap_src;
    /* the pcap header as returned from pcap_next() */
    const struct pcap_pkthdr   *pcap_hdr;
    /* the packet as returned from pcap_next() */
    const u_char               *pcap_data;
} sk_pktsrc_t;
This structure gives the user access to all the information about the packet.
The following return values from ptoflow() determine whether rwptoflow writes the flow and the
packet:
0
Write the flow record to the flow-output and the packet to the PCKTS_PASS unless another
plug-in instructs otherwise.
1
Write the flow record to the flow-output and the packet to the PCKTS_PASS immediately; do
not call the ptoflow() routine on any other plug-in.
2
Treat the packet as a reject: Do not write the flow record; write the packet to the
PCKTS_REJECT immediately; do not call the ptoflow() routine on any other plug-in.
3
Ignore the packet immediately: Write neither the flow record nor the packet; do not call the
ptoflow() routine on any other plug-in.
If ptoflow() returns any other value, the rwptoflow application will terminate with an error.
teardown()
is called as the application exits. The user can use this routine to print results and to free() any data
structures that were used.
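The effect of the ptoflow() return values when several plug-ins are loaded can be modeled in a few lines. The Python sketch below is only a model of the rules stated above: real plug-ins are shared objects compiled from C, and the run_plugins function and its string results are hypothetical names used for illustration.

```python
def run_plugins(plugins, record, packet):
    """Apply each plug-in's ptoflow() in order and return the outcome:
    'pass', 'reject', or 'ignore'."""
    for ptoflow in plugins:
        rv = ptoflow(record, packet)
        if rv == 0:
            # Pass, unless a later plug-in instructs otherwise.
            continue
        if rv == 1:
            return "pass"      # write flow and packet; stop calling plug-ins
        if rv == 2:
            return "reject"    # no flow; packet goes to the reject output
        if rv == 3:
            return "ignore"    # write neither the flow nor the packet
        raise RuntimeError(f"ptoflow() returned unexpected value {rv}")
    return "pass"


if __name__ == "__main__":
    def tag_sensor(record, packet):
        # A plug-in may modify the record before it is written.
        record["sensor"] = 7
        return 0

    def drop_icmp(record, packet):
        return 3 if record.get("protocol") == 1 else 0

    rec = {"protocol": 1}
    print(run_plugins([tag_sensor, drop_icmp], rec, b""), rec)
```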
rwptoflow uses the following rules to find the plug-in: When PLUGIN contains a slash (/), rwptoflow
assumes the path to PLUGIN is correct. Otherwise, rwptoflow will attempt to find the file
in $SILK_PATH/lib/silk, $SILK_PATH/share/lib, $SILK_PATH/lib, and in these directories parallel to the
application’s directory: lib/silk, share/lib, and lib. If rwptoflow does not find the file, it assumes the
plug-in is in the current directory. To force rwptoflow to look in the current directory first, specify
--plugin=./PLUGIN. When the SILK_PLUGIN_DEBUG environment variable is non-empty, rwptoflow
prints status messages to the standard error as it tries to open each of its plug-ins.
EXAMPLES
In the following examples, the dollar sign ($) represents the shell prompt. The text after the dollar sign
represents the command line.
Given the packet capture file data.pcap, convert it to a SiLK flow file, data.rw, and copy the packets that
rwptoflow understands to the file good.pcap:
$ rwptoflow data.pcap --packet-pass=good.pcap --flow-out=data.rw
Use rwfilter to partition the SiLK Flow records, writing those records whose source IPs are found in the
IPset file sip.set to filtered.rw:
$ rwfilter --sipset=sip.set --pass=filtered.rw data.rw
Match the capture file, good.pcap, against the filtered SiLK file, in effect generating a capture file which has
been filtered by sip.set:
$ rwpmatch --flow-file=filtered.rw good.pcap > filtered.pcap
ENVIRONMENT
SILK_PLUGIN_DEBUG
When set to 1, rwptoflow prints status messages to the standard error as it tries to open each of its
plug-ins.
SEE ALSO
rwpmatch(1), rwpdedupe(1), rwfileinfo(1), silk(7), rwp2yaf2silk(1), rwipfix2silk(1), yaf(1), tcpdump(1), pcap(3), mergecap(1), zlib(3)
NOTES
SiLK supports millisecond timestamps. When reading packets whose timestamps have finer precision, the
times are truncated at the millisecond position.
The mergecap(1) or rwpdedupe(1) programs can be used to join multiple tcpdump capture files so
that they can be converted into a single flow file.
rwrandomizeip
Randomize the IP addresses in a SiLK Flow file
SYNOPSIS
rwrandomizeip [--seed=NUMBER] [--only-change-set=CHANGE_IPSET]
[--dont-change-set=KEEP_IPSET]
[--consistent] [--save-table=FILE] [--load-table=FILE]
[--site-config-file=FILENAME] INPUT_FILE OUTPUT_FILE
rwrandomizeip --help
rwrandomizeip --version
DESCRIPTION
Substitute a pseudo-random IP address for the source and destination IP addresses of INPUT_FILE and
write the result to OUTPUT_FILE. You may use stdin for INPUT_FILE to have rwrandomizeip
read from the standard input; the OUTPUT_FILE value of stdout will cause rwrandomizeip to write to
the standard output unless it is connected to a terminal. rwrandomizeip knows how to read and write
compressed (gzipped) files.
To only change a subset of the IP addresses, the optional switches --only-change-set or --dont-change-set
can be used; each switch takes an IPset file as its required argument. When the
--only-change-set=CHANGE_IPSET switch is given, rwrandomizeip only modifies the IP addresses listed in the
CHANGE_IPSET file. To change all addresses except a specified set, use rwsetbuild(1) to create an
IPset file containing those IPs and pass the name of the file to the --dont-change-set switch. An address
listed in both the only-change-set and the dont-change-set will not be modified.
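The precedence between the two switches can be expressed as a small predicate. The Python sketch below models only the rule stated above, with each IPset represented as a list of ipaddress networks; should_randomize is a hypothetical name and the code does not read SiLK IPset files.

```python
import ipaddress


def should_randomize(ip, only_change=None, dont_change=None):
    """Return True when the address would be modified.

    only_change and dont_change are lists of ipaddress networks, or None
    when the corresponding switch is not given.
    """
    addr = ipaddress.ip_address(ip)
    # The dont-change-set always wins, even over the only-change-set.
    if dont_change is not None and any(addr in net for net in dont_change):
        return False
    if only_change is not None:
        return any(addr in net for net in only_change)
    return True


if __name__ == "__main__":
    change = [ipaddress.ip_network("10.0.0.0/8")]
    keep = [ipaddress.ip_network("10.1.0.0/16")]
    for ip in ("10.2.3.4", "10.1.2.3", "192.168.1.1"):
        print(ip, should_randomize(ip, change, keep))
```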
The --seed switch can be used to initialize the pseudo-random number generator to a known state.
When the --consistent, --load-table, and --save-table switches are not provided, rwrandomizeip uses
a pseudo-random, non-routable IP address for each source and destination IP address it sees; an IP address
that appears multiple times in the input will be mapped to a different output address each time, and no
structural information in the input will be maintained.
The --consistent, --load-table, or --save-table switches enable consistent IP mapping, so that an input
IP is consistently mapped to the same output IP. In addition, the structural information of the input IPs is
maintained. Unfortunately, this comes at a cost of less randomness in the output.
OPTIONS
Option names may be abbreviated if the abbreviation is unique or is an exact match for an option. A
parameter to an option may be specified as --arg=param or --arg param, though the first form is required
for options that take optional parameters.
--seed=NUMBER
Use NUMBER to seed the pseudo-random number generator. This can be used to put the random
number generator into a known state, which is useful for testing.
--only-change-set=CHANGE_IPSET
Only modify the source or destination IP address if it appears in the given IPset file CHANGE_IPSET.
The rwsetbuild command can be used to create an IPset file. When the
--dont-change-set=KEEP_IPSET switch is also given, the IPs it contains will override those in the CHANGE_IPSET
file.
--dont-change-set=KEEP_IPSET
Do not modify the source or destination IP address if the address appears in the given IPset file
KEEP_IPSET. The rwsetbuild command can be used to create an IPset file. The interaction of this
switch with the --only-change-set switch is described immediately above.
--consistent
Randomize the IP addresses consistently, so that an input IP address is always mapped to the same
value. The default behavior is to use a random IP address for each IP, even if the IP has been seen
before.
--save-table=FILE
Randomize the IP addresses consistently and save this run’s randomization table for future use. The
table is written to the specified FILE, which must not exist. This switch is incompatible with the
--load-table switch.
--load-table=FILE
Randomize the IP addresses consistently using the randomization table contained in FILE that was
created by a previous invocation of rwrandomizeip. This switch is incompatible with the
--save-table switch.
--site-config-file=FILENAME
Read the SiLK site configuration from the named file FILENAME. When this switch is not provided,
the location specified by the SILK_CONFIG_FILE environment variable is used if that variable is
not empty. The value of SILK_CONFIG_FILE should include the name of the file. Otherwise, the
application looks for a file named silk.conf in the following directories: the directory specified in the
SILK_DATA_ROOTDIR environment variable; the data root directory that is compiled into SiLK
(/data); the directories $SILK_PATH/share/silk/ and $SILK_PATH/share/; and the share/silk/ and
share/ directories parallel to the application’s directory.
--help
Print the available options and exit.
--version
Print the version number and information about how SiLK was configured, then exit the application.
ENVIRONMENT
SILK_CONFIG_FILE
This environment variable is used as the value for the --site-config-file when that switch is not
provided.
SILK_DATA_ROOTDIR
When the --site-config-file switch is not provided and the SILK_CONFIG_FILE environment variable
is not set, rwrandomizeip looks for the site configuration file in $SILK_DATA_ROOTDIR/silk.conf.
SILK_CLOBBER
The SiLK tools normally refuse to overwrite existing files. Setting SILK_CLOBBER to a non-empty
value removes this restriction.
SILK_PATH
This environment variable gives the root of the install tree. As part of its search for the
SiLK site configuration file, rwrandomizeip checks for a file named silk.conf in the directories
$SILK_PATH/share/silk and $SILK_PATH/share.
SEE ALSO
rwsetbuild(1), silk(7)
BUGS
rwrandomizeip does not support IPv6 flow records. When an input file contains IPv6 records, rwrandomizeip converts records that contain addresses in the ::ffff:0:0/96 prefix to IPv4 and processes them.
rwrandomizeip silently ignores IPv6 records containing addresses outside of that prefix.
Only the source and destination IP fields are modified; additional fields in the SiLK Flow records may leak
sensitive information.
The --consistent switch uses a method of randomization that is fairly easy to decipher. Specifically, 4 tables
are created with each having 256 entries containing the values 0-255 that have been randomly shuffled. Each
table is used to map the values for a specific octet in an IP address. For example, when modifying the IP
address 10.10.10.10, the value at position 10 from each table will be substituted into the IP.
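The four-table substitution described above can be sketched in Python. This is only an illustration of the scheme as this section describes it, not rwrandomizeip's actual code; the function names build_tables and randomize_ip, and the use of Python's random module for shuffling, are assumptions made for the example.

```python
import random


def build_tables(seed=None):
    """Return four tables, each a random shuffle of the values 0-255."""
    rng = random.Random(seed)
    tables = []
    for _ in range(4):
        table = list(range(256))
        rng.shuffle(table)
        tables.append(table)
    return tables


def randomize_ip(ip, tables):
    """Replace each octet with the entry at that position in its table."""
    octets = [int(part) for part in ip.split(".")]
    return ".".join(str(tables[i][octet]) for i, octet in enumerate(octets))


if __name__ == "__main__":
    tables = build_tables(seed=1)
    # Addresses that share leading octets still share them after mapping,
    # which is one reason the scheme is easy to decipher.
    for ip in ("10.10.10.10", "10.10.10.11", "10.10.20.10"):
        print(ip, "->", randomize_ip(ip, tables))
```

Because each octet position is mapped independently, two addresses in the same /24 remain in the same (renamed) /24 in the output, so network structure survives the randomization.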
When the same IPset is passed to the --only-change-set and --dont-change-set switches, the output is
identical to the input.
rwrecgenerator
Generate random SiLK Flow records
SYNOPSIS
rwrecgenerator { --silk-output-path=PATH | --text-output-path=PATH
| { --output-directory=DIR_PATH
--processing-directory=DIR_PATH }}
--log-destination=DESTINATION [--log-level=LEVEL]
[--log-sysfacility=NUMBER] [--seed=SEED]
[--start-time=START_DATETIME --end-time=END_DATETIME]
[--time-step=MILLISECONDS] [--events-per-step=COUNT]
[--num-subprocesses=COUNT] [--flush-timeout=MILLISEC]
[--file-cache-size=SIZE] [--compression-method=COMP_METHOD]
[--epoch-time] [--integer-ips] [--zero-pad-ips]
[--integer-sensors] [--integer-tcp-flags] [--no-titles]
[--no-columns] [--column-separator=CHAR]
[--no-final-delimiter] [{--delimited | --delimited=CHAR}]
[--site-config-file=FILENAME] [--sensor-prefix-map=FILE]
[--flowtype-in=CLASS/TYPE] [--flowtype-inweb=CLASS/TYPE]
[--flowtype-out=CLASS/TYPE] [--flowtype-outweb=CLASS/TYPE]
rwrecgenerator --help
rwrecgenerator --version
DESCRIPTION
rwrecgenerator uses pseudo-random numbers to generate events, where each event consists of one or more SiLK
Flow records. These flow records can be written as a single binary file, as text (in either a columnar or a
comma-separated value format) similar to the output from rwcut(1), or as a directory of small binary files to mimic
the incremental files produced by rwflowpack(8). The type of output to produce must be specified using
the appropriate switches. Currently only one type of output may be produced in a single invocation.
rwrecgenerator works through a time window, where the starting and ending times for the window may be
specified on the command line. When not specified, the window defaults to the previous hour. By default,
rwrecgenerator will generate one event at the start time and one event at the end time. To modify the
size of the steps rwrecgenerator takes across the window, specify the --time-step switch. The number of
events to create at each step may be specified with the --events-per-step switch.
The time window specifies when the events begin. Since most events create multiple flow records with small
time offsets between them (and some events may create flow records across multiple hours), flow records will
exist that begin after the time window.
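The way the time window, the time step, and the per-step event count interact can be sketched in Python. This is an illustration under stated assumptions, not the tool's implementation: the function name step_times is hypothetical, times are expressed as integer milliseconds, and the sketch assumes that a step is generated only while it does not pass the end time.

```python
def step_times(start_ms, end_ms, time_step_ms=None):
    """Return the times (in milliseconds) at which events are started."""
    if end_ms < start_ms:
        raise ValueError("end time must not be less than start time")
    if time_step_ms is None:
        # Default step: the whole window, giving one step at each end.
        time_step_ms = end_ms - start_ms
    if time_step_ms == 0:
        # A step of 0 creates events at the start time only.
        return [start_ms]
    # Assumption: steps are taken while they do not pass the end time.
    return list(range(start_ms, end_ms + 1, time_step_ms))


if __name__ == "__main__":
    hour = 3600 * 1000
    events_per_step = 5
    steps = step_times(0, hour, time_step_ms=60 * 1000)
    print(len(steps), "steps,", len(steps) * events_per_step, "events")
```

With a one-hour window, a 60,000 millisecond step, and five events per step, the sketch visits 61 step times and starts 305 events; the number of flow records is larger, since most events produce more than one record.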
To generate a single SiLK flow file, specify its location with the --silk-output-path switch. A value of - will write the output to the standard output unless the standard output is connected to a terminal.
To produce textual output, specify --text-output-path. rwrecgenerator has numerous switches to control
the appearance of the text; however, currently rwrecgenerator produces a fixed set of fields.
When creating incremental files, the --output-directory and --processing-directory switches are required. rwrecgenerator creates files in the processing directory, and moves the files to the output directory
when the flush timeout arrives. The default flush timeout is 30,000 milliseconds (30 seconds); the user may
modify the value with the --flush-timeout switch. Any files in the processing directory are removed when
rwrecgenerator starts.
The --num-subprocesses switch tells rwrecgenerator to use multiple subprocesses when creating incremental files. When the switch is specified, rwrecgenerator will split the time window into multiple pieces
and give each subprocess its own time window to create. The initial rwrecgenerator process then waits
for the subprocesses to complete. When --num-subprocesses is specified, rwrecgenerator will create
subdirectories under the --processing-directory, where each subprocess gets its own processing directory.
The --seed switch may be specified to provide a consistent set of flow records across multiple invocations.
(Note that the names of the incremental files will differ across invocations since those names are created
with the mkstemp(3) function.)
Given the same seed for the pseudo-random number generator and assuming --num-subprocesses is
not specified, the output from rwrecgenerator will contain the same data regardless of whether the output
is written to a single SiLK flow file, a text file, or a series of incremental files.
When both --seed and --num-subprocesses are specified, the incremental files will contain the same flow
records across invocations, but the flow records will not be consistent with those created by --silk-output-path or --text-output-path.
rwrecgenerator must have access to a silk.conf(5) site configuration file, either specified by the --site-config-file switch on the command line or specified by the typical methods.
The --flowtype-in, --flowtype-inweb, --flowtype-out, and --flowtype-outweb switches may be used to
specify the flowtype (that is, the class/type pair) that rwrecgenerator uses for its flow records. When these
switches are not specified, rwrecgenerator attempts to use the flowtypes defined in the silk.conf file for the
twoway site. Specifically, it attempts to use "all/in", "all/inweb", "all/out", and "all/outweb", respectively.
Use of the --sensor-prefix-map switch is recommended. The argument should name a prefix map file that
maps from an internal IP address to a sensor number. If the switch is not provided, all flow records will use
the first sensor in the silk.conf file that is supported by the class specified by the flowtypes. When using the
--sensor-prefix-map, make certain the sensors you choose are in the class specified in the --flowtype-*
switches.
When using the --sensor-prefix-map switch and creating incremental files, it is recommended that you use
the --file-cache-size switch to increase the size of the stream cache to be approximately 12 to 16 times the
number of sensors. This will reduce the amount of time spent closing and reopening the files.
The --log-destination switch is required. Specify none to disable logging.
Currently, rwrecgenerator only supports generating IPv4 addresses. Addresses in 0.0.0.0/1 are considered
internal, and addresses in 128.0.0.0/1 are considered external. All flow records are between an internal and
an external address. Whether the internal address is the source or destination of the unidirectional flow
record is determined randomly.
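The internal/external split can be expressed with Python's ipaddress module. The sketch below only restates the rule given above (0.0.0.0/1 is internal, 128.0.0.0/1 is external); the function names is_internal and direction are hypothetical, and the "in"/"out" labels follow the flowtype descriptions later in this manual page.

```python
import ipaddress

INTERNAL = ipaddress.ip_network("0.0.0.0/1")
EXTERNAL = ipaddress.ip_network("128.0.0.0/1")


def is_internal(ip):
    """True when the IPv4 address falls in the internal half of the space."""
    return ipaddress.ip_address(ip) in INTERNAL


def direction(source_ip, dest_ip):
    """Return 'in' for external-to-internal, 'out' for internal-to-external."""
    src_internal = is_internal(source_ip)
    dst_internal = is_internal(dest_ip)
    if src_internal == dst_internal:
        raise ValueError("flow must be between an internal and an external address")
    return "out" if src_internal else "in"


if __name__ == "__main__":
    print(direction("192.0.2.1", "10.0.0.1"))
    print(direction("10.0.0.1", "192.0.2.1"))
```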
The types of flow records that rwrecgenerator creates are:
• HTTP traffic on port 80/tcp that consists of a query and a response. This traffic will be about 30% of
the total by flow count.
• HTTPS traffic on port 443/tcp that consists of a query and a response. This traffic will be about 30%
of the total by flow count.
• DNS traffic on port 53/udp that consists of a query and a response. This traffic will be about 10% of
the total by flow count.
• FTP traffic on port 21/tcp that consists of a query and a response. This traffic will be about 4% of
the total by flow count.
• ICMP traffic that consists of a single message. This traffic will be about 4% of the total by flow
count.
• IMAP traffic on port 143/tcp that consists of a query and a response. This traffic will be about 4% of
the total by flow count.
• POP3 traffic on port 110/tcp that consists of a query and a response. This traffic will be about 4% of
the total by flow count.
• SMTP traffic on port 25/tcp that consists of a query and a response. This traffic will be about 4% of
the total by flow count.
• TELNET traffic on port 23/tcp between two machines. This traffic may involve multiple flow records
that reach the active timeout of 1800 seconds. This traffic will be about 4% of the total by flow count.
• Traffic on IP Protocols 47, 50, or 58 that consists of a single record. This traffic will be about 4% of
the total by flow count.
• Scans of every port on one IP address. This traffic will be about 1% of the total by flow count.
• Scans of a single port across a range of IP addresses. This traffic will be about 1% of the total by flow
count.
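The percentages in the list above add up to 100, which a small Python table makes easy to check. The sketch below also draws event types at random using those weights. This is a simplification and an assumption: the documented percentages are by flow count, while the sketch weights events, and the dictionary keys and the function name pick_event_types are hypothetical labels.

```python
import random

# Approximate share of each traffic type, in percent of the total flow count.
TRAFFIC_MIX = {
    "http": 30,
    "https": 30,
    "dns": 10,
    "ftp": 4,
    "icmp": 4,
    "imap": 4,
    "pop3": 4,
    "smtp": 4,
    "telnet": 4,
    "other-protocol": 4,   # IP protocols 47, 50, or 58
    "host-scan": 1,        # every port on one IP address
    "port-scan": 1,        # one port across a range of IP addresses
}


def pick_event_types(count, seed=None):
    """Draw `count` event types using the percentages as weights."""
    rng = random.Random(seed)
    names = list(TRAFFIC_MIX)
    weights = list(TRAFFIC_MIX.values())
    return rng.choices(names, weights=weights, k=count)


if __name__ == "__main__":
    print("total:", sum(TRAFFIC_MIX.values()))
    print(pick_event_types(10, seed=42))
```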
OPTIONS
Option names may be abbreviated if the abbreviation is unique or is an exact match for an option. A
parameter to an option may be specified as --arg=param or --arg param, though the first form is required
for options that take optional parameters.
Output Switches
Exactly one of the following switches is required.
--silk-output-path=PATH
Specifies that rwrecgenerator should write a single binary file of SiLK flow records to the specified
location. If PATH is -, the records are written to the standard output. rwrecgenerator does not
support writing binary data to a terminal.
--output-directory=DIR_PATH
Name the directory into which the incremental files are written once the flush timeout is reached.
--text-output-path=PATH
Specifies that rwrecgenerator should convert the flow records it creates to text and print the result
in a format similar to that created by rwcut(1). The output will be written to the specified location.
If PATH is -, the records are written to the standard output.
Logging Switches
The --log-destination switch is required. Use a value of none to disable logging.
--log-destination=DESTINATION
Specify the destination where logging messages are written. When DESTINATION begins with a
slash /, it is treated as a file system path and all log messages are written to that file; there is no log
rotation. When DESTINATION does not begin with /, it must be one of the following strings:
none
Messages are not written anywhere.
stdout
Messages are written to the standard output.
stderr
Messages are written to the standard error.
syslog
Messages are written using the syslog(3) facility.
both
Messages are written to the syslog facility and to the standard error (this option is not available
on all platforms).
--log-level=LEVEL
Set the severity of messages that will be logged. The levels from most severe to least are: emerg,
alert, crit, err, warning, notice, info, debug. The default is info.
--log-sysfacility=NUMBER
Set the facility that syslog(3) uses for logging messages. This switch takes a number as an argument.
The default is a value that corresponds to LOG_USER on the system where rwrecgenerator is running.
This switch produces an error unless --log-destination=syslog is specified.
General Switches
The following are general purpose switches. None are required.
--seed=SEED
Seed the pseudo-random number generator with the value SEED. When not specified, rwrecgenerator
creates its own seed. Specifying the seed allows different invocations of rwrecgenerator to produce
the same output (assuming the same value is given for all switches and that the time window is
specified).
--start-time=YYYY/MM/DD[:HH[:MM[:SS[.ssssss]]]]
--start-time=EPOCH_SECONDS_PLUS_MILLISECONDS
Specify the earliest date and time at which an event is started. The specified time must be given to at
least day precision. Any parts of the date-time string that are not specified are set to 0. The switch
also accepts UNIX epoch seconds with optional fractional seconds. When not specified, defaults to the
beginning of the previous hour.
--end-time=YYYY/MM/DD[:HH[:MM[:SS[.ssssss]]]]
--end-time=EPOCH_SECONDS_PLUS_MILLISECONDS
Specify the latest date and time at which an event is started. This time does not specify the latest
end-time for the flow records or even the latest start-time, since many events simulate a query/response
pair, with the response following the query by a few milliseconds. The specified time must be given
to at least day precision, and it must not be less than the start-time. Any parts of the date-time
string that are not specified are set to 0. The switch also accepts UNIX epoch seconds with optional
fractional seconds. When not specified, defaults to the end of the previous hour.
--time-step=MILLISECONDS
Move forward MILLISECONDS milliseconds at each step as rwrecgenerator moves through the time
window. When not specified, defaults to the difference between the start-time and end-time; that is,
rwrecgenerator will generate events at the start-time and then at the end-time. A MILLISECONDS
value of 0 indicates rwrecgenerator should only create events at the start-time.
--events-per-step=COUNT
Create COUNT events at each time step. The default is 1.
--help
Print the available options and exit.
--version
Print the version number and information about how rwrecgenerator was configured, then exit the
application.
Incremental Files Switches
The following switches are used when creating incremental files.
--processing-directory=DIR_PATH
Name the directory under which the incremental files are initially created. Any files in this directory are
removed when rwrecgenerator is started. When the flush timeout is reached, the files are closed and
moved from this directory to the output-directory. If --num-subprocesses is specified, subdirectories
are created under DIR_PATH, and each subprocess is given its own subdirectory.
--num-subprocesses=COUNT
Tell rwrecgenerator to create COUNT subprocesses to generate incremental files. This switch is
ignored when incremental files are not being created. When this switch is specified, rwrecgenerator
creates subdirectories below the processing directory. The default value for COUNT is 0.
--flush-timeout=MILLISECONDS
Set the timeout for flushing any in-memory records to disk to MILLISECONDS milliseconds. At this
time, the incremental files are closed and the files are moved from the processing directory to the output
directory. The timeout uses the internal time as rwrecgenerator moves through the time window. If
not specified, the default is 30,000 milliseconds (30 seconds). This switch is ignored when incremental
files are not being created.
--file-cache-size=SIZE
Set the maximum number of data files to have open for writing at any one time to SIZE. If not specified,
the default is 32 files.
--compression-method=COMP_METHOD
Set the compression method of the binary SiLK flow files to COMP_METHOD. rwrecgenerator can
use an external library to compress its binary output. The list of available compression methods
and the default method are set when SiLK is compiled (the --help and --version switches print the
available and default compression methods) and depend on which supported libraries are found. SiLK
can support:
none
Do not compress the SiLK Flow records using an external library.
zlib
Use the zlib(3) library for compressing the flow records.
lzo1x
Use the lzo1x algorithm from the LZO real-time compression library for compressing the flow
records.
best
Use whichever available method gives the best compression in general, though not necessarily the
best for this particular file.
Text File Switches
The following switches can be used when creating textual output.
--timestamp-format=FORMAT
When producing textual output, specify how timestamps will be printed. When this switch is not
specified, timestamps are printed in the default format, and the timezone is UTC unless SiLK was
compiled with local timezone support. FORMAT is a comma-separated list of a format, a timezone,
and/or a modifier. The format is one of:
default
Print the timestamps as YYYY/MM/DDThh:mm:ss.sss.
iso
Print the timestamps as YYYY-MM-DD hh:mm:ss.sss.
m/d/y
Print the timestamps as MM/DD/YYYY hh:mm:ss.sss.
epoch
Print the timestamps as the number of seconds since 00:00:00 UTC on 1970-01-01.
When a timezone is specified, it is used regardless of the default timezone support compiled into SiLK.
The timezone is one of:
utc
Use Coordinated Universal Time to print timestamps.
local
Use the TZ environment variable or the local timezone.
One modifier is available:
no-msec
Truncate the milliseconds value on the timestamps and on the duration field. When milliseconds
are truncated, the sum of the printed start time and duration may not equal the printed end time.
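The timestamp formats listed for --timestamp-format can be approximated in Python to show what each one looks like. This sketch is an illustration only: the function name format_timestamp is hypothetical, the input is taken as integer milliseconds since the UNIX epoch, the timezone is always UTC, and printing a fractional part for the epoch format is an assumption that this page does not confirm.

```python
from datetime import datetime, timezone

# strftime patterns for the three calendar-style formats.
PATTERNS = {
    "default": "%Y/%m/%dT%H:%M:%S",
    "iso": "%Y-%m-%d %H:%M:%S",
    "m/d/y": "%m/%d/%Y %H:%M:%S",
}


def format_timestamp(epoch_ms, fmt="default", no_msec=False):
    """Render a millisecond timestamp in one of the documented formats."""
    seconds, millis = divmod(epoch_ms, 1000)
    if fmt == "epoch":
        text = str(seconds)
    else:
        moment = datetime.fromtimestamp(seconds, tz=timezone.utc)
        text = moment.strftime(PATTERNS[fmt])
    # The no-msec modifier truncates (does not round) the milliseconds.
    return text if no_msec else f"{text}.{millis:03d}"


if __name__ == "__main__":
    stamp = 1418860800123  # 2014-12-18 00:00:00.123 UTC
    for name in ("default", "iso", "m/d/y", "epoch"):
        print(name, format_timestamp(stamp, name))
```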
--epoch-time
When producing textual output, print timestamps as epoch time (number of seconds since midnight
GMT on 1970-01-01). This switch is equivalent to --timestamp-format=epoch, it is deprecated as
of SiLK 3.8.1, and it will be removed in the SiLK 4.0 release.
--ip-format=FORMAT
When producing textual output, specify how IP addresses will be printed. When this switch is not
specified, IPs are printed in the canonical format. The FORMAT is one of:
canonical
Print IP addresses in their canonical form: dotted quad for IPv4 (127.0.0.1) and hexadectet for
IPv6 (2001:db8::1). Note that IPv6 addresses in ::ffff:0:0/96 and some IPv6 addresses in ::/96
will be printed as a mixture of IPv6 and IPv4.
zero-padded
Print IP addresses in their canonical form, but add zeros to the output so it fully fills the width
of the column. The addresses 127.0.0.1 and 2001:db8::1 are printed as 127.000.000.001 and
2001:0db8:0000:0000:0000:0000:0000:0001, respectively. When the --ipv6-policy is force,
the output for 127.0.0.1 becomes 0000:0000:0000:0000:0000:ffff:7f00:0001.
decimal
Print IP addresses as integers in decimal format. The addresses 127.0.0.1 and 2001:db8::1 are
printed as 2130706433 and 42540766411282592856903984951653826561, respectively.
hexadecimal
Print IP addresses as integers in hexadecimal format. The addresses 127.0.0.1 and 2001:db8::1
are printed as 7f000001 and 20010db8000000000000000000000001, respectively.
force-ipv6
Print all IP addresses in the canonical form for IPv6 without using any IPv4 notation. Any IPv4
address is mapped into the ::ffff:0:0/96 netblock. The addresses 127.0.0.1 and 2001:db8::1 are
printed as ::ffff:7f00:1 and 2001:db8::1, respectively.
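The --ip-format values can be reproduced for the two sample addresses with Python's ipaddress module. The sketch below is an illustration, not SiLK code; the function name format_ip is hypothetical, and the IPv4-mapped form is built by hand so that the result does not depend on how a given Python version prints such addresses.

```python
import ipaddress


def format_ip(text, fmt="canonical"):
    """Render an IP address in one of the documented --ip-format styles."""
    addr = ipaddress.ip_address(text)
    if fmt == "canonical":
        return str(addr)
    if fmt == "zero-padded":
        if addr.version == 4:
            return ".".join(f"{octet:03d}" for octet in addr.packed)
        return addr.exploded
    if fmt == "decimal":
        return str(int(addr))
    if fmt == "hexadecimal":
        return format(int(addr), "x")
    if fmt == "force-ipv6":
        if addr.version == 4:
            # Map the IPv4 address into the ::ffff:0:0/96 netblock.
            value = int(addr)
            return f"::ffff:{value >> 16:x}:{value & 0xffff:x}"
        return str(addr)
    raise ValueError(f"unknown format: {fmt}")


if __name__ == "__main__":
    for name in ("canonical", "zero-padded", "decimal", "hexadecimal", "force-ipv6"):
        print(name, format_ip("127.0.0.1", name), format_ip("2001:db8::1", name))
```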
--integer-ips
When producing textual output, print IP addresses as integers. This switch is equivalent to --ip-format=decimal, it is deprecated as of SiLK 3.8.1, and it will be removed in the SiLK 4.0 release.
--zero-pad-ips
When producing textual output, print IP addresses as fully-expanded, zero-padded values in their
canonical form. This switch is equivalent to --ip-format=zero-padded, it is deprecated as of SiLK
3.8.1, and it will be removed in the SiLK 4.0 release.
--integer-sensors
When producing textual output, print the integer ID of the sensor rather than its name.
--integer-tcp-flags
When producing textual output, print the TCP flag fields (flags, initialFlags, sessionFlags) as an integer
value. Typically, the characters F,S,R,P,A,U,E,C are used to represent the TCP flags.
--no-titles
When producing textual output, turn off column titles. By default, titles are printed.
--no-columns
When producing textual output, disable fixed-width columnar output.
--column-separator=C
When producing textual output, use specified character between columns and after the final column.
When this switch is not specified, the default of ’|’ is used.
--no-final-delimiter
When producing textual output, do not print the column separator after the final column. Normally
a delimiter is printed.
--delimited
--delimited=C
When producing textual output, run as if --no-columns --no-final-delimiter --column-sep=C had
been specified. That is, disable fixed-width columnar output; if character C is provided, it is used as
the delimiter between columns instead of the default ’|’.
SiLK Site Specific Switches
The following switches control the class/type and sensor that rwrecgenerator assigns to every flow record.
--sensor-prefix-map=FILE
Load a prefix map from FILE and use it to map from the internal IP addresses to sensor numbers. If
the switch is not provided, all flow records will use the first sensor in the silk.conf file that is supported
by the class named in the flowtype. The sensor IDs specified in FILE should agree with the class
specified in the --flowtype-* switches.
--flowtype-in=CLASS/TYPE
Set the class/type pair for flow records where the source IP is external, the destination IP is internal,
and the flow record is not considered to represent a web record to CLASS/TYPE. Web records are
those that appear on ports 80/tcp, 443/tcp, and 8080/tcp. When not specified, rwrecgenerator
attempts to find the flowtype "all/in" in the silk.conf file.
--flowtype-inweb=CLASS/TYPE
Set the class/type pair for flow records representing web records where the source IP is external and
the destination IP is internal to CLASS/TYPE. When not specified and the --flowtype-in switch is
given, that CLASS/TYPE pair will be used. When neither this switch nor --flowtype-in is given,
rwrecgenerator attempts to find the flowtype "all/inweb" in the silk.conf file.
--flowtype-out=CLASS/TYPE
Set the class/type pair for flow records where the source IP is internal, the destination IP is external,
and the flow record is not considered to represent a web record to CLASS/TYPE. When not specified,
rwrecgenerator attempts to find the flowtype "all/out" in the silk.conf file.
--flowtype-outweb=CLASS/TYPE
Set the class/type pair for flow records representing web records where the source IP is internal and
the destination IP is external to CLASS/TYPE. When not specified and the --flowtype-out switch is
given, that CLASS/TYPE pair will be used. When neither this switch nor --flowtype-out is given,
rwrecgenerator attempts to find the flowtype "all/outweb" in the silk.conf file.
--site-config-file=FILENAME
Read the SiLK site configuration from the named file FILENAME. When this switch is not provided, the location specified by the SILK_CONFIG_FILE environment variable is used if that variable
is not empty. The value of SILK_CONFIG_FILE should include the name of the file. Otherwise,
the application looks for a file named silk.conf in the following directories: the directory specified
in the SILK_DATA_ROOTDIR environment variable; the data root directory that is compiled into
SiLK (use the --version switch to view this value); the directories $SILK_PATH/share/silk/ and
$SILK_PATH/share/; and the share/silk/ and share/ directories parallel to the application’s directory.
ENVIRONMENT
SILK_CONFIG_FILE
This environment variable is used as the value for the --site-config-file when that switch is not
provided.
SILK_DATA_ROOTDIR
This environment variable specifies the root directory of the data repository. As described in the FILES
section, rwrecgenerator may use this environment variable when searching for the SiLK site configuration file.
SILK_PATH
This environment variable gives the root of the install tree. When searching for configuration files,
rwrecgenerator may use this environment variable. See the FILES section for details.
TZ
When a SiLK installation is built to use the local timezone (to determine if this is the case, check the
Timezone support value in the output from rwrecgenerator --version), the value of the TZ environment variable determines the timezone in which rwrecgenerator displays and parses timestamps.
If the TZ environment variable is not set, the default timezone is used. Setting TZ to 0 or the empty
string causes timestamps to be displayed in and parsed as UTC. The value of the TZ environment
variable is ignored when the SiLK installation uses utc. For system information on the TZ variable,
see tzset(3).
FILES
${SILK_CONFIG_FILE}
${SILK_DATA_ROOTDIR}/silk.conf
/data/silk.conf
${SILK_PATH}/share/silk/silk.conf
${SILK_PATH}/share/silk.conf
/usr/local/share/silk/silk.conf
/usr/local/share/silk.conf
Possible locations for the SiLK site configuration file which are checked when the --site-config-file
switch is not provided.
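The search order for the site configuration file can be written out as a short Python function. This is a sketch of the order documented for --site-config-file, not the code SiLK uses; the function name silk_conf_candidates is hypothetical, and the compiled-in data root (/data) and the application directory (/usr/local/bin) are assumed defaults taken from the paths listed above.

```python
import posixpath


def silk_conf_candidates(environ, compiled_root="/data", app_dir="/usr/local/bin"):
    """Return the silk.conf locations to try, in documented order."""
    explicit = environ.get("SILK_CONFIG_FILE")
    if explicit:
        # A non-empty SILK_CONFIG_FILE names the file directly.
        return [explicit]

    paths = []
    data_root = environ.get("SILK_DATA_ROOTDIR")
    if data_root:
        paths.append(posixpath.join(data_root, "silk.conf"))
    paths.append(posixpath.join(compiled_root, "silk.conf"))

    silk_path = environ.get("SILK_PATH")
    if silk_path:
        paths.append(posixpath.join(silk_path, "share", "silk", "silk.conf"))
        paths.append(posixpath.join(silk_path, "share", "silk.conf"))

    # share/silk/ and share/ directories parallel to the application's directory.
    prefix = posixpath.dirname(app_dir.rstrip("/"))
    paths.append(posixpath.join(prefix, "share", "silk", "silk.conf"))
    paths.append(posixpath.join(prefix, "share", "silk.conf"))
    return paths


if __name__ == "__main__":
    import os
    for candidate in silk_conf_candidates(os.environ):
        print(candidate)
```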
SEE ALSO
silk(7), rwcut(1), rwflowpack(8), silk.conf(5), syslog(3), zlib(3)
rwresolve
Convert IP addresses in delimited text to hostnames
SYNOPSIS
rwresolve [--ip-fields=FIELDS] [--delimiter=C] [--column-width=N]
[--resolver={ c-ares | adns | getnameinfo | gethostbyaddr }]
[--max-requests=N]
rwresolve --help
rwresolve --version
DESCRIPTION
rwresolve is an application that reads delimited textual input and maps IP addresses in the input to host
names by performing a reverse DNS look-up. If the look-up succeeds, the IP is replaced with the host name
(rwresolve uses the first host name returned by the resolver). If the look-up fails, the IP address remains
unchanged.
rwresolve does a DNS query for every IP address, so it can be extremely slow. rwresolve works best on
very limited data sets. To reduce the number of DNS calls it makes, rwresolve caches the results of queries.
There are two libraries that support asynchronous DNS queries which rwresolve can use if either of those
libraries was found when SiLK was configured. These libraries are the ADNS library and the c-ares library.
Specify the --resolver switch to have rwresolve use a particular function for look-ups.
When an IP address resolves to multiple names, rwresolve prints the first name returned by the resolver.
rwresolve is designed specifically to deal with the output of rwcut(1), though it will work with other SiLK
tools that produce delimited text. rwresolve reads the standard input, splits the line into fields based on
the delimiter (default ’|’), converts the specified FIELDS (default fields 1 and 2) from an IP address in its
canonical form (e.g., dotted decimal for IPv4) to a hostname. If the field cannot be parsed as an address or
if the look up fails to return a hostname, the field is not modified. The fields to convert are specified via the
--ip-fields=FIELDS option. The --delimiter option can be used to specify an alternate delimiter.
Since hostnames are generally wider than IP addresses, the use of the --column-width switch is advised
to increase the width of the IP columns. If this switch is not specified, no justification of hostnames is
attempted.
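The per-line processing described above (split on the delimiter, resolve the selected fields, cache the answers, optionally pad to a fixed width) can be sketched in Python. This is an illustration and not rwresolve's implementation: the names reverse_lookup and resolve_line are hypothetical, the look-up uses Python's socket.gethostbyaddr rather than c-ares or ADNS, and right-justifying padded columns is an assumption.

```python
import ipaddress
import socket


def reverse_lookup(ip):
    """Return the first host name for `ip`, or None when the look-up fails."""
    try:
        return socket.gethostbyaddr(ip)[0]
    except OSError:
        return None


def resolve_line(line, ip_fields=(1, 2), delimiter="|", column_width=None,
                 lookup=reverse_lookup, cache=None):
    """Replace the IP addresses in the chosen 1-based columns with host names."""
    cache = {} if cache is None else cache
    columns = line.rstrip("\n").split(delimiter)
    for number in ip_fields:
        index = number - 1
        if index >= len(columns):
            continue
        original = columns[index]
        key = original.strip()
        name = None
        try:
            ipaddress.ip_address(key)
        except ValueError:
            pass  # not an address (for example a column title): leave as-is
        else:
            if key not in cache:
                cache[key] = lookup(key)  # one query per distinct address
            name = cache[key]
        text = original if name is None else name
        if column_width is not None:
            # Assumption: padded columns are right-justified.
            text = text.strip().rjust(column_width)
        columns[index] = text
    return delimiter.join(columns)


if __name__ == "__main__":
    print(resolve_line("      127.0.0.1|      127.0.0.1|80|", column_width=20))
```

Passing a shared cache dictionary across calls mirrors the caching that keeps the number of DNS queries down when the same address appears on many lines.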
By default, rwresolve will use the c-ares library if available, then it will use the ADNS library if available.
To choose a different IP look up option, use the --resolver switch.
The maximum number of parallel DNS queries to attempt with c-ares or ADNS can be specified with the
--max-requests switch.
OPTIONS
Option names may be abbreviated if the abbreviation is unique or is an exact match for an option. A
parameter to an option may be specified as --arg=param or --arg param, though the first form is required
for options that take optional parameters.
--ip-fields=FIELDS
Specify the column number(s) of the input that should be considered IP addresses. Column numbers
start from 1. If not specified, the default is 1,2.
--delimiter=C
Specify the character that separates the columns of the input. The default is ’|’.
--column-width=WIDTH
Set the width of the columns specified in --ip-fields to WIDTH. When specified, the FIELDS columns
always have the specified WIDTH regardless of whether the IP to hostname mapping was successful.
If this switch is not specified, fields containing IP addresses that could not be resolved will maintain
their input length, and fields where the lookup was successful will be printed with no padding.
--resolver=c-ares
Use the c-ares library to convert the IP addresses to hostnames. Requires that the c-ares library was
found when SiLK was configured. This library supports IPv6 look-ups when SiLK is compiled to
support IPv6.
--resolver=adns
Use the ADNS library to convert the IP addresses to hostnames. Requires that the ADNS library was
found when SiLK was configured. This library only supports IPv4 look-ups.
--resolver=getnameinfo
Use the getnameinfo(3) C library function to convert IP addresses to hostnames. This function
supports IPv6 look-ups when SiLK is compiled to support IPv6.
--resolver=gethostbyaddr
Use the gethostbyaddr(3) C library function to convert IP addresses to hostnames. This function
only supports IPv4.
--max-requests=MAX
When the c-ares or ADNS library is used, limit the number of outstanding DNS queries active at any
one time to MAX. The default is 128. This switch is not available if neither c-ares nor ADNS was
found when SiLK was compiled.
--help
Print the available options and exit.
--version
Print the version number and information about how SiLK was configured, then exit the application.
EXAMPLE
In the following examples, the dollar sign ($) represents the shell prompt. The text after the dollar sign
represents the command line. Lines have been wrapped for improved readability, and the back slash (\) is
used to indicate a wrapped line.
Suppose you have found some interesting data in the file interesting.rw, and you want to view the data using
rwcut(1), but you also want to determine the hostname of each of the source IPs and append that hostname
to the rwcut output. In the example command below, note how the source IP field (rwcut field 1) was
specified twice in the rwcut invocation, and rwresolve is told to resolve the second occurrence, which is
the field in column 13. This allows you to see the source IP (in the first column) and the host name it mapped
to (in the final column).
$ rwcut --fields=1-12,1 interesting.rw \
      | rwresolve --ip-field=13
ENVIRONMENT
When ADNS is used, the following environment variables affect it. The ADNS form of each variable takes
precedence.
RES_CONF
ADNS_RES_CONF
A filename, whose contents are in the format of resolv.conf.
RES_CONF_TEXT
ADNS_RES_CONF_TEXT
A string in the format of resolv.conf.
RES_OPTIONS
ADNS_RES_OPTIONS
These are parsed as if they appeared in the options line of a resolv.conf. In addition to being parsed
at this point in the sequence, they are also parsed at the very beginning before resolv.conf or any other
environment variables are read, so that any debug option can affect the processing of the configuration.
LOCALDOMAIN
ADNS_LOCALDOMAIN
These are interpreted as if their contents appeared in a search line in resolv.conf.
SEE ALSO
rwcut(1), silk(7), gethostbyaddr(3), getnameinfo(3)
BUGS
Because rwresolve must do a DNS query for every IP address, it is extremely slow.
The output from rwresolve is rarely columnar because hostnames can be very long. You may want to
consider putting the resolved hostnames in the final column of output.
rwscan
Detect scanning activity in a SiLK dataset
SYNOPSIS
rwscan [--scan-model=MODEL] [--output-path=OUTFILE]
[--trw-internal-set=SETFILE]
[--trw-theta0=PROB] [--trw-theta1=PROB]
[--no-titles] [--no-columns] [--column-separator=CHAR]
[--no-final-delimiter] [{--delimited | --delimited=CHAR}]
[--integer-ips] [--model-fields] [--scandb]
[--threads=THREADS] [--queue-depth=DEPTH]
[--verbose-progress=CIDR] [--verbose-flows]
[ {--verbose-results | --verbose-results=NUM} ]
[--site-config-file=FILENAME]
[FILES...]
rwscan --help
rwscan --version
DESCRIPTION
rwscan reads sorted SiLK Flow records, performs scan detection analysis on those records, and outputs
textual columnar output for the scanning IP addresses. rwscan writes its output to the --output-path or to
the standard output when --output-path is not specified.
The types of scan detection analysis that rwscan supports are Threshold Random Walk (TRW) and Bayesian
Logistic Regression (BLR). Details about these techniques are described in the METHOD OF OPERATION
section below.
rwscan is designed to write its data into a database. This database can be queried using the rwscanquery(1) tool. See the EXAMPLES section for the recommended database schema.
The input to rwscan should be pre-sorted using rwsort(1) by the source IP, protocol, and destination IP
(i.e., --fields=sip,proto,dip).
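The required ordering can be illustrated with a short Python sketch. It is not part of SiLK; the tuple layout and the helper names sort_key and is_sorted_for_rwscan are assumptions made for illustration only. Addresses are compared numerically, not as strings.

```python
import ipaddress


def sort_key(record):
    """Key matching --fields=sip,proto,dip: source IP, protocol, destination IP."""
    sip, proto, dip = record
    return (int(ipaddress.IPv4Address(sip)), proto, int(ipaddress.IPv4Address(dip)))


def is_sorted_for_rwscan(records):
    """Return True when records are in non-decreasing (sip, proto, dip) order."""
    keys = [sort_key(r) for r in records]
    return all(a <= b for a, b in zip(keys, keys[1:]))


# Hypothetical records: (sip, proto, dip)
records = [
    ("10.0.0.9", 6, "192.0.2.7"),
    ("10.0.0.10", 6, "192.0.2.1"),
    ("10.0.0.10", 17, "192.0.2.1"),
]
print(is_sorted_for_rwscan(records))
```

In practice the sorting itself is done by rwsort(1); the sketch only shows which key the order is based on.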
rwscan reads SiLK Flow records from the files named on the command line or from the standard input
when no file names are specified. To read the standard input in addition to the named files, use - or stdin
as a file name. If an input file name ends in .gz, the file will be uncompressed as it is read.
OPTIONS
Option names may be abbreviated if the abbreviation is unique or is an exact match for an option. A
parameter to an option may be specified as --arg=param or --arg param, though the first form is required
for options that take optional parameters.
--scan-model=MODEL
Select a specific scan detection model. If not specified, the default value for MODEL is 0. See the
METHOD OF OPERATION section for more details.
0
Use the Threshold Random Walk (TRW) and Bayesian Logistic Regression (BLR) scan detection
models in series.
1
Use only the TRW scan detection model.
2
Use only the BLR scan detection model.
--output-path=OUTFILE
Specify the output file that scan records will be written to. If not specified, the scan records are written
to standard output.
--trw-internal-set=SETFILE
Specify an IPset file containing all valid internal IP addresses. This parameter is required when using
the TRW scan detection model, since the TRW model requires the list of targeted IPs (i.e., the IPs
against which scanning activity is to be detected). This switch is ignored when the TRW model is not used. For
information on creating IPset files, see the rwset(1) and rwsetbuild(1) manual pages. Prior to SiLK
3.4, this switch was named --trw-sip-set.
--trw-sip-set=SETFILE
This is a deprecated alias for --trw-internal-set.
--trw-theta0=PROB
Set the theta_0 parameter for the TRW scan model to PROB, which must be a floating point number
between 0 and 1. theta_0 is defined as the probability that a connection succeeds given the hypothesis
that the remote source is benign (not a scanner). The default value for this option is 0.8. This option
should only be used by experts familiar with the TRW algorithm.
--trw-theta1=PROB
Set the theta_1 parameter for the TRW scan model to PROB, which must be a floating point number
between 0 and 1. theta_1 is defined as the probability that a connection succeeds given the hypothesis
that the remote source is malicious (a scanner). The default value for this option is 0.2. This option
should only be used by experts familiar with the TRW algorithm.
--no-titles
Turn off column titles. By default, titles are printed.
--no-columns
Disable fixed-width columnar output.
--column-separator=C
Use specified character between columns. When this switch is not specified, the default of ’|’ is used.
--no-final-delimiter
Do not print the column separator after the final column. Normally a delimiter is printed.
--delimited
--delimited=C
Run as if --no-columns --no-final-delimiter --column-sep=C had been specified. That is, disable
fixed-width column output; if character C is provided, it is used as the delimiter between columns
instead of the default ’|’.
--integer-ips
Print IP addresses as decimal integers instead of in their canonical representation.
--model-fields
Show scan model detail fields. This switch controls whether additional informational fields about the
scan detection models are printed.
--scandb
Produce output suitable for loading into a database. Sample database schemas are given below under
EXAMPLES. This option is equivalent to --no-titles --no-columns --no-final-delimiter
--model-fields --integer-ips.
--threads=THREADS
Specify the number of worker threads to create for scan detection processing. By default, one thread
will be used. Changing this number to match the number of available CPUs will often yield a large
performance improvement.
--queue-depth=DEPTH
Specify the depth of the work queue. The default is to make the work queue the same size as the
number of worker threads, but this can be changed. Normally, the default is fine.
--verbose-progress=CIDR
Report progress as rwscan processes input data. The CIDR argument should be an integer that
corresponds to the netblock size of each line of progress. For example, --verbose-progress=8 would
print a progress message for each /8 network processed.
--verbose-flows
Cause rwscan to print very verbose information for each flow. This switch is primarily useful for
debugging.
--verbose-results
--verbose-results=NUM
Print detailed information on each IP processed by rwscan. If a NUM argument is provided, only
print verbose results for sources that sent at least NUM flows. This information includes scan model
calculations, overall scan scores, etc. This option will generate a lot of output, and is primarily useful
for debugging.
--site-config-file=FILENAME
Read the SiLK site configuration from the named file FILENAME. When this switch is not provided,
rwscan searches for the site configuration file in the locations specified in the FILES section.
--help
Print the available options and exit.
--version
Print the version number and information about how SiLK was configured, then exit the application.
METHOD OF OPERATION
rwscan’s default behavior is to consult two scan detection models to determine whether a source is a
scanner. The primary model used is the Threshold Random Walk (TRW) model. The TRW algorithm takes
advantage of the tendency of scanners to attempt to contact a large number of IPs that do not exist on the
target network.
By keeping track of the number of "hits" (successful connections) and "misses" (attempts to connect to
IP addresses that are not active on the target network), scanners can be detected quickly and with a high
degree of accuracy. Sequential hypothesis testing is used to analyze the probability that a source is a scanner
as each flow record is processed. Once the scan probability exceeds a configured maximum, the source is
flagged as a scanner, and no further analysis of traffic from that host is necessary.
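The sequential hypothesis test described above can be sketched in Python. This is a minimal illustration of the general technique, not rwscan's implementation: it tracks a likelihood ratio instead of a probability, and the alpha and beta targets that set the two thresholds are assumptions, since this page does not state the values rwscan uses. The theta0 and theta1 defaults match the documented defaults of --trw-theta0 and --trw-theta1.

```python
def trw_classify(outcomes, theta0=0.8, theta1=0.2, alpha=0.01, beta=0.99):
    """Classify one source from its sequence of connection outcomes.

    outcomes: iterable of booleans, True for a hit, False for a miss.
    theta0:   P(connection succeeds | source is benign).
    theta1:   P(connection succeeds | source is a scanner).
    alpha, beta: assumed false-positive and detection targets
                 (illustrative values, not taken from the manual).
    """
    upper = beta / alpha              # at or above this: scanner
    lower = (1 - beta) / (1 - alpha)  # at or below this: benign
    ratio = 1.0                       # likelihood ratio scanner : benign
    for hit in outcomes:
        if hit:
            ratio *= theta1 / theta0
        else:
            ratio *= (1 - theta1) / (1 - theta0)
        if ratio >= upper:
            return "scanner"          # stop early; no further analysis needed
        if ratio <= lower:
            return "benign"
    return "unknown"                  # neither threshold was crossed


print(trw_classify([False, False, False, False]))
print(trw_classify([True, True, True, True]))
print(trw_classify([True, False] * 5))
```

With these assumed thresholds, each miss multiplies the ratio by about 4 and each hit by about 0.25, so a short run of misses is enough to flag a source, while alternating hits and misses leaves the result undecided.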
The TRW model is not 100% accurate, however, and only finds scans in TCP flow data. In the case where
the TRW model is inconclusive, a secondary model called BLR is invoked. BLR stands for "Bayesian Logistic
Regression." Unlike TRW, the BLR approach must analyze all traffic from a given source IP to determine
whether that IP is a scanner.
Because of this, BLR operates much more slowly than TRW. However, the BLR model has been shown to detect
scans that are not detected by the TRW model, particularly scans in UDP and ICMP data, and vertical
TCP scans which focus on finding services on a single host. It does this by calculating metrics from the flow
data from each source, and using those metrics to arrive at an overall likelihood that the flow data represents
scanning activity.
The metrics BLR uses for detecting scans in TCP flow data are:
• the ratio of flows with no ACK bit set to all flows
• the ratio of flows with fewer than three packets to all flows
• the average number of source ports per destination IP address
• the ratio of the number of flows that have an average of 60 bytes/packet or greater to all flows
• the ratio of the number of unique destination IP addresses to the total number of flows
• the ratio of the number of flows where the flag combination indicates backscatter to all flows
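Several of the TCP metrics listed above are simple ratios over all flows from one source, and can be sketched in Python as follows. The dictionary keys (dip, packets, bytes, flags) and the function name are hypothetical; the sketch omits the source-port and backscatter metrics, and it does not attempt the regression step, because this page gives neither the backscatter flag combinations nor the model coefficients.

```python
ACK = 0x10  # TCP ACK flag bit


def tcp_blr_metrics(flows):
    """Compute four of the TCP metrics for the flows of a single source.

    Each flow is a dict with hypothetical keys 'dip', 'packets', 'bytes'
    and 'flags'. Every flow is assumed to contain at least one packet.
    """
    total = len(flows)
    if total == 0:
        raise ValueError("at least one flow is required")
    no_ack = sum(1 for f in flows if not f["flags"] & ACK)
    small = sum(1 for f in flows if f["packets"] < 3)
    large_avg = sum(1 for f in flows if f["bytes"] / f["packets"] >= 60)
    unique_dips = len({f["dip"] for f in flows})
    return {
        "no_ack_ratio": no_ack / total,
        "small_flow_ratio": small / total,
        "bytes_per_packet_60_ratio": large_avg / total,
        "unique_dip_ratio": unique_dips / total,
    }


# Hypothetical flows from one source
flows = [
    {"dip": "192.0.2.1", "packets": 1, "bytes": 40, "flags": 0x02},
    {"dip": "192.0.2.2", "packets": 1, "bytes": 40, "flags": 0x02},
    {"dip": "192.0.2.3", "packets": 2, "bytes": 120, "flags": 0x12},
    {"dip": "192.0.2.3", "packets": 10, "bytes": 5000, "flags": 0x1B},
]
print(tcp_blr_metrics(flows))
```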
The metrics BLR uses for detecting scans in UDP flow data are:
• the ratio of flows with fewer than three packets to all flows
• the maximum run length of IP addresses per /24 subnet
• the maximum number of unique low-numbered (less than 1024) destination ports contacted on any one
host
• the maximum number of consecutive low-numbered destination ports contacted on any one host
• the average number of unique source ports per destination IP address
• the ratio of flows with 60 or more bytes/packet to all flows
• the ratio of unique source ports (both low and high) to the number of flows
The metrics BLR uses for detecting scans in ICMP flow data are:
• the maximum number of consecutive /24 subnets that were contacted
• the maximum run length of IP addresses per /24 subnet
• the maximum number of IP addresses contacted in any one /24 subnet
• the total number of IP addresses contacted
• the ratio of ICMP echo requests to all ICMP flows
Because the TRW model has a lower false positive rate than the BLR model, any source identified as a
scanner by TRW will be identified as a scanner by the hybrid model without consulting BLR. BLR is only
invoked in the following cases:
• The traffic being analyzed is UDP or ICMP traffic, which rwscan’s implementation of TRW cannot
process.
• The TRW model has identified the source as benign. This occurs when the scan probability drops
below a configured minimum during sequential hypothesis testing.
• The TRW model has identified the source as unknown (where the scan probability never exceeded the
minimum or maximum thresholds during sequential hypothesis testing).
In situations where the use of one model is preferred, the other model can be disabled using the --scan-model
switch. This may have an impact on the performance and/or accuracy of the system.
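The way the hybrid model combines the two results can be summarized in a few lines of Python. This is a sketch of the decision rules listed above, under the assumption that TRW reports one of three outcomes; the function and parameter names are illustrative and do not correspond to rwscan internals.

```python
TCP = 6  # IP protocol number for TCP


def hybrid_is_scanner(proto, trw_result, run_blr):
    """Combine TRW and BLR as described for the default hybrid model.

    proto:      IP protocol number of the traffic being analyzed.
    trw_result: 'scanner', 'benign', or 'unknown'; ignored for non-TCP
                traffic, which TRW does not process.
    run_blr:    zero-argument callable that runs the slower BLR model and
                returns True when BLR considers the source a scanner.
    """
    if proto == TCP and trw_result == "scanner":
        return True      # TRW verdict is accepted without consulting BLR
    return run_blr()     # UDP/ICMP traffic, or TRW said benign or unknown


print(hybrid_is_scanner(TCP, "scanner", lambda: False))
print(hybrid_is_scanner(17, "unknown", lambda: True))
```

Passing BLR as a callable reflects the point made above: the more expensive model runs only when TRW has not already flagged the source.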
LIMITATIONS
rwscan detects scans in IPv4 flows only.
EXAMPLES
In the following examples, the dollar sign ($) represents the shell prompt. The text after the dollar sign
represents the command line. Lines have been wrapped for improved readability, and the backslash (\) is
used to indicate a wrapped line.
Basic Usage
Assuming a properly sorted SiLK Flow file as input, basic Bayesian Logistic Regression (BLR) scan
detection requires only the input file, data.rw, and the output file, scans.txt.
$ rwscan --scan-model=2 --output-path=scans.txt data.rw
Basic usage of Threshold Random Walk (TRW) scan detection requires the IP addresses of the targeted
network (i.e., the internal IP space), specified in the internal.set IPset file.
$ rwscan --trw-internal-set=internal.set --output-path=scans.txt data.rw
Typical Usage
More commonly, an analyst uses rwfilter(1) to query the data repository for flow records within a time
window. First, the analyst has rwset(1) put the source addresses of outgoing flow records into an IPset,
resulting in the IPset containing the IPs of active hosts on the internal network. Next, the incoming traffic
is piped to rwsort(1) and then to rwscan.
$ rwfilter --start=2004/12/29:00 --type=out,outweb --all-dest=stdout \
| rwset --sip=internal.set
$ rwfilter --start=2004/12/29:00 --type=in,inweb --all-dest=stdout \
    | rwsort --fields=sip,proto,dip \
    | rwscan --trw-internal-set=internal.set --scan-model=0 \
        --output-path=scans.txt
Storing Scans in a PostgreSQL Database
Instead of having the analyst run rwscan directly, often the output from rwscan is put into a database
where it can be queried by rwscanquery(1). The output produced by the --scandb switch is suitable for
loading into a database of scans. The process for using the PostgreSQL database is described in this section.
Schemas for Oracle, MySQL, and SQLite are provided below, but the details to create users with the proper
roles are not included.
Here is the schema for PostgreSQL:
CREATE DATABASE scans;
CREATE SCHEMA scans
    CREATE SEQUENCE scans_id_seq
    CREATE TABLE scans (
        id          BIGINT      NOT NULL    DEFAULT nextval('scans_id_seq'),
        sip         BIGINT      NOT NULL,
        proto       SMALLINT    NOT NULL,
        stime       TIMESTAMP without time zone NOT NULL,
        etime       TIMESTAMP without time zone NOT NULL,
        flows       BIGINT      NOT NULL,
        packets     BIGINT      NOT NULL,
        bytes       BIGINT      NOT NULL,
        scan_model  INTEGER     NOT NULL,
        scan_prob   FLOAT       NOT NULL,
        PRIMARY KEY (id)
    )
    CREATE INDEX scans_stime_idx ON scans (stime)
    CREATE INDEX scans_etime_idx ON scans (etime)
;
A database user should be created for the purposes of populating the scan database, e.g.:
CREATE USER rwscan WITH PASSWORD 'secret';
GRANT ALL PRIVILEGES ON DATABASE scans TO rwscan;
Additionally, a user with read-only access should be created for use by the rwscanquery tool:
CREATE USER rwscanquery WITH PASSWORD 'secret';
GRANT USAGE ON SCHEMA scans TO rwscanquery;
GRANT SELECT ON scans.scans TO rwscanquery;
To import rwscan’s --scandb output into a PostgreSQL database, use a command similar to the following:
$ cat /tmp/scans.import.txt                   \
    | psql -c                                 \
    "COPY scans                               \
        (sip, proto, stime, etime,            \
         flows, packets, bytes,               \
         scan_model, scan_prob)               \
    FROM stdin DELIMITER as '|'" scans
Sample Schema for Oracle
CREATE TABLE scans (
    id          integer unsigned    not null unique,
    sip         integer unsigned    not null,
    proto       tinyint unsigned    not null,
    stime       datetime            not null,
    etime       datetime            not null,
    flows       integer unsigned    not null,
    packets     integer unsigned    not null,
    bytes       integer unsigned    not null,
    scan_model  integer unsigned    not null,
    scan_prob   float unsigned      not null,
    primary key (id)
);
Sample Schema for MySQL
CREATE TABLE scans (
    id          integer unsigned    not null auto_increment,
    sip         integer unsigned    not null,
    proto       tinyint unsigned    not null,
    stime       datetime            not null,
    etime       datetime            not null,
    flows       integer unsigned    not null,
    packets     integer unsigned    not null,
    bytes       integer unsigned    not null,
    scan_model  integer unsigned    not null,
    scan_prob   float unsigned      not null,
    primary key (id),
    INDEX (stime),
    INDEX (etime)
) TYPE=InnoDB;
Sample Schema and Import Command for SQLite
CREATE TABLE scans (
    id          INTEGER PRIMARY KEY AUTOINCREMENT,
    sip         INTEGER     NOT NULL,
    proto       SMALLINT    NOT NULL,
    stime       TIMESTAMP   NOT NULL,
    etime       TIMESTAMP   NOT NULL,
    flows       INTEGER     NOT NULL,
    packets     INTEGER     NOT NULL,
    bytes       INTEGER     NOT NULL,
    scan_model  INTEGER     NOT NULL,
    scan_prob   FLOAT       NOT NULL
);
CREATE INDEX scans_stime_idx ON scans (stime);
CREATE INDEX scans_etime_idx ON scans (etime);
To import rwscan’s --scandb output into a SQLite database, use the following command:
$ perl -nwe 'chomp;
    print "INSERT INTO scans VALUES (NULL,",
        (join ",",map { / / ? qq("$_") : $_ } split /\|/),
        ");\n";' \
    scans.txt | sqlite3 scans.sqlite
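For readers who prefer not to generate SQL text, the same import can be sketched with Python's standard sqlite3 module. This is an illustrative alternative and not part of SiLK. The column order is assumed to match the COPY command shown in the PostgreSQL section, the function name import_scandb is hypothetical, and the sample line (including its timestamp format) is invented for the example.

```python
import sqlite3

SCHEMA = """
CREATE TABLE scans (
    id          INTEGER PRIMARY KEY AUTOINCREMENT,
    sip         INTEGER     NOT NULL,
    proto       SMALLINT    NOT NULL,
    stime       TIMESTAMP   NOT NULL,
    etime       TIMESTAMP   NOT NULL,
    flows       INTEGER     NOT NULL,
    packets     INTEGER     NOT NULL,
    bytes       INTEGER     NOT NULL,
    scan_model  INTEGER     NOT NULL,
    scan_prob   FLOAT       NOT NULL
);
"""

# Assumed field order of the --scandb output (taken from the COPY example).
COLUMNS = ("sip", "proto", "stime", "etime", "flows",
           "packets", "bytes", "scan_model", "scan_prob")


def import_scandb(conn, lines):
    """Insert pipe-delimited --scandb lines; return the number of rows added."""
    sql = "INSERT INTO scans ({}) VALUES ({})".format(
        ", ".join(COLUMNS), ", ".join("?" * len(COLUMNS)))
    rows = []
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        fields = line.split("|")
        if len(fields) != len(COLUMNS):
            raise ValueError("unexpected field count: %r" % line)
        rows.append(fields)
    with conn:  # commit on success, roll back on error
        conn.executemany(sql, rows)
    return len(rows)


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
# Hypothetical input line; real input would be read from scans.txt.
sample = ["167772170|6|2004-12-29 00:00:05|2004-12-29 00:10:00|120|120|4800|1|0.99\n"]
print(import_scandb(conn, sample))
```

Using bound parameters avoids the quoting step that the perl one-liner performs for fields containing spaces.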
ENVIRONMENT
SILK_CLOBBER
The SiLK tools normally refuse to overwrite existing files. Setting SILK_CLOBBER to a non-empty
value removes this restriction.
SILK_CONFIG_FILE
This environment variable is used as the value for the --site-config-file switch when that switch is not
provided.
SILK_DATA_ROOTDIR
This environment variable specifies the root directory of the data repository. As described in the FILES
section, rwscan may use this environment variable when searching for the SiLK site configuration file.
SILK_PATH
This environment variable gives the root of the install tree. When searching for configuration files,
rwscan may use this environment variable. See the FILES section for details.
FILES
${SILK_CONFIG_FILE}
${SILK_DATA_ROOTDIR}/silk.conf
/data/silk.conf
${SILK_PATH}/share/silk/silk.conf
${SILK_PATH}/share/silk.conf
/usr/local/share/silk/silk.conf
/usr/local/share/silk.conf
Possible locations for the SiLK site configuration file which are checked when the --site-config-file
switch is not provided.
SEE ALSO
rwscanquery(1), rwfilter(1), rwsort(1), rwset(1), rwsetbuild(1), silk(7)
BUGS
When used in an IPv6 environment, rwscan converts IPv6 flow records that contain addresses in the
::ffff:0:0/96 prefix to IPv4. IPv6 records outside of that prefix are silently ignored.
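The address handling described above can be illustrated with Python's standard ipaddress module. This per-address sketch only demonstrates the ::ffff:0:0/96 rule; it is not how rwscan is implemented, and the function name is an assumption.

```python
import ipaddress


def to_ipv4_or_none(addr):
    """Return the IPv4 form of addr, or None when the address would be ignored.

    IPv4 addresses pass through unchanged, IPv4-mapped IPv6 addresses
    (::ffff:0:0/96) are converted, and any other IPv6 address yields None.
    """
    ip = ipaddress.ip_address(addr)
    if ip.version == 4:
        return ip
    return ip.ipv4_mapped  # None for addresses outside ::ffff:0:0/96


print(to_ipv4_or_none("::ffff:192.0.2.1"))
print(to_ipv4_or_none("2001:db8::1"))
```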
rwscanquery(1)
rwscanquery
Query the network scan database
SYNOPSIS
rwscanquery [options]
Report Options:
  --start-date=YYYY/MM/DD:HH  Report on scans active after this date.
  --end-date=YYYY/MM/DD:HH    Defaults to start-date.
  --report=REPORT_TYPE        Select query and output options. Values
                              for REPORT_TYPE are standard, volume,
                              scanset, scanflows, respflows, and export
  --saddress=ADDR_SPEC        Show scans originating from matching hosts.
  --sipset=IPSET_FILE         Show scans originating from hosts in set.
  --daddress=IP_WILDCARD      Show only scans targeting matching hosts.
  --dipset=IPSET_FILE         Show only scans targeting hosts in set.
  --show-header               Display column titles at start of output.
  --columnar                  Display more human-readable columnar view.
  --output-path=PATH          Write results to the specified file.
Configuration Options:
  --database=DBNAME           Query an alternate scan database
Help Options:
  --help                      Display this brief help message.
  --man                       Display the full documentation.
  --version                   Display the version information.
DESCRIPTION
rwscanquery queries the network scan database, that is, the database that contains scans found by
rwscan(1). The type of output rwscanquery creates is controlled by the --report switch as described in the
Report Options section below. rwscanquery writes its output to the location specified by the
--output-path switch or to the standard output when that switch is not provided.
OPTIONS
Option names may be abbreviated if the abbreviation is unique or is an exact match for an option. A
parameter to an option may be specified as --arg=param or --arg param, though the first form is required
for options that take optional parameters.
Report Options
--start-date=YYYY/MM/DD:HH
Display scans which were active after this hour. When this argument contains a date with no hour and
no --end-date option is specified, scans for that entire day are returned. If this option is not specified
at all, scans for the current day (based on the local time on the host machine) are returned.
--end-date=YYYY/MM/DD:HH
Display scans which were active before the end of this hour. If no end-date is given, defaults to the
same as start-date. It is an error to provide an end-date without a start-date.
--report=TYPE
Specify the query and the type of output to create. When this switch is not specified, the default is a
standard report. The supported values for TYPE are:
standard
Write one textual line of output for each scan record in the scan database. By default, the output
has no titles and it is not in columnar form. Specify the --show-header and/or --columnar
switches to make the output more human readable.
volume
Write a daily scan activity volume summary report for each day within the time period. By
default, the output has no titles and it is not in columnar form. Specify the --show-header
and/or --columnar switches to make the output more human readable.
scanset
Write an IPset file containing the IP addresses which were the sources of scan activity during the
selected time period. The output of this report type is binary, so you must redirect or pipe the
output to a location or specify the --output-path switch.
scanflows
Write a SiLK Flow file containing all flows originating from scanning IP addresses within the
specified time period. This flow data will include flows originating from any host that would be
listed as a scan source by your query, from any time within the time period specified by --start-date
and --end-date. Note that this may include flows that were not identified by the scan analysis as
being part of a scan. The output of this report type is binary, so you must redirect or pipe the
output to a location or specify the --output-path switch.
respflows
Write a SiLK Flow file containing all flows sent to scanning IP addresses within the specified time
period; that is, possible responses to the scanners. The output of this report type is binary, so
you must redirect or pipe the output to a location or specify the --output-path switch.
export
Write output consistent with the output format of the rwscan(1) tool.
--saddress=ADDR_SPEC
Display scans originating from hosts described in ADDR_SPEC, where ADDR_SPEC is a list of
addresses, address ranges, and CIDR blocks. Only scans originating from hosts in the list will be displayed.
--sipset=IPSET_FILE
Display scans originating from hosts in IPSET_FILE, where IPSET_FILE is a standard SiLK IPset
file as created by rwset(1) or rwsetbuild(1). Note that a very complex IPset may take a long time
to process, or even fail to return any results.
--daddress=IP_WILDCARD
Display scans targeting hosts described in IP_WILDCARD, where IP_WILDCARD is a single IP
address, a single CIDR block, or an IP Wildcard expression accepted by rwfilter(1). To match on
multiple IPs or networks, use the --dipset option. This option is ignored for --report types other
than scanset, scanflows, and respflows.
--dipset=IPSET_FILE
Display scans targeting hosts in IPSET_FILE, where IPSET_FILE is a standard SiLK IPset file. Note
that a very complex set may take a long time to process, or even fail to return any results. This option
is ignored for --report types other than scanset, scanflows, and respflows.
--show-header
Display a header line giving a short name (or title) for each field when printing textual output with
the standard, volume, or export report types. By default, no header is displayed.
--columnar
Display output in more human-readable columnar format when printing textual output with the
standard or volume report types. By default, the output is presented as data fields delimited by
the | character.
--output-path=PATH
Write results to PATH instead of to the standard output.
Configuration Options
--database=DBNAME
Select a database instance other than the default. The default is specified by the .rwscanrc configuration
file as described below.
Other Options
--help
Display a brief usage message and exit.
--man
Display full documentation for rwscanquery and exit.
--version
Print the version number and exit the application.
CONFIGURATION
rwscanquery reads configuration information from a file named .rwscanrc. If the RWSCANRC environment
variable is set, it is used as the location of the .rwscanrc file. When RWSCANRC is not set, rwscanquery
attempts to find a file named .rwscanrc in the directories specified in the FILES section below.
The format of the .rwscanrc file is name=value pairs, one per line. The configuration parameters currently
read from .rwscanrc are:
db_driver
The type of database to connect to. rwscanquery supports "oracle", "postgresql", "mysql", and
"sqlite".
db_userid
The userid to use when connecting to the scan database.
db_password
The password to use when connecting to the scan database.
db_instance
The name of the database instance to connect to if none is provided with the --database command
line switch. If neither this configuration option nor the --database command line option is specified,
the hard-coded default database instance "SCAN" is used.
rw_in_class
The class for incoming flow data. The rw_in_class and rw_in_type values are used to query scan flows
when the scanflows report type is requested or when the --daddress or --dipset switches are used
for the scanset report type. If not specified, rwfilter's default is used.
rw_in_type
The type(s) for incoming flow data. See rw_in_class for details.
rw_out_class
The class for outgoing flow data. The rw_out_class and rw_out_type values are used to query scan
flows when the respflows report type is requested. If not specified, rwfilter's default is used.
rw_out_type
The type(s) for outgoing flow data. See rw_out_class for details. (Note that rwfilter often defaults
to querying incoming flows, so this parameter ought to be specified.)
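The name=value format described above is simple enough to parse in a few lines. The following Python sketch shows one way to read such a file; it is not the parser rwscanquery uses. Skipping blank lines and trimming surrounding whitespace are assumptions, no comment syntax is handled because none is documented here, and the sample values are hypothetical.

```python
def parse_rwscanrc(text):
    """Parse name=value pairs, one per line, into a dictionary."""
    settings = {}
    for number, line in enumerate(text.splitlines(), start=1):
        line = line.strip()
        if not line:
            continue  # assumption: blank lines are ignored
        name, sep, value = line.partition("=")
        if not sep or not name.strip():
            raise ValueError("line %d is not a name=value pair" % number)
        settings[name.strip()] = value.strip()
    return settings


# Hypothetical .rwscanrc contents
example = """\
db_driver=postgresql
db_userid=rwscanquery
db_password=secret
db_instance=scans
rw_in_type=in,inweb
rw_out_type=out,outweb
"""
config = parse_rwscanrc(example)
print(config["db_driver"], config["rw_out_type"])
```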
EXAMPLES
In the following examples, the dollar sign ($) represents the shell prompt. The text after the dollar sign
represents the command line. Lines have been wrapped for improved readability, and the backslash (\) is
used to indicate a wrapped line.
• $ rwscanquery --start-date=2005/04/19:21
This command would display information on all scans occurring in the hour from 21:00 up to but not
including 22:00 on April 19, 2005.
• $ rwscanquery --start-date=2005/04/19:21 --end-date=2005/04/19:22
This command would display information on all scans occurring at or after 21:00 on 2005/04/19,
up to but not including 23:00 on 2005/04/19.
• $ rwscanquery --start-date=2005/04/19:21 \
      --saddress=192.168/16,127.0.0.1,255.255.255.0-255.255.255.255
This command would display scans originating from addresses in the /16 block 192.168/16, from
address 127.0.0.1, or from any address between 255.255.255.0 and 255.255.255.255, inclusive.
• $ rwscanquery --start-date=2005/04/19:21 --sipset=MyIPSet.set
This command would display information on all scans in the given hour which had a source address in
the IP set file MyIPSet.set.
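The time windows implied by --start-date and --end-date in the examples above can be worked out with a short Python sketch. It is an illustration of the documented behavior and not code from rwscanquery; the function names are hypothetical, and treating an --end-date given without an hour as covering that whole day is an assumption.

```python
from datetime import datetime, timedelta


def _parse(spec):
    """Return (datetime, has_hour) for YYYY/MM/DD or YYYY/MM/DD:HH."""
    if ":" in spec:
        return datetime.strptime(spec, "%Y/%m/%d:%H"), True
    return datetime.strptime(spec, "%Y/%m/%d"), False


def scan_window(start_date, end_date=None):
    """Return the half-open interval [begin, end) selected by the switches."""
    begin, start_has_hour = _parse(start_date)
    if end_date is None:
        # No end date: a single hour, or the whole day when no hour is given.
        span = timedelta(hours=1) if start_has_hour else timedelta(days=1)
        return begin, begin + span
    last, end_has_hour = _parse(end_date)
    # The end date is inclusive of its final hour (assumed: or final day).
    span = timedelta(hours=1) if end_has_hour else timedelta(days=1)
    return begin, last + span


print(scan_window("2005/04/19:21"))
print(scan_window("2005/04/19:21", "2005/04/19:22"))
```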
ENVIRONMENT
RWSCANRC
This environment variable allows the user to specify the location of the .rwscanrc configuration file.
The value may be a complete path or a file relative to the user’s current directory. See the FILES
section for standard locations of this file.
SILK_CLOBBER
The SiLK tools normally refuse to overwrite existing files. Setting SILK_CLOBBER to a non-empty
value removes this restriction for the report types of scanset, scanflows, and respflows.
SILK_CONFIG_FILE
This environment variable is used as the location for the site configuration file, silk.conf, for report
types that use rwfilter. When this environment variable is not set, rwfilter searches for the site
configuration file in the locations specified in the FILES section.
SILK_DATA_ROOTDIR
This environment variable specifies the root directory of the data repository for report types that use
rwfilter. This value overrides the compiled-in value. In addition, rwfilter may use this value when
searching for the SiLK site configuration files. See the FILES section for details.
SILK_RWFILTER_THREADS
The number of threads rwfilter uses when reading files from the data store.
SILK_PATH
This environment variable gives the root of the install tree. When searching for the site configuration
file, rwfilter may use this environment variable. See the FILES section for details.
RWFILTER
Complete path to rwfilter. If not set, rwscanquery attempts to find rwfilter on your PATH.
RWSET
Complete path to rwset. If not set, rwscanquery attempts to find rwset on your PATH.
RWSETBUILD
Complete path to rwsetbuild. If not set, rwscanquery attempts to find rwsetbuild on your PATH.
RWSETCAT
Complete path to rwsetcat(1). If not set, rwscanquery attempts to find rwsetcat on your PATH.
FILES
${RWSCANRC}
${HOME}/.rwscanrc
/usr/local/share/silk/.rwscanrc
Possible locations for the rwscanquery configuration file, .rwscanrc. In addition, rwscanquery
checks the parent directory of the directory containing the rwscanquery script.
${SILK_CONFIG_FILE}
${SILK_DATA_ROOTDIR}/silk.conf
/data/silk.conf
${SILK_PATH}/share/silk/silk.conf
${SILK_PATH}/share/silk.conf
/usr/local/share/silk/silk.conf
/usr/local/share/silk.conf
Possible locations for the SiLK site configuration file, for report types that use rwfilter.
SEE ALSO
rwscan(1), rwfilter(1), rwset(1), rwsetbuild(1), rwsetcat(1), silk.conf(5), silk(7)
rwset(1)
rwset
Generate binary IPset files of unique IP addresses
SYNOPSIS
rwset {--sip-file=FILE | --dip-file=FILE
| --nhip-file=FILE | --any-file=FILE [...]}
[--record-version=VERSION] [--invocation-strip]
[--note-add=TEXT] [--note-file-add=FILE]
[--print-filenames] [--copy-input=PATH]
[--compression-method=COMP_METHOD]
[--ipv6-policy={ignore,asv4,mix,force,only}]
[--site-config-file=FILENAME]
{[--xargs] | [--xargs=FILENAME] | [FILE [FILE ...]]}
rwset --help
rwset --version
DESCRIPTION
rwset reads SiLK Flow records and generates one to four binary IPset file(s). In a single pass, rwset can
create one of each type of its possible outputs, which are IPset files containing:
• the unique source IP addresses
• the unique destination IP addresses
• the unique next-hop IP addresses
• the unique source and destination IP addresses
The output files must not exist prior to invoking rwset. To write an IPset file to the standard output,
specify stdout or - as the output file name. rwset will complain if you attempt to write the IPset to the
standard output and standard output is connected to the terminal. Only one IPset file may be written to
the standard output.
rwset reads SiLK Flow records from the files named on the command line or from the standard input when
no file names are specified and --xargs is not present. To read the standard input in addition to the named
files, use - or stdin as a file name. If an input file name ends in .gz, the file will be uncompressed as it
is read. When the --xargs switch is provided, rwset will read the names of the files to process from the
named text file, or from the standard input if no file name argument is provided to the switch. The input
to --xargs must contain one file name per line.
IPset files are in a binary format that efficiently stores a set of IP addresses. The file only stores the presence
of an IP address; no volume information (such as a count of the number of times the IP address occurs) is
maintained. To store volume information, use rwbag(1).
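The difference between recording presence and recording volume can be shown with ordinary Python containers. This is a conceptual sketch only; it says nothing about the binary IPset or Bag file formats, and the function name and the tuple layout of the sample flows are assumptions.

```python
from collections import Counter


def summarize(flows):
    """Make one pass over (sip, dip, nhip) tuples.

    Returns four sets, mirroring the four rwset outputs, plus a Counter of
    flows per source address to show the volume information a set discards.
    """
    sips, dips, nhips, either = set(), set(), set(), set()
    sip_counts = Counter()
    for sip, dip, nhip in flows:
        sips.add(sip)
        dips.add(dip)
        nhips.add(nhip)
        either.update((sip, dip))   # source and destination addresses
        sip_counts[sip] += 1        # volume: kept by a bag, not by a set
    return sips, dips, nhips, either, sip_counts


# Hypothetical flows: (source IP, destination IP, next-hop IP)
flows = [
    ("10.0.0.1", "192.0.2.5", "10.0.0.254"),
    ("10.0.0.1", "192.0.2.6", "10.0.0.254"),
    ("10.0.0.2", "192.0.2.5", "10.0.0.254"),
]
sips, dips, nhips, either, sip_counts = summarize(flows)
print(sorted(sips), sip_counts["10.0.0.1"])
```

The set of source addresses records only that 10.0.0.1 was seen, while the counter also records that it appeared in two flows.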
Use rwsetcat(1) to see the IP addresses in a binary IPset file. To create a binary IPset file from a list of
IP addresses, use rwsetbuild(1). rwsettool(1) allows you to perform set operations on binary IPset files.
To determine if an IP address is a member of a binary IPset, use rwsetmember(1).
To list the source IP addresses that appear in the SiLK Flow file flows.rw, the command
$ rwset --sip-file=stdout flows.rw | rwsetcat
will be faster than rwuniq(1), but rwset cannot report total volume or do the thresholding that rwuniq
supports.
OPTIONS
Option names may be abbreviated if the abbreviation is unique or is an exact match for an option. A
parameter to an option may be specified as --arg=param or --arg param, though the first form is required
for options that take optional parameters.
At least one of the following output switches is required; multiple output switches can be given, but an
output switch cannot be repeated.
--sip-file=FILE
Store the unique source IP addresses in the binary IPset file FILE. rwset will write the IPset file to
the standard output when FILE is stdout or - and the standard output is not a terminal.
--dip-file=FILE
Store the unique destination IP addresses in the binary IPset file FILE. rwset will write the IPset file
to the standard output when FILE is stdout or - and the standard output is not a terminal.
--nhip-file=FILE
Store the unique next-hop IP addresses in the binary IPset file FILE. rwset will write the IPset file
to the standard output when FILE is stdout or - and the standard output is not a terminal.
--any-file=FILE
Store the unique source and destination IP addresses in the binary IPset file FILE. rwset will write
the IPset file to the standard output when FILE is stdout or - and the standard output is not a
terminal.
Only one of the above switches may use stdout as the name of the file.
rwset supports these additional switches:
--record-version=VERSION
Specify the format of the IPset records that are written to the output. Valid values are 0, 2, 3, and
4. When the switch is not provided, the SILK_IPSET_RECORD_VERSION environment variable is
checked for a version. A VERSION of 2 creates a file compatible with SiLK 2.x, and it can only be
used for IPsets containing IPv4 addresses. A VERSION of 3 creates a file that can only be read by
SiLK 3.0 or later. A VERSION of 4 creates a file that can only be read by SiLK 3.7 or later. Version 4
files are smaller than version 3 files. The default VERSION is 0, which uses version 2 for IPv4 IPsets
and version 3 for IPv6 IPsets.
--invocation-strip
Do not record any command line history; that is, do not copy the invocation history from the input
files to the output file, and do not record the current command line invocation in the output.
--note-add=TEXT
Add the specified TEXT to the header of every output file as an annotation. This switch may be
repeated to add multiple annotations to a file. To view the annotations, use the rwfileinfo(1) tool.
--note-file-add=FILENAME
Open FILENAME and add the contents of that file to the header of every output file as an annotation.
This switch may be repeated to add multiple annotations. Currently the application makes no effort
to ensure that FILENAME contains text; be careful that you do not attempt to add a SiLK data file
as an annotation.
--print-filenames
Prints to the standard error the names of input files as they are opened.
--copy-input=PATH
Copy all binary input to the specified file or named pipe. PATH can be stdout to print flows to the
standard output as long as no IPset files are being written there.
--ipv6-policy=POLICY
Determine how IPv4 and IPv6 flows are handled when SiLK has been compiled with IPv6 support.
When the switch is not provided, the SILK_IPV6_POLICY environment variable is checked for a policy.
If it is also unset or contains an invalid policy, the POLICY is mix. When SiLK has not been compiled
with IPv6 support, IPv6 flows are always ignored, regardless of the value passed to this switch or in
the SILK_IPV6_POLICY variable. The supported values for POLICY are:
ignore
Ignore any flow record marked as IPv6, regardless of the IP addresses it contains. Only IP
addresses contained in IPv4 flow records will be added to the IPset(s).
asv4
Convert IPv6 flow records that contain addresses in the ::ffff:0:0/96 prefix to IPv4 and ignore all
other IPv6 flow records.
mix
Process the input as a mixture of IPv4 and IPv6 flow records. When the input contains IPv6
addresses outside of the ::ffff:0:0/96 prefix, this policy is equivalent to force; otherwise it is
equivalent to asv4.
force
Convert IPv4 flow records to IPv6, mapping the IPv4 addresses into the ::ffff:0:0/96 prefix.
only
Process only flow records that are marked as IPv6. Only IP addresses contained in IPv6 flow
records will be added to the IPset(s).
Regardless of the IPv6 policy, when all IPv6 addresses in the IPset are in the ::ffff:0:0/96 prefix, rwset
treats them as IPv4 addresses and writes an IPv4 IPset. When any other IPv6 addresses are present
in the IPset, the IPv4 addresses in the IPset are mapped into the ::ffff:0:0/96 prefix and rwset writes
an IPv6 IPset.
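The per-record policies and the final output rule can be sketched in Python with the standard `ipaddress` module. This is a simplified model under stated assumptions, not rwset's implementation: a record is reduced to a single address, the function names `apply_policy` and `finalize` are hypothetical, and the mix policy is left out because it depends on the whole input rather than on one record.

```python
from ipaddress import IPv4Address, IPv6Address, ip_address, ip_network

V4_MAPPED = ip_network("::ffff:0:0/96")

def to_mapped(addr):
    """Map an IPv4 address into the ::ffff:0:0/96 prefix."""
    return IPv6Address((0xFFFF << 32) | int(addr))

def apply_policy(policy, addr):
    """Return the address to store for one record, or None to skip it."""
    is_v6 = isinstance(addr, IPv6Address)
    if policy == "ignore":
        return None if is_v6 else addr
    if policy == "asv4":
        # ipv4_mapped is None for addresses outside ::ffff:0:0/96
        return addr.ipv4_mapped if is_v6 else addr
    if policy == "force":
        return addr if is_v6 else to_mapped(addr)
    if policy == "only":
        return addr if is_v6 else None
    raise ValueError(f"policy not modelled in this sketch: {policy}")

def finalize(addrs):
    """Write an IPv4 set unless some address lies outside ::ffff:0:0/96."""
    as_v6 = [a if isinstance(a, IPv6Address) else to_mapped(a) for a in addrs]
    if all(a in V4_MAPPED for a in as_v6):
        return {a.ipv4_mapped for a in as_v6}
    return set(as_v6)

print(apply_policy("asv4", ip_address("::ffff:a01:204")))
```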
--compression-method=COMP_METHOD
Specify how to compress the output. When this switch is not given, output to the standard output or to
named pipes is not compressed, and output to files is compressed using the default chosen when SiLK
was compiled. The valid values for COMP_METHOD are determined by which external libraries were
found when SiLK was compiled. To see the available compression methods and the default method,
use the --help or --version switch. SiLK can support the following COMP_METHOD values when
the required libraries are available.
none
Do not compress the output using an external library.
zlib
Use the zlib(3) library for compressing the output, and always compress the output regardless
of the destination. Using zlib produces the smallest output files at the cost of speed.
lzo1x
Use the lzo1x algorithm from the LZO real time compression library for compression, and always
compress the output regardless of the destination. This method provides good compression
with less memory and CPU overhead.
best
Use lzo1x if available, otherwise use zlib. Only compress the output when writing to a file.
--site-config-file=FILENAME
Read the SiLK site configuration from the named file FILENAME. When this switch is not provided,
rwset searches for the site configuration file in the locations specified in the FILES section.
--xargs
--xargs=FILENAME
Causes rwset to read file names from FILENAME or from the standard input if FILENAME is not
provided. The input should have one file name per line. rwset will open each file in turn and read
records from it, as if the files had been listed on the command line.
--help
Print the available options and exit.
--version
Print the version number and information about how SiLK was configured, then exit the application.
EXAMPLES
In the following examples, the dollar sign ($) represents the shell prompt. The text after the dollar sign
represents the command line. Lines have been wrapped for improved readability, and the back slash (\) is
used to indicate a wrapped line.
rwset is intended to work tightly with rwfilter(1). For example, consider generating two IPsets: the first
file, low_packet_tcp.set, contains the source IP addresses for incoming flow records (that is, the external hosts)
where the record has no more than three packets in its sessions. The second IPset file, high_packet_tcp.set,
contains the external IPs for records with four or more packets.
The first set, for TCP traffic on 03/01/2003 can be generated with:
$ rwfilter --start-date=2003/03/01:00 --end-date=2003/03/01:23 \
      --proto=6 --packets=1-3 --pass=stdout \
  | rwset --sip-file=low_packet_tcp.set
The second set with:
$ rwfilter --start-date=2003/03/01:00 --end-date=2003/03/01:23 \
      --proto=6 --packets=4- --pass=stdout \
  | rwset --sip-file=high_packet_tcp.set
ENVIRONMENT
SILK_IPSET_RECORD_VERSION
This environment variable is used as the value for the --record-version when that switch is not
provided.
SILK_CLOBBER
The SiLK tools normally refuse to overwrite existing files. Setting SILK_CLOBBER to a non-empty
value removes this restriction.
SILK_CONFIG_FILE
This environment variable is used as the value for the --site-config-file when that switch is not
provided.
SILK_DATA_ROOTDIR
This environment variable specifies the root directory of the data repository. As described in the FILES
section, rwset may use this environment variable when searching for the SiLK site configuration file.
SILK_PATH
This environment variable gives the root of the install tree. When searching for configuration files,
rwset may use this environment variable. See the FILES section for details.
FILES
${SILK_CONFIG_FILE}
${SILK_DATA_ROOTDIR}/silk.conf
/data/silk.conf
${SILK_PATH}/share/silk/silk.conf
${SILK_PATH}/share/silk.conf
/usr/local/share/silk/silk.conf
/usr/local/share/silk.conf
Possible locations for the SiLK site configuration file which are checked when the --site-config-file
switch is not provided.
SEE ALSO
rwsetbuild(1), rwsetcat(1), rwsettool(1), rwsetmember(1), rwfilter(1), rwfileinfo(1), rwbag(1),
rwuniq(1), silk(7), zlib(3)
NOTES
The --record-version switch was added in SiLK 3.0. Prior to SiLK 3.6, the only supported arguments for
the switch were 2 and 3, with the default being 3. As of SiLK 3.6, the default is 0. Version 4 was added in
SiLK 3.7.
rwsetbuild
Create a binary IPset file from a list of IPs
SYNOPSIS
rwsetbuild [{--ip-ranges | --ip-ranges=DELIM}]
[--record-version=VERSION] [--invocation-strip]
[--note-add=TEXT] [--note-file-add=FILENAME]
[--compression-method=COMP_METHOD]
[{INPUT_TEXT_FILE | -} [{OUTPUT_SET_FILE | -}]]
rwsetbuild --help
rwsetbuild --version
DESCRIPTION
rwsetbuild creates a binary IPset file from textual input. The IPset will be written to the second command
line argument if it has been specified; otherwise the IPset is written to the standard output if the standard
output is not a terminal. rwsetbuild will not overwrite an existing file. The textual input is read from the
first command line argument if it has been specified; otherwise the text is read from the standard input if
the standard input is not a terminal. An input file name of stdin or - means the standard input; an output
file name of stdout or - means the standard output. rwsetbuild will read textual IPs from the terminal if
the standard input is explicitly given as the input. rwsetbuild exits with an error if the input file cannot
be read or the output file cannot be written.
Comments are ignored in the input file; they begin with the '#' symbol and continue to the end of the
line. Whitespace and blank lines are also ignored. Otherwise, a line should contain a single IP address
unless the --ip-ranges switch is specified, in which case a line may contain two IP addresses separated by
the user-specified delimiter, which defaults to hyphen (-).
rwsetbuild supports IPv4 addresses. When SiLK has been built with IPv6 support, rwsetbuild can build
an IPset containing IPv6 addresses. When the input contains a mixture of IPv4 and IPv6 addresses, the
IPv4 addresses are mapped into the ::ffff:0:0/96 block of IPv6. When writing the IPset, rwsetbuild will
convert the output to IPv4 if all IPv6 addresses were in the ::ffff:0:0/96 block. rwsetbuild does not allow
the input to contain both integer values and IPv6 addresses.
Each IP address must be expressed in one of these formats:
• Canonical IPv4 address (i.e., dotted decimal; all 4 octets are required):
10.1.2.4
• An unsigned 32-bit integer:
167838212
• Canonical IPv6 address:
2001:db8::f00
• Any of the above with a CIDR designation:
10.1.2.4/31
167838212/31
192.168.0.0/16
2001:db8::/48
• SiLK IP Wildcard: An IP Wildcard can represent multiple IPv4 or IPv6 addresses. An IP Wildcard
contains an IP in its canonical form, except each part of the IP (where part is an octet for IPv4 or a
hexadectet for IPv6) may be a single value, a range, a comma separated list of values and ranges, or
the letter x to signify all values for that part of the IP (that is, 0-255 for IPv4). You may not specify
a CIDR suffix when using IP Wildcard notation. IP Wildcard notation is not supported when the
--ip-ranges switch is specified.
10.x.1-2.4,5
2001:db8::aaab-ffff,aaaa,0-aaa9
• IP Range: An IPv4 address, an unsigned 32-bit integer, or an IPv6 address to use as the start of the
range, a delimiter, and an IPv4 address, an unsigned 32-bit integer, or an IPv6 address to use as the
end of the range. The default delimiter is the hyphen (-), but a different delimiter may be specified
as a parameter to the --ip-ranges switch. Whitespace around the IP addresses is ignored. Only valid
when --ip-ranges is specified.
10.1.2.4-10.1.2.5
167838212-167838213
192.168.0.0-192.168.255.255
2001:db8::f00-2001:db8::fff
If an IP address cannot be parsed, rwsetbuild will exit with an error.
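The IPv4 forms listed above can be illustrated with a short Python sketch that expands a CIDR block and an IP Wildcard into individual addresses. It is an approximation for illustration only: the helper names are hypothetical, IPv6 is not handled, and how rwsetbuild treats a CIDR base address with host bits set is not modelled (Python raises an error in that case).

```python
from ipaddress import IPv4Address, ip_network
from itertools import product

def expand_part(part):
    """Expand one octet spec: 'x', a value, a range, or a comma list."""
    if part == "x":
        return list(range(256))
    values = set()
    for item in part.split(","):
        if "-" in item:
            lo, hi = map(int, item.split("-"))
            values.update(range(lo, hi + 1))
        else:
            values.add(int(item))
    return sorted(values)

def expand_wildcard(wildcard):
    """Expand an IPv4 IP Wildcard into the addresses it covers."""
    parts = wildcard.split(".")
    if len(parts) != 4:
        raise ValueError("all 4 octets are required")
    octets = [expand_part(p) for p in parts]
    return [IPv4Address(".".join(map(str, c))) for c in product(*octets)]

def expand_cidr(text):
    """Expand 'ADDR/LEN' where ADDR is dotted decimal or a 32-bit integer."""
    addr, length = text.split("/")
    base = IPv4Address(int(addr)) if addr.isdigit() else IPv4Address(addr)
    return list(ip_network(f"{base}/{length}"))

print(expand_cidr("10.1.2.4/31") == expand_cidr("167838212/31"))
print(len(expand_wildcard("10.x.1-2.4,5")))
```

The wildcard 10.x.1-2.4,5 covers 256 values for the second octet, two for the third, and two for the fourth, so it expands to 1024 addresses.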
OPTIONS
Option names may be abbreviated if the abbreviation is unique or is an exact match for an option. A
parameter to an option may be specified as --arg=param or --arg param, though the first form is required
for options that take optional parameters.
--ip-ranges
--ip-ranges=DELIM
Allow the input file to contain ranges of IP addresses. If DELIM is not specified, hyphen (-) is used
as the delimiter. DELIM may be the space character. This method also supports lines that contain a
single IP address (or integer); these lines may have a CIDR designation. CIDR designations are not
supported on lines that contain DELIM. When --ip-ranges is active, SiLK wildcard IP syntax is not
supported.
--record-version=VERSION
Specify the format of the IPset records that are written to the output. Valid values are 0, 2, 3, and
4. When the switch is not provided, the SILK_IPSET_RECORD_VERSION environment variable is
checked for a version. A VERSION of 2 creates a file compatible with SiLK 2.x, and it can only be
used for IPsets containing IPv4 addresses. A VERSION of 3 creates a file that can only be read by
SiLK 3.0 or later. A VERSION of 4 creates a file that can only be read by SiLK 3.7 or later. Version 4
files are smaller than version 3 files. The default VERSION is 0, which uses version 2 for IPv4 IPsets
and version 3 for IPv6 IPsets.
--invocation-strip
Do not record any command line history; that is, do not record the current command line invocation
in the output file.
--note-add=TEXT
Add the specified TEXT to the header of the output file as an annotation. This switch may be repeated
to add multiple annotations to a file. To view the annotations, use the rwfileinfo(1) tool.
--note-file-add=FILENAME
Open FILENAME and add the contents of that file to the header of the output file as an annotation.
This switch may be repeated to add multiple annotations. Currently the application makes no effort
to ensure that FILENAME contains text; be careful that you do not attempt to add a SiLK data file
as an annotation.
--compression-method=COMP_METHOD
Specify how to compress the output. When this switch is not given, output to the standard output or to
named pipes is not compressed, and output to files is compressed using the default chosen when SiLK
was compiled. The valid values for COMP_METHOD are determined by which external libraries were
found when SiLK was compiled. To see the available compression methods and the default method,
use the --help or --version switch. SiLK can support the following COMP_METHOD values when
the required libraries are available.
none
Do not compress the output using an external library.
zlib
Use the zlib(3) library for compressing the output, and always compress the output regardless
of the destination. Using zlib produces the smallest output files at the cost of speed.
lzo1x
Use the lzo1x algorithm from the LZO real time compression library for compression, and always
compress the output regardless of the destination. This method provides good compression
with less memory and CPU overhead.
best
Use lzo1x if available, otherwise use zlib. Only compress the output when writing to a file.
--help
Print the available options and exit.
--version
Print the version number and information about how SiLK was configured, then exit the application.
EXAMPLE
In the following examples, the dollar sign ($) represents the shell prompt. The text after the dollar sign
represents the command line.
Reading from a file:
$ echo 10.x.x.x > ten.txt
$ rwsetbuild ten.txt ten.set
$ echo 10.0.0.0/8 > ten.txt
$ rwsetbuild ten.txt ten.set
$ echo 10.0.0.0-10.255.255.255 > ten.txt
$ rwsetbuild --ip-ranges ten.txt ten.set
$ echo '167772160,184549375' > ten.txt
$ rwsetbuild --ip-ranges=, ten.txt ten.set
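The four inputs above describe the same block. As a cross-check outside of SiLK, this Python sketch uses the standard `ipaddress` module to confirm that the dotted-decimal range and the integer range both reduce to 10.0.0.0/8; the variable names are chosen for this example only.

```python
from ipaddress import IPv4Address, ip_network, summarize_address_range

cidr = ip_network("10.0.0.0/8")

# Dotted-decimal range, the form accepted with --ip-ranges
dotted = list(summarize_address_range(
    IPv4Address("10.0.0.0"), IPv4Address("10.255.255.255")))

# Integer range with ',' as the delimiter, as with --ip-ranges=,
start, end = (IPv4Address(int(n)) for n in "167772160,184549375".split(","))
integers = list(summarize_address_range(start, end))

print(dotted == [cidr], integers == [cidr])
```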
Reading from the standard input:
$ echo 192.168.x.x | rwsetbuild stdin private.set
Example input to rwsetbuild:
# A single address
10.1.2.4
# Two addresses in the same subnet
10.1.2.4,5
# The same two addresses
10.1.2.4/31
# The same two addresses
167838212/31
# A whole subnet
10.1.2.0-255
# The same whole subnet
10.1.2.x
# The same whole subnet yet again
10.1.2.0/24
# All RFC1918 space
10.0.0.0/8
172.16.0.0/12
192.168.0.0/16
# All RFC1918 space
10.x.x.x
172.16-20,21,22-31.x.x
192.168.x.x
# All RFC1918 space
167772160/8
2886729728/12
3232235520/16
# Everything ending in 255
x.x.x.255
# All addresses that end in 1-10
x.x.x.1-10
ENVIRONMENT
SILK_IPSET_RECORD_VERSION
This environment variable is used as the value for the --record-version when that switch is not
provided.
SILK_CLOBBER
The SiLK tools normally refuse to overwrite existing files. Setting SILK_CLOBBER to a non-empty
value removes this restriction.
SEE ALSO
rwset(1), rwsetcat(1), rwsetmember(1), rwsettool(1), rwfileinfo(1), silk(7), zlib(3)
NOTES
The --record-version switch was added in SiLK 3.0. Prior to SiLK 3.6, the only supported arguments for
the switch were 2 and 3, with the default being 3. As of SiLK 3.6, the default is 0. Version 4 was added in
SiLK 3.7.
rwsetcat
Print the IP addresses in a binary IPset file
SYNOPSIS
rwsetcat [--count-ips] [--print-statistics] [--print-ips]
[--cidr-blocks | --cidr-blocks=0 | --cidr-blocks=1]
[--network-structure | --network-structure=STRUCTURE]
[--ip-ranges]
[--ip-format=FORMAT] [--integer-ips] [--zero-pad-ips]
[--no-columns] [--column-separator=C] [--no-final-delimiter]
[{--delimited | --delimited=C}]
[--print-filenames | --print-filenames=0 | --print-filenames=1]
[--pager=PAGER_PROG] [SET_FILE...]
rwsetcat --help
rwsetcat --version
DESCRIPTION
When run with no switches, rwsetcat reads each IPset file given on the command line and prints its
constituent IP addresses to the standard output. When the input IPset contains IPv4 data, rwsetcat prints
one IP address per line; when the IPset contains IPv6 data, rwsetcat prints the IPs as CIDR blocks. If no
file names are listed on the command line, rwsetcat will attempt to read an IPset from the standard input.
rwsetcat can produce additional information about IPset files, such as the number of IPs they contain, the
number of IPs at the /8, /16, /24, and /27 levels, and the minimum and maximum IPs.
To create an IPset file from SiLK Flow records, use rwset(1). rwsetbuild(1) creates an IPset from textual
input. The --coverset switch on rwbagtool(1) creates an IPset from a binary SiLK Bag.
OPTIONS
Option names may be abbreviated if the abbreviation is unique or is an exact match for an option. A
parameter to an option may be specified as --arg=param or --arg param, though the first form is required
for options that take optional parameters.
--count-ips
Print a count of the number of IP addresses in the IPset file. This switch disables the printing of the
IP addresses in the IPset file. See --print-ips for more information. When --count-ips is specified
and more than one IPset file is provided, rwsetcat prepends the name of the input file and a colon to
the IP address count. See the description of the --print-filenames switch for more information.
--print-statistics
Print statistics about the IPset. The statistics include the minimum IP address, the maximum IP
address, and, for each CIDR block of /8, /16, /24, /27, and /32, the number of blocks occupied and
what percentage of coverage that represents. This switch disables the printing of the IP addresses in
the IPset. See --print-ips for more information. When --print-statistics is specified and more than
one IPset file is provided, rwsetcat prints the name of the input file, a colon, and a newline prior to
printing the statistics. See the description of the --print-filenames switch for more information.
--print-ips
Force printing of the IP addresses, even when the --count-ips or --print-statistics option is provided.
--cidr-blocks
--cidr-blocks=0
--cidr-blocks=1
When an argument is not provided to the switch or when the argument is 1, print the IPs in the
IPset file, grouping sequential IPs into the largest possible CIDR block. If the argument is 0, print
the individual IPs in the IPset file. By default, rwsetcat prints individual IPs for IPv4 IPsets, and
CIDR blocks for IPv6 IPsets. See also the --ip-ranges switch. This switch cannot be combined with
the --network-structure switch.
--network-structure
--network-structure=STRUCTURE
For each numeric value in STRUCTURE, group the IPs in the IPset into a netblock of that size and
print the number of hosts and, optionally, print the number of smaller, occupied netblocks that each
larger netblock contains. When STRUCTURE begins with v6:, the IPs in the IPset are treated as
IPv6 addresses, and any IPv4 addresses are mapped into the ::ffff:0:0/96 netblock. Otherwise, the IPs
are treated as IPv4 addresses, and any IPv6 address outside the ::ffff:0:0/96 netblock is ignored. Aside
from the initial v6: (or v4:, for consistency), STRUCTURE has one of the following forms:
1. NETBLOCK_LIST/SUMMARY_LIST. Group IPs into the sizes specified in either NETBLOCK_LIST
or SUMMARY_LIST. rwsetcat prints a row for each occupied netblock specified in
NETBLOCK_LIST, where the row lists the base IP of the netblock, the number of hosts, and the
number of smaller, occupied netblocks having a size that appears in either NETBLOCK_LIST or
SUMMARY_LIST. (The values in SUMMARY_LIST are only summarized; they are not printed.)
2. NETBLOCK_LIST/. Similar to the first form, except all occupied netblocks are printed, and
there are no netblocks that are only summarized.
3. NETBLOCK_LIST S. When the character S appears anywhere in the NETBLOCK_LIST, rwsetcat
provides a default value for the SUMMARY_LIST. That default is 8,16,24,27 for IPv4, and
48,64 for IPv6.
4. NETBLOCK_LIST. When neither S nor / appears in STRUCTURE, the output does not include
the number of smaller, occupied netblocks.
5. Empty. When STRUCTURE is empty or only contains v6: or v4:, rwsetcat prints
a single row for the total network (the /0 netblock) giving the number of hosts and the number
of smaller, occupied netblocks using the same default list specified in form 3.
NETBLOCK_LIST and SUMMARY_LIST contain a comma separated list of numbers between 0 (the
total network) and the size for an individual host (32 for IPv4 or 128 for IPv6). The characters T and H
may be used as aliases for 0 and the host netblock, respectively. In addition, when parsing the lists as
IPv4 netblocks, the characters A, B, C, and X are supported as aliases for 8, 16, 24, and 27, respectively.
A comma is not required between adjacent letters. The --network-structure switch disables printing
of the IPs in the IPset file; specify the H argument to the switch to print each individual IP address.
--ip-ranges
Cause the output to contain three pipe-delimited (|) columns: the first is the number of IPs in the
contiguous range, the second is the start of the range, and the final is the end of the range. This prints
the IPset in the fewest number of lines.
--ip-format=FORMAT
Specify how IP addresses will be printed. When this switch is not specified, IPs are printed in the
canonical format. The FORMAT is one of:
canonical
Print IP addresses in their canonical form: dotted quad for IPv4 (127.0.0.1) and hexadectet for
IPv6 (2001:db8::1). Note that IPv6 addresses in ::ffff:0:0/96 and some IPv6 addresses in ::/96
will be printed as a mixture of IPv6 and IPv4.
zero-padded
Print IP addresses in their canonical form, but add zeros to the output so it fully fills the width
of the column. The addresses 127.0.0.1 and 2001:db8::1 are printed as 127.000.000.001 and
2001:0db8:0000:0000:0000:0000:0000:0001, respectively.
decimal
Print IP addresses as integers in decimal format. The addresses 127.0.0.1 and 2001:db8::1 are
printed as 2130706433 and 42540766411282592856903984951653826561, respectively.
hexadecimal
Print IP addresses as integers in hexadecimal format. The addresses 127.0.0.1 and 2001:db8::1
are printed as 7f000001 and 20010db8000000000000000000000001, respectively.
force-ipv6
Print all IP addresses in the canonical form for IPv6 without using any IPv4 notation. Any IPv4
address is mapped into the ::ffff:0:0/96 netblock. The addresses 127.0.0.1 and 2001:db8::1 are
printed as ::ffff:7f00:1 and 2001:db8::1, respectively.
--integer-ips
Print IP addresses as integers. This switch is equivalent to --ip-format=decimal; it is deprecated as
of SiLK 3.7.0 and will be removed in the SiLK 4.0 release.
--zero-pad-ips
Print IP addresses as fully-expanded, zero-padded values in their canonical form. This switch is
equivalent to --ip-format=zero-padded; it is deprecated as of SiLK 3.7.0 and will be removed in
the SiLK 4.0 release.
--no-columns
Disable fixed-width columnar output when printing the output from the --network-structure or
--ip-ranges switch.
--column-separator=C
Use the specified character between columns produced by the --network-structure and --ip-ranges
switches. This character is also used after the final column when --ip-ranges is specified. When this
switch is not specified, the default of '|' is used.
--no-final-delimiter
Do not print the column separator after the final column in the output produced by --ip-ranges.
Normally a delimiter is printed.
--delimited
--delimited=C
Run as if --no-columns --no-final-delimiter --column-sep=C had been specified. That is, disable
fixed-width columnar output; if character C is provided, it is used as the delimiter between columns
instead of the default '|'.
--print-filenames
--print-filenames=0
--print-filenames=1
If an argument is not provided to the switch or if the argument is 1, print the name of the IPset
file prior to printing information about the IPset file regardless of the number of IPset files specified
on the command line or the type of information to be printed. If the switch is provided and its
argument is 0, suppress printing the name of the IPset file regardless of the number of IPset files or
type of information. When the switch is not provided, rwsetcat’s behavior depends on the type of
information to be printed and on the number of input IPset files: If multiple IPset files are provided
and --count-ips or --print-statistics is given, rwsetcat prints the name of a file, a colon (:), a
newline (unless --count-ips was specified), and the requested information; otherwise, rwsetcat does
not print the file name.
--pager=PAGER_PROG
When output is to a terminal, invoke the program PAGER_PROG to view the output one screen full
at a time. This switch overrides the SILK_PAGER environment variable, which in turn overrides the
PAGER variable. If the value of the pager is determined to be the empty string, no paging will be
performed and all output will be printed to the terminal.
--help
Print the available options and exit.
--version
Print the version number and information about how SiLK was configured, then exit the application.
EXAMPLES
In the following examples, the dollar sign ($) represents the shell prompt. Some input lines are split over
multiple lines in order to improve readability, and a backslash (\) is used to indicate such lines.
By default, rwsetcat prints the contents of an IPset.
$ rwsetcat sample.set
10.1.2.250
10.1.2.251
10.1.2.252
10.1.2.253
10.1.2.254
10.1.2.255
10.1.3.0
10.1.3.1
10.1.3.2
10.1.3.3
10.1.3.4
Use the --cidr-blocks switch to print the contents in CIDR notation.
$ rwsetcat --cidr-blocks sample.set
10.1.2.250/31
10.1.2.252/30
10.1.3.0/30
10.1.3.4
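The grouping into the largest possible CIDR blocks can be reproduced outside of SiLK. The following Python sketch rebuilds the eleven addresses of sample.set and collapses them with `ipaddress.collapse_addresses`; the helper name `cidr_lines` is hypothetical, and single hosts are printed without a /32 suffix to mirror the output shown above.

```python
from ipaddress import IPv4Address, collapse_addresses

# The eleven addresses of sample.set: 10.1.2.250 through 10.1.3.4
start = int(IPv4Address("10.1.2.250"))
sample = [IPv4Address(start + i) for i in range(11)]

def cidr_lines(addresses):
    """Group addresses into the largest possible CIDR blocks."""
    lines = []
    for net in collapse_addresses(addresses):
        if net.prefixlen == net.max_prefixlen:
            lines.append(str(net.network_address))  # single host
        else:
            lines.append(str(net))
    return lines

for line in cidr_lines(sample):
    print(line)
```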
rwsetcat will read the IPset file from the standard input when no file name is given on the command line.
$ cat sample.set | rwsetcat --cidr-blocks
10.1.2.250/31
10.1.2.252/30
10.1.3.0/30
10.1.3.4
When multiple IPset files are specified on the command line, rwsetcat prints the contents of each file one
after the other.
$ rwsetcat --cidr-blocks sample.set sample.set
10.1.2.250/31
10.1.2.252/30
10.1.3.0/30
10.1.3.4
10.1.2.250/31
10.1.2.252/30
10.1.3.0/30
10.1.3.4
To print the union of multiple IPset files, use rwsettool(1) to join the files and have rwsetcat print
the result.
$ rwsettool --union sample.set sample.set | rwsetcat --cidr-blocks
10.1.2.250/31
10.1.2.252/30
10.1.3.0/30
10.1.3.4
To see contiguous IPs printed as ranges, use the --ip-ranges switch. The columns contain the length of the
range, its starting IP, and its ending IP.
$ rwsetcat --ip-ranges sample.set
11| 10.1.2.250| 10.1.3.4|
Add the --ip-format=decimal switch to see contiguous IPs printed as ranges of integers.
$ rwsetcat --ip-ranges --ip-format=decimal sample.set
11| 167838458| 167838468|
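The range computation itself is simple to sketch. The Python below merges consecutive addresses into (count, start, end) triples, the three columns that --ip-ranges reports; the function name `ip_ranges` is hypothetical and the code handles IPv4 only.

```python
from ipaddress import IPv4Address

def ip_ranges(addresses):
    """Return (count, start, end) for each run of consecutive addresses."""
    runs = []
    for value in sorted({int(a) for a in addresses}):
        if runs and value == runs[-1][1] + 1:
            runs[-1][1] = value          # extend the current run
        else:
            runs.append([value, value])  # start a new run
    return [(hi - lo + 1, IPv4Address(lo), IPv4Address(hi)) for lo, hi in runs]

# The eleven addresses of sample.set: 10.1.2.250 through 10.1.3.4
start = int(IPv4Address("10.1.2.250"))
sample = [IPv4Address(start + i) for i in range(11)]

for count, lo, hi in ip_ranges(sample):
    print(f"{count}|{lo}|{hi}|")
    print(f"{count}|{int(lo)}|{int(hi)}|")
```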
Use the --delimited switch to produce the same output as a list of comma separated values.
$ rwsetcat --ip-ranges --ip-format=decimal --delimited=, sample.set
11,167838458,167838468
The UNIX cut(1) tool can be used to remove the number of IPs in the range, so that the output only
contains the starting and ending IPs.
$ rwsetcat --ip-ranges --ip-format=decimal --delimited=, sample.set \
| cut -d"," -f2,3
167838458,167838468
The --count-ips switch will print the number of IPs in the IPset.
$ rwsetcat --count-ips sample.set
11
When counting the IPs in multiple IPset files, rwsetcat prepends the file name and a colon to the count.
(The - argument causes rwsetcat to read the standard input in addition to the named file.)
$ cat sample.set | rwsetcat --count-ips sample.set -
sample.set:11
-:11
Provide an argument of 0 to --print-filenames to suppress printing of the input IPset file name.
$ cat sample.set \
  | rwsetcat --count-ips --print-filenames=0 sample.set -
11
11
Use the --print-filenames switch to force rwsetcat to print the file name when only one IPset is given.
$ rwsetcat --count-ips --print-filenames sample.set
sample.set:11
The --print-filenames switch also causes rwsetcat to print the file name when it normally would not.
$ rwsetcat --ip-ranges --ip-format=decimal --print-filenames sample.set
sample.set:
11| 167838458| 167838468|
To see the contents of the IPset and get a count of IPs, use multiple options.
$ rwsetcat --count-ips --cidr-blocks sample.set
11
10.1.2.250/31
10.1.2.252/30
10.1.3.0/30
10.1.3.4
For text-based sorting, use the --ip-format=zero-padded switch to force three digits per octet.
$ rwsetcat --ip-format=zero-padded --cidr-blocks sample.set
010.001.002.250/31
010.001.002.252/30
010.001.003.000/30
010.001.003.004
For numerical sorting, print the IPs as integers.
$ rwsetcat --ip-format=decimal sample.set
167838458
167838459
167838460
167838461
167838462
167838463
167838464
167838465
167838466
167838467
167838468
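For readers who want to convert between these representations without SiLK, here is a minimal Python sketch of four of the --ip-format values. The function `format_ip` is hypothetical, and the force-ipv6 format is deliberately omitted from this sketch.

```python
from ipaddress import ip_address

def format_ip(text, fmt):
    """Render an address in one of four of the formats described above."""
    addr = ip_address(text)
    if fmt == "canonical":
        return str(addr)
    if fmt == "zero-padded":
        if addr.version == 4:
            # three digits per octet
            return ".".join(f"{int(o):03d}" for o in str(addr).split("."))
        return addr.exploded  # four hex digits per group
    if fmt == "decimal":
        return str(int(addr))
    if fmt == "hexadecimal":
        return f"{int(addr):x}"
    raise ValueError(f"format not modelled in this sketch: {fmt}")

for fmt in ("canonical", "zero-padded", "decimal", "hexadecimal"):
    print(fmt, format_ip("127.0.0.1", fmt))
```

Zero-padded text sorts correctly as strings, while the decimal form sorts correctly as numbers, which is the distinction the two examples above rely on.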
Use --print-statistics to get a summary of the IPset file.
$ rwsetcat --print-statistics --print-filenames sample.set
sample.set:
Network Summary
minimumIP = 10.1.2.250
maximumIP = 10.1.3.4
        11 hosts (/32s),     0.000000% of 2^32
         1 occupied /8,      0.390625% of 2^8
         1 occupied /16,     0.001526% of 2^16
         2 occupied /24s,    0.000012% of 2^24
         2 occupied /27s,    0.000001% of 2^27
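The counts and percentages in this summary follow from simple arithmetic: for each prefix length, count the distinct blocks that hold at least one address and divide by the number of possible blocks. A hedged Python sketch, with hypothetical helper names and IPv4 only:

```python
from ipaddress import IPv4Address

def occupied_blocks(addresses, prefix):
    """Count distinct /prefix blocks that contain at least one address."""
    shift = 32 - prefix
    return len({int(a) >> shift for a in addresses})

def statistics(addresses):
    """Return (prefix, occupied, percent of 2^prefix) for each level."""
    rows = []
    for prefix in (32, 8, 16, 24, 27):
        n = occupied_blocks(addresses, prefix)
        rows.append((prefix, n, 100.0 * n / 2 ** prefix))
    return rows

# The eleven addresses of sample.set: 10.1.2.250 through 10.1.3.4
start = int(IPv4Address("10.1.2.250"))
sample = [IPv4Address(start + i) for i in range(11)]

for prefix, n, pct in statistics(sample):
    print(f"{n:10d} occupied /{prefix}, {pct:.6f}% of 2^{prefix}")
```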
The --network-structure switch "rolls up" the IPs into larger blocks.
$ rwsetcat --network-structure=TABCXS sample.set
10.1.2.224/27 | 6 hosts
10.1.2.0/24   | 6 hosts in 1 /27
10.1.3.0/27   | 5 hosts
10.1.3.0/24   | 5 hosts in 1 /27
10.1.0.0/16   | 11 hosts in 2 /24s and 2 /27s
10.0.0.0/8    | 11 hosts in 1 /16, 2 /24s, and 2 /27s
TOTAL         | 11 hosts in 1 /8, 1 /16, 2 /24s, and 2 /27s
You may specify arbitrary blocks for the --network-structure switch.
$ rwsetcat --network-structure=23,24 sample.set
10.1.2.0/24 | 6
10.1.3.0/24 | 5
10.1.2.0/23 | 11
$ rwsetcat --network=23,24/24 sample.set
10.1.2.0/24 | 6 hosts
10.1.3.0/24 | 5 hosts
10.1.2.0/23 | 11 hosts in 2 /24s
$ rwsetcat --network=T,23/24 sample.set
10.1.2.0/23 | 11 hosts in 2 /24s
TOTAL       | 11 hosts in 1 /23 and 2 /24s
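The host counts in these listings come from grouping each address under its enclosing netblock. The sketch below shows that grouping in Python for arbitrary IPv4 prefix lengths; `hosts_per_block` is a hypothetical name, and the counting of smaller occupied netblocks is not modelled.

```python
from collections import Counter
from ipaddress import IPv4Address, IPv4Network

def hosts_per_block(addresses, prefix):
    """Map each occupied /prefix netblock to the number of hosts in it."""
    counts = Counter()
    for a in addresses:
        # strict=False masks the host bits to find the enclosing block
        counts[IPv4Network((int(a), prefix), strict=False)] += 1
    return dict(sorted(counts.items()))

# The eleven addresses of sample.set: 10.1.2.250 through 10.1.3.4
start = int(IPv4Address("10.1.2.250"))
sample = [IPv4Address(start + i) for i in range(11)]

for prefix in (24, 23):
    for block, hosts in hosts_per_block(sample, prefix).items():
        print(f"{block} | {hosts}")
```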
To see the IPs generated by rwset(1) without creating an intermediate IPset file, have rwset send its
output to the standard output, and have rwsetcat read from the standard input.
$ rwfilter ... --pass=stdout | rwset --sip=stdout | rwsetcat
192.168.1.1
192.168.1.2
ENVIRONMENT
SILK_PAGER
When set to a non-empty string, rwsetcat automatically invokes this program to display its output a
screen at a time. If set to an empty string, rwsetcat does not automatically page its output.
PAGER
When set and SILK_PAGER is not set, rwsetcat automatically invokes this program to display its
output a screen at a time.
SEE ALSO
rwset(1), rwsetbuild(1), rwsettool(1), rwsetmember(1), rwbagtool(1), silk(7), cut(1)
rwsetmember
Determine whether IP address(es) are members of an IPset
SYNOPSIS
rwsetmember [--count] [--quiet] PATTERN [INPUT_SET [INPUT_SET...]]
rwsetmember --help
rwsetmember --version
DESCRIPTION
rwsetmember determines whether an IP address or pattern exists in one or more IPset files, printing the
name of the IPset files that contain the IP and optionally counting the number of matches in each file.
PATTERN can be a single IP address, a CIDR block, or an IP Wildcard expressed in the same form as
accepted by rwsetbuild(1).
If an INPUT_SET is not given on the command line, rwsetmember will attempt to read an IPset from the
standard input. To read the standard input in addition to the named files, use - or stdin as a file name. If
an input file name ends in .gz, the file will be uncompressed as it is read.
When rwsetmember encounters an INPUT_SET file that it cannot read as an IPset, it prints an error
message and moves to the next INPUT_SET file.
OPTIONS
Option names may be abbreviated if the abbreviation is unique or is an exact match for an option. A
parameter to an option may be specified as --arg=param or --arg param, though the first form is required
for options that take optional parameters.
--count
Follow each set filename by a colon character and the number of pattern matches in the IPset. Files
that do not match will still be printed, but with a zero match count. The --count switch is ignored
when --quiet is also specified.
--quiet
Produce no standard output. The exit status of the program (see below) should be checked to determine
whether any files matched.
--help
Print the available options and exit.
--version
Print the version number and information about how SiLK was configured, then exit the application.
EXAMPLES
In the following examples, the dollar sign ($) represents the shell prompt. The text after the dollar sign
represents the command line.
To quickly check whether a single set file contains an address (check the exit status):
$ rwsetmember --quiet 192.168.1.1 file.set
To display which of several set files (if any) match a given IP address:
$ rwsetmember 192.168.1.1 *.set
To display the same, but with counts from each file:
$ rwsetmember --count 192.168.1.1 *.set
To find all sets that contain addresses in the 10.0.0.0/8 subnet:
$ rwsetmember 10.0.0.0/8 *.set
To find files containing any IP address that ends with a number between 1 and 10 (this will use a lot of
memory):
$ rwsetmember x.x.x.1-10 *.set
EXIT STATUS
rwsetmember exits with status code 0 if any file matched the pattern or 1 if there were no matches across
any files or if there was a fatal error with the input.
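The interaction of --count, --quiet, and the exit status can be modeled with a short Python sketch. It is only an approximation under stated assumptions: it handles single addresses and CIDR blocks (not IP Wildcards), it takes "number of pattern matches" to mean the number of addresses in the set that fall inside the pattern, and it ignores the fatal-error case. The function names and the in-memory sets are hypothetical.

```python
import ipaddress

def count_matches(pattern, ipset):
    """Number of addresses in ipset that fall inside the CIDR pattern."""
    net = ipaddress.ip_network(pattern, strict=False)
    return sum(1 for ip in ipset if ipaddress.ip_address(ip) in net)

def setmember(pattern, named_sets, count=False, quiet=False):
    """Return (output_lines, exit_status) for the modeled invocation."""
    lines, matched = [], False
    for name, ipset in named_sets.items():
        n = count_matches(pattern, ipset)
        matched = matched or n > 0
        if quiet:
            continue                      # --quiet suppresses all output
        if count:
            lines.append(f"{name}:{n}")   # zero counts are still printed
        elif n:
            lines.append(name)
    return lines, 0 if matched else 1

# Hypothetical in-memory stand-ins for IPset files
sets = {
    "a.set": ["10.0.0.1", "10.2.3.4", "192.168.1.1"],
    "b.set": ["172.16.0.1"],
}
print(setmember("10.0.0.0/8", sets, count=True))
print(setmember("192.168.1.1", sets))
print(setmember("8.8.8.8", sets, quiet=True))
```

In a shell, the same exit status is what a test such as `if rwsetmember --quiet ...` would branch on.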
SEE ALSO
rwset(1), rwsetbuild(1), rwsetcat(1), silk(7)
rwsettool
Operate on IPset files to produce a new IPset
SYNOPSIS
rwsettool { --union | --intersect | --difference
| --mask=NET_BLOCK_SIZE | --fill-blocks=NET_BLOCK_SIZE
| --sample {--size=SIZE | --ratio=RATIO} [--seed=SEED] }
[--output-path=OUTPUT_PATH] [--record-version=VERSION]
[--invocation-strip]
[--note-strip] [--note-add=TEXT] [--note-file-add=FILE]
[--compression-method=COMP_METHOD] [INPUT_SET ...]
rwsettool --help
rwsettool --version
DESCRIPTION
rwsettool performs a single operation on one or more IPset file(s) to produce a new IPset file. The operations
that rwsettool provides are union, intersection, difference, masking, and sampling. Details are provided in
the OPTIONS section.
rwsettool reads the IPsets specified on the command line; when no IPsets are listed, rwsettool attempts
to read an IPset from the standard input. The strings stdin or - can be used as the name of an input file
to force rwsettool to read from the standard input. The resulting IPset is written to the location specified
by the --output-path switch or to the standard output if that switch is not provided. Using the strings
stdout or - as the argument to --output-path causes rwsettool to write the IPset to the standard output.
rwsettool will exit with an error if an attempt is made to read an IPset from the terminal or write an IPset
to the terminal.
OPTIONS
Option names may be abbreviated if the abbreviation is unique or is an exact match for an option. A
parameter to an option may be specified as --arg=param or --arg param, though the first form is required
for options that take optional parameters.
Operation Switches
Exactly one of the following operation switches must be provided:
--union
Perform the set union operation: The resulting IPset will contain an IP if that IP was present in any
of the input IPsets.
--intersect
Perform the set intersection operation: The resulting IPset will contain an IP if that IP was present
in all of the input IPsets.
--difference
Perform the set difference (relative complement) operation: The resulting IPset will contain an IP if
that IP was present in the first IPset and not present in any of the subsequent IPsets.
--mask=NET_BLOCK_SIZE
Perform a (sparse) masking operation: The least significant 32-NET_BLOCK_SIZE (IPv4) or
128-NET_BLOCK_SIZE (IPv6) bits of each IP in every input IPset are set to zero, and the resulting IPset
contains the union of these IPs. That is, the result contains one IP for each CIDR block of size
NET_BLOCK_SIZE. NET_BLOCK_SIZE should be a value between 1 and 32 for IPv4 sets, and between
1 and 128 for IPv6 sets. Contrast with --fill-blocks.
--fill-blocks=NET_BLOCK_SIZE
Perform a masking operation that produces completely full blocks: The least significant
32-NET_BLOCK_SIZE (IPv4) or 128-NET_BLOCK_SIZE (IPv6) bits of each IP in every input IPset are set to zero.
To create the output, each IP is modified to be a completely full NET_BLOCK_SIZE CIDR block.
--sample
Select a random sample of IPs from the input IPsets. The size of the subset must be specified by either
the --size or --ratio switches described below. In the case of multiple input IPsets, the resulting IPset
is the union of all IP addresses sampled from each of the input IPsets.
Sampling Switches
These switches control how records are sampled by the --sample operation.
--size=SIZE
Select a random sample containing SIZE randomly selected records from each input IPset. If the input
set is smaller than SIZE, all input IPs will be selected from that IPset.
--ratio=RATIO
Select a random sample where the selection probability for each record of each input set is RATIO,
specified as a decimal number between 0.0 and 1.0. The exact size of the subset selected from each file
will vary between different runs with the same data.
--seed=SEED
Seed the pseudo-random number generator with value SEED. By default, the seed will vary between
runs. Seeding with specific values will produce repeatable results given the same input sets.
Output Switches
These switches control the output:
--output-path=OUTPUT_PATH
Write the resulting IPset to OUTPUT_PATH. If this switch is not provided, rwsettool will attempt
to write the IPset to the standard output, unless it is connected to a terminal.
--record-version=VERSION
Specify the format of the IPset records that are written to the output. Valid values are 0, 2, 3, and
4. When the switch is not provided, the SILK_IPSET_RECORD_VERSION environment variable is
checked for a version. A VERSION of 2 creates a file compatible with SiLK 2.x, and it can only be
used for IPsets containing IPv4 addresses. A VERSION of 3 creates a file that can only be read by
SiLK 3.0 or later. A VERSION of 4 creates a file that can only be read by SiLK 3.7 or later. Version 4
files are smaller than version 3 files. The default VERSION is 0, which uses version 2 for IPv4 IPsets
and version 3 for IPv6 IPsets.
--invocation-strip
Do not record any command line history; that is, do not copy the invocation history from the input
files to the output file, and do not record the current command line invocation in the output.
--note-strip
Do not copy the notes (annotations) from the input files to the output file. Normally notes from the
input files are copied to the output.
--note-add=TEXT
Add the specified TEXT to the header of the output file as an annotation. This switch may be repeated
to add multiple annotations to a file. To view the annotations, use the rwfileinfo(1) tool.
--note-file-add=FILENAME
Open FILENAME and add the contents of that file to the header of the output file as an annotation.
This switch may be repeated to add multiple annotations. Currently the application makes no effort
to ensure that FILENAME contains text; be careful that you do not attempt to add a SiLK data file
as an annotation.
--compression-method=COMP_METHOD
Specify how to compress the output. When this switch is not given, output to the standard output or to
named pipes is not compressed, and output to files is compressed using the default chosen when SiLK
was compiled. The valid values for COMP_METHOD are determined by which external libraries were
found when SiLK was compiled. To see the available compression methods and the default method,
use the --help or --version switch. SiLK can support the following COMP_METHOD values when
the required libraries are available.
none
Do not compress the output using an external library.
zlib
Use the zlib(3) library for compressing the output, and always compress the output regardless
of the destination. Using zlib produces the smallest output files at the cost of speed.
lzo1x
Use the lzo1x algorithm from the LZO real time compression library for compression, and always
compress the output regardless of the destination. This compression provides good compression
with less memory and CPU overhead.
best
Use lzo1x if available, otherwise use zlib. Only compress the output when writing to a file.
--help
Print the available options and exit.
--version
Print the version number and information about how SiLK was configured, then exit the application.
EXAMPLES
In the following examples, the dollar sign ($) represents the shell prompt. The text after the dollar sign
represents the command line.
Assume the following IPsets:
A.set = { 1, 2, 4, 6 }
B.set = { 1, 3, 5, 7 }
C.set = { 1, 3, 6, 8 }
D.set = { } (empty set)
Then the following commands will produce the following result IPsets:
+---------------------------------+----------------------------+
| OPTIONS                         | RESULT                     |
+---------------------------------+----------------------------+
| --union A.set B.set             | { 1, 2, 3, 4, 5, 6, 7 }    |
| --union A.set C.set             | { 1, 2, 3, 4, 6, 8 }       |
| --union A.set B.set C.set       | { 1, 2, 3, 4, 5, 6, 7, 8 } |
| --union C.set D.set             | { 1, 3, 6, 8 }             |
| --intersect A.set B.set         | { 1 }                      |
| --intersect A.set C.set         | { 1, 6 }                   |
| --intersect A.set B.set C.set   | { 1 }                      |
| --intersect A.set D.set         | { }                        |
| --difference A.set B.set        | { 2, 4, 6 }                |
| --difference B.set A.set        | { 3, 5, 7 }                |
| --difference A.set B.set C.set  | { 2, 4 }                   |
| --difference C.set B.set A.set  | { 8 }                      |
| --difference C.set D.set        | { 1, 3, 6, 8 }             |
| --difference D.set C.set        | { }                        |
+---------------------------------+----------------------------+
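The rows of this table follow ordinary set algebra, which the Python sketch below reproduces with built-in sets. The small integers stand in for IP addresses exactly as in the table; the helper names are illustrative and are not SiLK functions.

```python
# Stand-ins for A.set, B.set, C.set, and D.set from the table
A, B, C, D = {1, 2, 4, 6}, {1, 3, 5, 7}, {1, 3, 6, 8}, set()

def union(*sets):
    """An element is kept if it appears in any input set."""
    return set().union(*sets)

def intersect(first, *rest):
    """An element is kept only if it appears in every input set."""
    return first.intersection(*rest)

def difference(first, *rest):
    """Keep elements of the first set that appear in none of the others."""
    return first.difference(*rest)

print(union(A, B, C))        # all eight values
print(intersect(A, C))       # {1, 6}
print(difference(A, B, C))   # {2, 4}
print(difference(C, B, A))   # {8}
```

Note that --difference is the only one of the three operations where the order of the input files changes the result.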
Sampling yields variable results, but here are some example runs:
+----------------------------------+----------------------------+
| COMMAND                          | RESULT                     |
+----------------------------------+----------------------------+
| --sample --size 2 A.set          | { 1, 4 }                   |
| --sample --size 2 A.set          | { 1, 6 }                   |
| --sample --size 3 A.set          | { 2, 4, 6 }                |
| --sample --size 2 A.set B.set    | { 1, 2, 5, 7 }             |
| --sample --size 2 A.set B.set    | { 3, 4, 5, 6 }             |
| --sample --size 2 A.set B.set    | { 1, 4, 5 }                |
| --sample --ratio 0.5 A.set       | { 2, 6 }                   |
| --sample --ratio 0.5 A.set       | { 4 }                      |
| --sample --ratio 0.5 A.set B.set | { 1 }                      |
| --sample --ratio 0.5 A.set B.set | { 2, 3, 5, 6, 7 }          |
+----------------------------------+----------------------------+
These examples demonstrate some important points about sampling from IPsets:
• When using --size, an exact number of items is selected from each input set.
• When using --size with multiple input sets, the number of records in the output set may be smaller than
(number of input sets * SIZE), because the samples drawn from different input sets may overlap.
• When using --ratio, the number of items sampled is not stable between runs.
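The three points above can be illustrated with a small Python model of the two sampling modes. This is a sketch of the documented behavior only; it does not claim to reproduce rwsettool's random number generator, so the specific elements chosen for a given seed will differ from SiLK's. The function names are hypothetical.

```python
import random

A, B = {1, 2, 4, 6}, {1, 3, 5, 7}   # stand-ins for A.set and B.set

def sample_size(ipsets, size, seed=None):
    """Pick exactly `size` members from each set (or all, if smaller)."""
    rng = random.Random(seed)
    out = set()
    for s in ipsets:
        members = sorted(s)
        out |= set(rng.sample(members, min(size, len(members))))
    return out

def sample_ratio(ipsets, ratio, seed=None):
    """Keep each member of each set independently with probability `ratio`."""
    rng = random.Random(seed)
    out = set()
    for s in ipsets:
        out |= {ip for ip in sorted(s) if rng.random() < ratio}
    return out

print(len(sample_size([A], 2)))       # always 2
print(len(sample_size([A, B], 2)))    # 2, 3, or 4, since samples may overlap
print(sample_ratio([A, B], 0.5))      # size varies from run to run
```

Passing the same seed to either function gives repeatable results for the same inputs, which is the role the --seed switch plays.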
Given an IPset containing the three IPs
10.1.1.1
10.1.1.2
10.1.3.1
specifying --mask=24 will produce an IPset containing two IPs:
10.1.1.0
10.1.3.0
while specifying --fill-blocks=24 will produce an IPset containing 512 IPs:
10.1.1.0/24
10.1.3.0/24
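The difference between the two switches can be reproduced with Python's standard ipaddress module. The sketch below is a model of the documented behavior for IPv4 input, with illustrative function names that are not part of SiLK.

```python
import ipaddress

def mask(ips, block_size):
    """Zero the low bits of each IP; keep one address per block."""
    return {
        ipaddress.ip_network(f"{ip}/{block_size}", strict=False).network_address
        for ip in ips
    }

def fill_blocks(ips, block_size):
    """Replace each IP with every address in its enclosing block."""
    out = set()
    for ip in ips:
        # Iterating a network yields all of its addresses
        out.update(ipaddress.ip_network(f"{ip}/{block_size}", strict=False))
    return out

ips = ["10.1.1.1", "10.1.1.2", "10.1.3.1"]
print(sorted(mask(ips, 24)))          # 10.1.1.0 and 10.1.3.0
print(len(fill_blocks(ips, 24)))      # 512
```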
Suppose the IPset file mixed.set contains IPv4 and IPv6 addresses. To create an IPset file that contains only
the IPv4 addresses, intersect mixed.set with an IPset that contains ::ffff:0:0/96.
$ echo '::ffff:0:0/96' | rwsetbuild - all-v4.set
$ rwsettool --intersect mixed.set all-v4.set > subset-v4.set
To create an IPset file that contains only the IPv6 addresses, subtract an IPset that contains ::ffff:0:0/96 from
mixed.set:
$ rwsettool --difference mixed.set all-v4.set > subset-v6.set
ENVIRONMENT
SILK_IPSET_RECORD_VERSION
This environment variable is used as the value for the --record-version switch when that switch is not
provided.
SILK_CLOBBER
The SiLK tools normally refuse to overwrite existing files. Setting SILK_CLOBBER to a non-empty
value removes this restriction.
SEE ALSO
rwset(1), rwsetbuild(1), rwsetcat(1), rwfileinfo(1), silk(7), zlib(3)
NOTES
The --record-version switch was added in SiLK 3.0. Prior to SiLK 3.6, the only supported arguments for
the switch were 2 and 3, with the default being 3. As of SiLK 3.6, the default is 0. Version 4 was added in
SiLK 3.7.
rwsilk2ipfix
Convert SiLK Flow records to IPFIX records
SYNOPSIS
rwsilk2ipfix [--ipfix-output=FILE] [--print-statistics]
[--site-config-file=FILENAME]
{[--xargs] | [--xargs=FILENAME] | [FILE [FILE ...]]}
rwsilk2ipfix --help
rwsilk2ipfix --version
DESCRIPTION
rwsilk2ipfix reads SiLK Flow records, converts the records to an IPFIX (Internet Protocol Flow Information
eXport) format, and writes the IPFIX records to the path specified by --ipfix-output or to the standard
output when stdout is not the terminal and --ipfix-output is not provided.
rwsilk2ipfix reads SiLK Flow records from the files named on the command line or from the standard input
when no file names are specified and --xargs is not present. To read the standard input in addition to the
named files, use - or stdin as a file name. If an input file name ends in .gz, the file will be uncompressed
as it is read. When the --xargs switch is provided, rwsilk2ipfix will read the names of the files to process
from the named text file, or from the standard input if no file name argument is provided to the switch. The
input to --xargs must contain one file name per line.
The IPFIX records generated by rwsilk2ipfix will contain six information elements that are in the Private
Enterprise space for CERT (the IPFIX Private Enterprise Number of CERT is 6871). These six information
elements fall into two groups:
• Elements 30 and 31 contain the packing information that was determined by rwflowpack(8), specifically the flowtype and the sensor. These values correspond to numbers specified in the silk.conf(5)
file.
• Elements 14, 15, 32, and 33 contain information elements generated by the yaf(1) flow meter
(http://tools.netsa.cert.org/yaf/). The information elements will be present even if yaf was not used to
generate the flow records, but their value will be empty or 0.
For each of the six information elements that rwsilk2ipfix will produce, the following table lists its numeric
ID, its length in octets, its name, the field name it corresponds to on rwcut(1), and a brief description.
ID  LEN  NAME             RWCUT FIELD   DESCRIPTION
==  ===  ===============  ============  =====================================================
30  1    silkFlowType     class & type  How rwflowpack categorized the flow record
31  2    silkFlowSensor   sensor        Sensor where the flow was collected
14  1    initialTCPFlags  initialFlags  TCP flags on first packet in the flow record
15  1    unionTCPFlags    sessionFlags  TCP flags on all packets in the flow except the first
32  1    silkTCPState     attributes    Flow continuation attributes set by generator
33  2    silkAppLabel     application   Guess by flow generator as to the content of traffic
The IPFIX template that rwsilk2ipfix writes contains the following information elements:
OCTETS   INFORMATION ELEMENT (PEN, ID)   SILK FIELD
=======  =============================   ================
  0-  7  flowStartMilliseconds (152)     sTime
  8- 15  flowEndMilliseconds (153)       sTime + duration
 16- 31  sourceIPv6Address (27)          sIP
 32- 47  destinationIPv6Address (30)     dIP
 48- 51  sourceIPv4Address (8)           sIP
 52- 55  destinationIPv4Address (12)     dIP
 56- 57  sourceTransportPort (7)         sPort
 58- 59  destinationTransportPort (11)   dPort
 60- 63  ipNextHopIPv4Address (15)       nhIP
 64- 79  ipNextHopIPv6Address (62)       nhIP
 80- 83  ingressInterface (10)           in
 84- 87  egressInterface (14)            out
 88- 95  packetDeltaCount (2)            packets
 96-103  octetDeltaCount (1)             bytes
104      protocolIdentifier (4)          protocol
105      silkFlowType (6871, 30)         class & type
106-107  silkFlowSensor (6871, 31)       sensor
108      tcpControlBits (6)              flags
109      initialTCPFlags (6871, 14)      initialFlags
110      unionTCPFlags (6871, 15)        sessionFlags
111      silkTCPState (6871, 32)         attributes
112-113  silkAppLabel (6871, 33)         application
114-119  paddingOctets (210)             -
Note that the template contains both IPv4 and IPv6 addresses. One set of those addresses contains the IP
addresses and the other set contains only zeros.
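The octet ranges in the template can be cross-checked with Python's struct module. The sketch below is an illustration only: it assumes network byte order (which IPFIX uses) and the field widths implied by the OCTETS column, it describes a single data record and ignores the IPFIX message and set headers, and the FIELDS list and offsets() helper are hypothetical names rather than anything provided by SiLK.

```python
import struct

# (information element, struct code) in template order; widths follow
# the OCTETS column of the table above
FIELDS = [
    ("flowStartMilliseconds", "Q"), ("flowEndMilliseconds", "Q"),
    ("sourceIPv6Address", "16s"), ("destinationIPv6Address", "16s"),
    ("sourceIPv4Address", "I"), ("destinationIPv4Address", "I"),
    ("sourceTransportPort", "H"), ("destinationTransportPort", "H"),
    ("ipNextHopIPv4Address", "I"), ("ipNextHopIPv6Address", "16s"),
    ("ingressInterface", "I"), ("egressInterface", "I"),
    ("packetDeltaCount", "Q"), ("octetDeltaCount", "Q"),
    ("protocolIdentifier", "B"), ("silkFlowType", "B"),
    ("silkFlowSensor", "H"), ("tcpControlBits", "B"),
    ("initialTCPFlags", "B"), ("unionTCPFlags", "B"),
    ("silkTCPState", "B"), ("silkAppLabel", "H"),
    ("paddingOctets", "6x"),
]

# "!" selects network byte order with no alignment padding
RECORD = struct.Struct("!" + "".join(code for _, code in FIELDS))

def offsets():
    """Map each field name to its starting octet within the record."""
    pos, out = 0, {}
    for name, code in FIELDS:
        out[name] = pos
        pos += struct.calcsize("!" + code)
    return out

print(RECORD.size)                  # total record length in octets
print(offsets()["silkFlowSensor"])  # starting octet of the sensor field
```

The computed record length is 120 octets, which agrees with the last row of the table ending at octet 119.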
OPTIONS
Option names may be abbreviated if the abbreviation is unique or is an exact match for an option. A
parameter to an option may be specified as --arg=param or --arg param, though the first form is required
for options that take optional parameters.
--ipfix-output=FILE
Write the IPFIX records to FILE, which must not exist. If the switch is not provided or if FILE has
the value stdout, the IPFIX flows are written to the standard output.
--print-statistics
Print, to the standard error, the number of records that were written to the IPFIX output file.
--site-config-file=FILENAME
Read the SiLK site configuration from the named file FILENAME. When this switch is not provided,
rwsilk2ipfix searches for the site configuration file in the locations specified in the FILES section.
--xargs
--xargs=FILENAME
Causes rwsilk2ipfix to read file names from FILENAME or from the standard input if FILENAME
is not provided. The input should have one file name per line. rwsilk2ipfix will open each file in turn
and read records from it, as if the files had been listed on the command line.
--help
Print the available options and exit.
--version
Print the version number and information about how SiLK was configured, then exit the application.
EXAMPLES
In the following examples, the dollar sign ($) represents the shell prompt. The text after the dollar sign
represents the command line.
To convert the SiLK file silk.rw into an IPFIX format and store the results in ipfix.dat:
$ rwsilk2ipfix --ipfix-output=ipfix.dat silk.rw
To view the contents of ipfix.dat using the yafscii(1) tool (see http://tools.netsa.cert.org/yaf/):
$ yafscii --in=ipfix.dat --out=-
Use the rwipfix2silk(1) tool to convert the IPFIX file back into SiLK Flow format:
$ rwipfix2silk --silk-output=silk2.rw ipfix.dat
ENVIRONMENT
SILK_CLOBBER
The SiLK tools normally refuse to overwrite existing files. Setting SILK_CLOBBER to a non-empty
value removes this restriction.
SILK_CONFIG_FILE
This environment variable is used as the value for the --site-config-file switch when that switch is not
provided.
SILK_DATA_ROOTDIR
This environment variable specifies the root directory of the data repository. As described in the FILES
section, rwsilk2ipfix may use this environment variable when searching for the SiLK site configuration
file.
SILK_PATH
This environment variable gives the root of the install tree. When searching for configuration files,
rwsilk2ipfix may use this environment variable. See the FILES section for details.
FILES
${SILK_CONFIG_FILE}
${SILK_DATA_ROOTDIR}/silk.conf
/data/silk.conf
${SILK_PATH}/share/silk/silk.conf
${SILK_PATH}/share/silk.conf
/usr/local/share/silk/silk.conf
/usr/local/share/silk.conf
Possible locations for the SiLK site configuration file which are checked when the --site-config-file
switch is not provided.
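As a rough illustration of the list above, the following Python sketch builds the candidate paths in the order they are listed, skipping entries whose environment variable is unset. Whether the tools try every entry, or stop at the first variable that is set, is not stated here, so treat the exact search behavior as an assumption; silk_conf_candidates() is a hypothetical helper, not part of SiLK.

```python
import posixpath

def silk_conf_candidates(env):
    """List candidate silk.conf paths in the order given in FILES."""
    paths = []
    if env.get("SILK_CONFIG_FILE"):
        paths.append(env["SILK_CONFIG_FILE"])
    if env.get("SILK_DATA_ROOTDIR"):
        paths.append(posixpath.join(env["SILK_DATA_ROOTDIR"], "silk.conf"))
    paths.append("/data/silk.conf")
    if env.get("SILK_PATH"):
        root = env["SILK_PATH"]
        paths.append(posixpath.join(root, "share/silk/silk.conf"))
        paths.append(posixpath.join(root, "share/silk.conf"))
    paths += ["/usr/local/share/silk/silk.conf", "/usr/local/share/silk.conf"]
    return paths

# Hypothetical environment used only for illustration
example_env = {"SILK_DATA_ROOTDIR": "/srv/silk", "SILK_PATH": "/opt/silk"}
for path in silk_conf_candidates(example_env):
    print(path)
```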
SEE ALSO
rwipfix2silk(1), rwcut(1), rwflowpack(8), silk.conf(5), sensor.conf(5), silk(7), yaf(1), yafscii(1)
rwsiteinfo
Print information from the silk.conf site configuration file
SYNOPSIS
rwsiteinfo --fields=FIELD[,FIELD...]
{ [--classes=CLASS[,CLASS...]] [--types=TYPE[,TYPE...]]
| [--flowtypes=CLASS/TYPE[,CLASS/TYPE...]] }
[--sensors=SENSOR[,SENSOR...]]
[--data-rootdir=ROOT_DIRECTORY] [--site-config-file=FILENAME]
[--no-titles] [--no-columns] [--column-separator=CHAR]
[--no-final-delimiter] [{--delimited | --delimited=CHAR}]
[--list-delimiter=CHAR] [--pager=PAGER_PROG]
rwsiteinfo --help
rwsiteinfo --version
DESCRIPTION
rwsiteinfo is a utility to print selected information about the classes, types, flowtypes, and sensors that
are defined in the silk.conf(5) site configuration file. The --fields switch is required, and its argument is a
comma-separated list of field names selecting the fields to be printed. The output from rwsiteinfo consists
of multiple columns and rows, where each column contains one of the FIELDs and where each row has
a unique value for one of the FIELDs. rwsiteinfo prints rows until all possible combinations of fields are
exhausted. By default, the information is printed in a columnar, bar (|) delimited format.
The --classes, --types, --flowtypes, and --sensors switches allow the user to limit the amount of information printed. (These switches operate similarly to their namesakes on rwfilter(1) and rwfglob(1).) If
none of these switches are given, rwsiteinfo prints information for all values defined in the silk.conf file.
If one or more of these switches is specified, rwsiteinfo limits its output to the specified values. To print
information about the default class or the default types within a class, use the at-sign (@) as the name of
the class or type, respectively. The --flowtypes switch must be used independently of the --classes and
--types switches.
As stated above, rwsiteinfo prints unique rows given a list of FIELDs. As an example, suppose the user
entered the command rwsiteinfo --fields=class,type,sensor. rwsiteinfo will print a row containing
the first class defined in the silk.conf file, the first type defined for that class, and the first sensor name
defined for that class/type pair. On the next row, the class and type will be the same and the second sensor
name will be printed. Once all sensors have been printed, rwsiteinfo repeats the process for the second type
defined for the first class, and so on. Once all information for the first class has been printed, the process
would repeat for the next class, until all classes have been printed.
The order of the FIELDs determines how rwsiteinfo iterates through the possible values. The last FIELD
will change most rapidly, and the first field will change most slowly. Two invocations of rwsiteinfo where
the first specifies --fields=class,sensor and the second specifies --fields=sensor,class produce the
same number of rows, and each invocation has an outer and inner iterator. In the first invocation, the outer
iterator is over the classes, and the inner iterator is over each sensor defined in that class. In the second
invocation, the outer iterator is over the sensors, and the inner is over the classes to which that sensor
belongs.
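The outer and inner iteration described above can be pictured with a small Python model. The site configuration below is hypothetical (two classes and two sensors, one of which belongs to both classes), and the functions only imitate the row ordering; they do not read a real silk.conf file.

```python
# Hypothetical site: class name -> sensors that belong to that class
SITE = {"all": ["S0", "S1"], "lab": ["S1"]}

def rows_class_sensor(site):
    """--fields=class,sensor: outer loop over classes, inner over sensors."""
    return [(c, s) for c in site for s in site[c]]

def rows_sensor_class(site):
    """--fields=sensor,class: outer loop over sensors, inner over classes."""
    sensors = []
    for members in site.values():
        for s in members:
            if s not in sensors:
                sensors.append(s)
    return [(s, c) for s in sensors for c in site if s in site[c]]

for row in rows_class_sensor(SITE):
    print("|".join(row))
for row in rows_sensor_class(SITE):
    print("|".join(row))
```

Both orderings yield three rows for this site; only the grouping changes, with the first field varying most slowly.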
In general, the output will contain some combination of class, type, flowtype, and sensor. For flowtype and
sensor, the numeric ID may be printed instead of the name. For class and type, the default values may be
printed or they may be identified by a symbol. Most field names support a FIELD:list variant that puts
all possible values for that field into a single column. See the description of the --fields switch below for
details.
OPTIONS
Option names may be abbreviated if the abbreviation is unique or is an exact match for an option. A
parameter to an option may be specified as --arg=param or --arg param, though the first form is required
for options that take optional parameters.
--fields=FIELD[,FIELD...]
Specify the fields to print as a comma-separated list of names. The names are case-insensitive. The
fields will be displayed in the order the names are specified. The --fields switch is required, and
rwsiteinfo will fail when it is not provided.
The list of possible field names is:
class
the class name, e.g., all
type
the type name, e.g., inweb
flowtype
the flowtype name, e.g., iw. The flowtype name is a combination of the class name and type
name, and it is used to name files in the SiLK data repository.
id-flowtype
the integer identifier for the flowtype, e.g., 2
sensor
the sensor name, e.g., S3
id-sensor
the integer identifier for the sensor, e.g., 3
describe-sensor
the sensor description, when present
default-class
the default class name
default-type
the default type name
mark-defaults
a two-character wide column that contains a plus '+' on a row that contains the default class and
an asterisk '*' on a row that contains a default type
class:list
instead of printing class names on separate rows, join all the classes in a single row separated
using the list-delimiter
type:list
instead of printing type names on separate rows, join all the types in a single row separated using
the list-delimiter
flowtype:list
instead of printing flowtype names on separate rows, join all the flowtypes in a single row separated
using the list-delimiter
id-flowtype:list
instead of printing flowtype identifiers on separate rows, join all the flowtype identifiers in a single
row separated using the list-delimiter
sensor:list
instead of printing sensor names on separate rows, join all the sensors in a single row separated
using the list-delimiter
id-sensor:list
instead of printing sensor identifiers on separate rows, join all the sensor identifiers in a single row
separated using the list-delimiter
default-class:list
equivalent to default-class, but provided for consistency
default-type:list
instead of printing the default type names on separate rows, join all the default type names in a
single row separated using the list-delimiter
--classes=CLASS[,CLASS...]
Restrict the output using the class(es) named in the comma-separated list. The default class may be
specified by using an at-sign (@) as the name of a class.
--types=TYPE[,TYPE...]
Restrict the output using the type(s) named in the comma-separated list. The default types for a class
may be specified by using an at-sign (@) as the name of a type.
--flowtypes=CLASS/TYPE[,CLASS/TYPE...]
Restrict the output using the class/type pairs named in the comma-separated list, where the class
name and type name are separated by a slash (/). The keyword all may be used for the CLASS
and/or TYPE to select all classes and/or types.
--sensors=SENSOR[,SENSOR...]
Restrict the output to the sensor(s) named in the comma-separated list of sensor names, sensor IDs
(integers), and/or ranges of sensor IDs.
--data-rootdir=ROOT_DIRECTORY
Use ROOT_DIRECTORY as the root of the data repository, which overrides the location given in the
SILK_DATA_ROOTDIR environment variable, which in turn overrides the location that was compiled
into rwsiteinfo (/data). This directory is one of the locations where rwsiteinfo attempts to find the
silk.conf file.
--site-config-file=FILENAME
Read the SiLK site configuration from the named file FILENAME. When this switch is not provided,
rwsiteinfo searches for the site configuration file in the locations specified in the FILES section.
--no-titles
Turn off column titles. By default, titles are printed.
--no-columns
Disable fixed-width columnar output.
--column-separator=C
Use specified character between columns and after the final column. When this switch is not specified,
the default of | is used.
--no-final-delimiter
Do not print the column separator after the final column. Normally a delimiter is printed.
--delimited
--delimited=C
Run as if --no-columns --no-final-delimiter --column-sep=C had been specified. That is, disable
fixed-width columnar output; if character C is provided, it is used as the delimiter between columns
instead of the default |.
--list-delimiter=C
Specify the character to use between items that comprise a FIELD:list column. The default list
delimiter is a comma (,).
--pager=PAGER_PROG
When output is to a terminal, invoke the program PAGER_PROG to view the output one screen full
at a time. This switch overrides the SILK_PAGER environment variable, which in turn overrides the
PAGER variable. If the value of the pager is determined to be the empty string, no paging will be
performed and all output will be printed to the terminal.
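The precedence among the --pager switch and the two environment variables can be summarized in a few lines of Python. This sketch models only the selection rule as described in this manual page; choose_pager() is a hypothetical name, and the check for whether output is going to a terminal is left out.

```python
def choose_pager(switch=None, env=None):
    """Return the pager program to run, or None when paging is disabled."""
    env = env or {}
    if switch is not None:          # --pager overrides both variables
        value = switch
    elif "SILK_PAGER" in env:       # SILK_PAGER overrides PAGER, even if empty
        value = env["SILK_PAGER"]
    else:
        value = env.get("PAGER", "")
    return value or None            # empty string means no paging

print(choose_pager("less", {"SILK_PAGER": "more"}))
print(choose_pager(None, {"SILK_PAGER": "", "PAGER": "less"}))
print(choose_pager(None, {"PAGER": "less"}))
```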
--help
Print the available options and exit. Options that add fields can be specified before --help so that the
new options appear in the output.
--version
Print the version number and information about how SiLK was configured, then exit the application.
EXAMPLES
In the following examples, the dollar sign ($) represents the shell prompt. The text after the dollar sign
represents the command line. Lines have been wrapped for improved readability, and the back slash (\) is
used to indicate a wrapped line.
The following prints all known sensor names, one name per line:
$ rwsiteinfo --fields=sensor --no-titles --delimited
The following prints all known sensor names on a single line (the names will be separated by comma):
$ rwsiteinfo --fields=sensor:list --no-titles --delimited
This changes the separator of the sensor names to a space:
$ rwsiteinfo --fields=sensor:list --no-titles --delimited \
--list-delimiter=' '
The following prints the sensor names for the default class on a single line:
$ rwsiteinfo --fields=sensor:list --classes=@ --no-titles --delimited
The following prints four columns: (1) the sensor identifier, (2) the sensor name, (3) the list of classes for
that sensor, and (4) a description of the sensor (that is, it mimics the output of mapsid(1)):
$ rwsiteinfo --fields=id-sensor,sensor,class:list,describe-sensor
The following prints two columns, the first containing a class name and the second the list of default types
for that class:
$ rwsiteinfo --fields=class,default-type:list
The following prints the default class:
$ rwsiteinfo --fields=default-class --no-titles --delimited
As does this:
$ rwsiteinfo --fields=class --classes=@ --no-titles --delimited
The following prints the default types for the default class, with each type on a separate line:
$ rwsiteinfo --fields=default-type --classes=@ --no-titles --delimited
ENVIRONMENT
SILK_PAGER
When set to a non-empty string, rwsiteinfo automatically invokes this program to display its output
a screen at a time. If set to an empty string, rwsiteinfo does not automatically page its output.
PAGER
When set and SILK_PAGER is not set, rwsiteinfo automatically invokes this program to display its
output a screen at a time.
SILK_CONFIG_FILE
This environment variable is used as the value for the --site-config-file switch when that switch is not
provided.
SILK_DATA_ROOTDIR
This environment variable specifies the root directory of the data repository. As described in the FILES
section, rwsiteinfo may use this environment variable when searching for the SiLK site configuration
file.
SILK_PATH
This environment variable gives the root of the install tree. When searching for configuration files and
plug-ins, rwsiteinfo may use this environment variable. See the FILES section for details.
FILES
${SILK_CONFIG_FILE}
${SILK_DATA_ROOTDIR}/silk.conf
/data/silk.conf
${SILK_PATH}/share/silk/silk.conf
${SILK_PATH}/share/silk.conf
/usr/local/share/silk/silk.conf
/usr/local/share/silk.conf
Possible locations for the SiLK site configuration file which are checked when the --site-config-file
switch is not provided. The location of the SILK DATA ROOTDIR may be specified using the --rootdirectory switch.
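The search described above can be sketched in Python. This is only an illustration, not SiLK's implementation: it assumes the locations are tried in the order they are listed and that an unset or empty environment variable is skipped. The function names are hypothetical.

```python
import os
import posixpath

def candidate_site_configs(env):
    """Return candidate silk.conf paths, in the order listed in FILES.

    Assumption: entries that depend on an unset variable are skipped.
    """
    candidates = []
    if env.get("SILK_CONFIG_FILE"):
        candidates.append(env["SILK_CONFIG_FILE"])
    if env.get("SILK_DATA_ROOTDIR"):
        candidates.append(posixpath.join(env["SILK_DATA_ROOTDIR"], "silk.conf"))
    candidates.append("/data/silk.conf")
    if env.get("SILK_PATH"):
        root = env["SILK_PATH"]
        candidates.append(posixpath.join(root, "share/silk/silk.conf"))
        candidates.append(posixpath.join(root, "share/silk.conf"))
    candidates.append("/usr/local/share/silk/silk.conf")
    candidates.append("/usr/local/share/silk.conf")
    return candidates

def find_site_config(env, exists=os.path.isfile):
    """Return the first candidate for which exists() is true, or None."""
    for path in candidate_site_configs(env):
        if exists(path):
            return path
    return None
```

Calling find_site_config(os.environ) would report which file this sketch selects on the local system.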
NOTES
rwsiteinfo was added in SiLK 3.0.
rwsiteinfo duplicates the functionality found in mapsid(1). mapsid is deprecated, and it will be removed
in the SiLK 4.0 release. Examples of using rwsiteinfo in place of mapsid are provided in the latter’s
manual page.
SEE ALSO
silk.conf(5), mapsid(1), rwfilter(1), rwfglob(1), silk(7)
rwsort
Sort SiLK Flow records on one or more fields
SYNOPSIS
rwsort --fields=KEY [--presorted-input] [--reverse]
[--temp-directory=DIR_PATH] [--sort-buffer-size=SIZE]
[--note-add=TEXT] [--note-file-add=FILE]
[--compression-method=COMP_METHOD] [--print-filenames]
[--output-path=PATH] [--site-config-file=FILENAME]
[--plugin=PLUGIN [--plugin=PLUGIN ...]]
[--python-file=PATH [--python-file=PATH ...]]
[--pmap-file=MAPNAME:PATH [--pmap-file=MAPNAME:PATH ...]]
{[--input-pipe=PATH] | [--xargs]|[--xargs=FILE] | [FILES...]}
rwsort [--pmap-file=MAPNAME:PATH [--pmap-file=MAPNAME:PATH ...]]
[--plugin=PLUGIN ...] [--python-file=PATH ...] --help
rwsort [--pmap-file=MAPNAME:PATH [--pmap-file=MAPNAME:PATH ...]]
[--plugin=PLUGIN ...] [--python-file=PATH ...] --help-fields
rwsort --version
DESCRIPTION
rwsort reads SiLK Flow records, sorts the records by the field(s) listed in the --fields switch, and writes
the records to the --output-path or to the standard output if it is not connected to a terminal. The output
from rwsort is binary SiLK Flow records; the output must be passed into another tool for human-readable
output.
Sorting records is an expensive operation, and it should only be used when necessary. The tools that bin
flow records (rwcount(1), rwuniq(1), rwstats(1), etc) do not require sorted data.
rwsort reads SiLK Flow records from the files named on the command line or from the standard input when
no file names are specified and neither --xargs nor --input-pipe is present. To read the standard input in
addition to the named files, use - or stdin as a file name. If an input file name ends in .gz, the file will be
uncompressed as it is read. When the --xargs switch is provided, rwsort will read the names of the files
to process from the named text file, or from the standard input if no file name argument is provided to the
switch. The input to --xargs must contain one file name per line. The --input-pipe switch is deprecated
and it is provided for legacy reasons; its use is not required since rwsort will automatically read from the
standard input. The --input-pipe switch will be removed in the SiLK 4.0 release.
The amount of fast memory used by rwsort will increase until it reaches a maximum near 2GB. (Use the
--sort-buffer-size switch to change this upper limit on the buffer size.) If more records are read than will
fit into memory, the in-core records are sorted and temporarily stored on disk as described by the
--temp-directory switch. When all records have been read, the on-disk files are merged and the sorted records
written to the output.
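The buffering behavior described above is a form of external merge sort. The following Python sketch illustrates the general technique only; it is not rwsort's code. It assumes records are ordinary Python objects, and the max_in_memory parameter (a record count standing in for the byte-sized sort buffer) is hypothetical.

```python
import heapq
import pickle
import tempfile

def external_sort(records, key, max_in_memory, temp_dir=None):
    """Yield records in sorted order, spilling sorted runs to temporary files."""
    runs = []      # open temporary files, each holding one sorted run
    buffer = []    # the in-core records

    def spill():
        # Sort the in-core records and store them temporarily on disk.
        buffer.sort(key=key)
        run = tempfile.TemporaryFile(dir=temp_dir)
        for rec in buffer:
            pickle.dump(rec, run)
        run.seek(0)
        runs.append(run)
        buffer.clear()

    for rec in records:
        buffer.append(rec)
        if len(buffer) >= max_in_memory:
            spill()

    if not runs:
        # Everything fit into memory; no temporary files are needed.
        buffer.sort(key=key)
        yield from buffer
        return
    if buffer:
        spill()

    def read_run(run):
        try:
            while True:
                yield pickle.load(run)
        except EOFError:
            return
        finally:
            run.close()

    # Merge the on-disk runs into a single sorted stream.
    yield from heapq.merge(*(read_run(r) for r in runs), key=key)
```

A larger max_in_memory produces fewer runs, which mirrors the trade-off described for the --sort-buffer-size switch.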
By default, the temporary files are stored in the /tmp directory. Because these temporary files will be
large, it is strongly recommended that /tmp not be used as the temporary directory. To modify the
temporary directory used by rwsort, provide the --temp-directory switch, set the SILK_TMPDIR environment
variable, or set the TMPDIR environment variable.
To merge previously sorted SiLK data files into a sorted stream, run rwsort with the --presorted-input
switch. rwsort will merge-sort all the input files, reducing its memory requirements considerably. It is the
user’s responsibility to ensure that all the input files have been sorted with the same --fields value (and
--reverse if applicable). rwsort may still require use of a temporary directory while merging the files (for
example, if rwsort does not have enough available file handles to open all the input files at once).
OPTIONS
Option names may be abbreviated if the abbreviation is unique or is an exact match for an option. A
parameter to an option may be specified as --arg=param or --arg param, though the first form is required
for options that take optional parameters.
The --fields switch is required. rwsort will fail when it is not provided.
--fields=KEY
KEY contains the list of flow attributes (a.k.a. fields or columns) that make up the key by which
flows are sorted. The fields are listed in order from primary sort key, secondary key, etc. Each field
may be specified once only. KEY is a comma separated list of field-names, field-integers, and ranges
of field-integers; a range is specified by separating the start and end of the range with a hyphen (-).
Field-names are case insensitive. Example:
--fields=stime,10,1-5
There is no default value for the --fields switch; the switch must be specified.
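To make the KEY syntax concrete, here is a small Python sketch that expands a KEY string into a list of field-integers. It is an illustration under stated assumptions, not the parser SiLK uses: the FIELD_IDS table is a subset of the built-in fields listed below, and the function name parse_key is hypothetical.

```python
# Subset of the built-in field names and their field-integers.
FIELD_IDS = {
    "sip": 1, "dip": 2, "sport": 3, "dport": 4, "protocol": 5,
    "packets": 6, "pkts": 6, "bytes": 7, "flags": 8,
    "stime": 9, "duration": 10, "etime": 11, "sensor": 12,
}

def parse_key(key):
    """Expand a comma separated KEY into an ordered list of field-integers."""
    ids = []
    for token in key.split(","):
        token = token.strip()
        if token.isdigit():
            expanded = [int(token)]
        elif "-" in token and all(p.isdigit() for p in token.split("-", 1)):
            # A range of field-integers such as 1-5.
            start, end = (int(p) for p in token.split("-", 1))
            if start > end:
                raise ValueError("bad range: " + token)
            expanded = list(range(start, end + 1))
        else:
            name = token.lower()  # field-names are case insensitive
            if name not in FIELD_IDS:
                raise ValueError("unknown field: " + token)
            expanded = [FIELD_IDS[name]]
        for field_id in expanded:
            if field_id in ids:
                # Each field may be specified once only.
                raise ValueError("field given more than once: " + token)
            ids.append(field_id)
    return ids
```

With this sketch, the example --fields=stime,10,1-5 expands to the fields 9, 10, 1, 2, 3, 4, 5, so sTime is the primary key and duration the secondary key.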
The complete list of built-in fields that the SiLK tool suite supports follows, though note that not all
fields are present in all SiLK file formats; when a field is not present, its value is 0.
sIP,1
source IP address
dIP,2
destination IP address
sPort,3
source port for TCP and UDP, or equivalent
dPort,4
destination port for TCP and UDP, or equivalent. See note at iType.
protocol,5
IP protocol
packets,pkts,6
packet count
bytes,7
byte count
flags,8
bit-wise OR of TCP flags over all packets
sTime,9,sTime+msec,22
starting time of flow (milliseconds resolution)
duration,10,dur+msec,24
duration of flow (milliseconds resolution)
eTime,11,eTime+msec,23
end time of flow (milliseconds resolution)
sensor,12
name or ID of sensor where flow was collected
class,20,type,21
integer value of the class/type pair assigned to the flow by rwflowpack(8)
iType
the ICMP type value for ICMP or ICMPv6 flows and zero for non-ICMP flows. Internally, SiLK
stores the ICMP type and code in the dPort field, so there is no need to have both dPort and iType
or iCode in the sort key. This field was introduced in SiLK 3.8.1.
iCode
the ICMP code value for ICMP or ICMPv6 flows and zero for non-ICMP flows. See note at iType.
icmpTypeCode,25
equivalent to iType,iCode. This field may not be mixed with iType or iCode, and this field is
deprecated as of SiLK 3.8.1. Prior to SiLK 3.8.1, specifying the icmpTypeCode field was equivalent
to specifying the dPort field.
Many SiLK file formats do not store the following fields and their values will always be 0; they are
listed here for completeness:
in,13
router SNMP input interface or vlanId if packing tools were configured to capture it (see sensor.conf(5))
out,14
router SNMP output interface or postVlanId
nhIP,15
router next hop IP
SiLK can store flows generated by enhanced collection software that provides more information than
NetFlow v5. These flows may support some or all of these additional fields; for flows without this
additional information, the field’s value is always 0.
initialFlags,26
TCP flags on first packet in the flow
sessionFlags,27
bit-wise OR of TCP flags over all packets except the first in the flow
attributes,28
flow attributes set by the flow generator:
S
all the packets in this flow record are exactly the same size
F
flow generator saw additional packets in this flow following a packet with a FIN flag (excluding
ACK packets)
T
flow generator prematurely created a record for a long-running connection due to a timeout.
(When the flow generator yaf(1) is run with the --silk switch, it will prematurely create a
flow and mark it with T if the byte count of the flow cannot be stored in a 32-bit value.)
C
flow generator created this flow as a continuation of long-running connection, where the
previous flow for this connection met a timeout (or a byte threshold in the case of yaf).
Consider a long-running ssh session that exceeds the flow generator’s active timeout. (This is the
active timeout since the flow generator creates a flow for a connection that still has activity). The
flow generator will create multiple flow records for this ssh session, each spanning some portion of
the total session. The first flow record will be marked with a T indicating that it hit the timeout.
The second through next-to-last records will be marked with TC indicating that this flow both
timed out and is a continuation of a flow that timed out. The final flow will be marked with a C,
indicating that it was created as a continuation of an active flow.
application,29
guess as to the content of the flow. Some software that generates flow records from packet data,
such as yaf, will inspect the contents of the packets that make up a flow and use traffic signatures
to label the content of the flow. SiLK calls this label the application; yaf refers to it as the
appLabel. The application is the port number that is traditionally used for that type of traffic
(see the /etc/services file on most UNIX systems). For example, traffic that the flow generator
recognizes as FTP will have a value of 21, even if that traffic is being routed through the standard
HTTP/web port (80).
The following fields provide a way to label the IPs or ports on a record. These fields require external
files to provide the mapping from the IP or port to the label:
sType,16
categorize the source IP address as non-routable, internal, or external and sort based on the
category. Uses the mapping file specified by the SILK_ADDRESS_TYPES environment variable,
or the address_types.pmap mapping file, as described in addrtype(3).
dType,17
as sType for the destination IP address
scc,18
the country code of the source IP address. Uses the mapping file specified by the
SILK_COUNTRY_CODES environment variable, or the country_codes.pmap mapping file, as described in ccfilter(3).
dcc,19
as scc for the destination IP
src-MAPNAME
value determined by passing the source IP or the protocol/source-port to the user-defined mapping
defined in the prefix map associated with MAPNAME. See the description of the --pmap-file
switch below and the pmapfilter(3) manual page.
dst-MAPNAME
as src-MAPNAME for the destination IP or protocol/destination-port.
sval
dval
These are deprecated field names created by pmapfilter that correspond to src-MAPNAME
and dst-MAPNAME, respectively. These fields are available when a prefix map is used that is
not associated with a MAPNAME.
Finally, the list of built-in fields may be augmented by the run-time loading of PySiLK code or plug-ins
written in C (also called shared object files or dynamic libraries), as described by the --python-file
and --plugin switches.
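As an illustration of the T and C markings described for the attributes field earlier in this entry, the following Python sketch lists the markings a flow generator would be expected to assign when one long-running session is split into several flow records. The function name is hypothetical, and treating a single unsplit record as carrying neither marking is an assumption.

```python
def continuation_attributes(n_records):
    """Return the timeout/continuation marking for each record of one session."""
    if n_records < 1:
        raise ValueError("a session has at least one flow record")
    if n_records == 1:
        # Assumption: a session that never hit the timeout carries no marking.
        return [""]
    # First record timed out (T); middle records both continue a prior
    # record and time out (TC); the last record is a continuation only (C).
    return ["T"] + ["TC"] * (n_records - 2) + ["C"]
```

For an ssh session that is split into four records, this yields T, TC, TC, C.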
--presorted-input
Instruct rwsort to merge-sort the input files; that is, rwsort assumes the input files have been
previously sorted using the same values for the --fields and --reverse switches as was given for
this invocation. This switch can greatly reduce rwsort’s memory requirements as a large buffer is not
required for sorting the records. If the input files were created with rwsort, you can run rwfileinfo(1)
on the files to see the rwsort invocation that created them.
--reverse
Cause rwsort to reverse the sort order, causing larger values to occur in the output before smaller
values. Normally smaller values appear before larger values.
--plugin=PLUGIN
Augment the list of fields by using run-time loading of the plug-in (shared object) whose path is
PLUGIN. The switch may be repeated to load multiple plug-ins. The creation of plug-ins is described
in the silk-plugin(3) manual page. When PLUGIN does not contain a slash (/), rwsort will attempt
to find a file named PLUGIN in the directories listed in the FILES section. If rwsort finds the file, it
uses that path. If PLUGIN contains a slash or if rwsort does not find the file, rwsort relies on your
operating system’s dlopen(3) call to find the file. When the SILK_PLUGIN_DEBUG environment
variable is non-empty, rwsort prints status messages to the standard error as it attempts to find and
open each of its plug-ins.
--temp-directory=DIR_PATH
Specify the name of the directory in which to store data files temporarily when more records have
been read than will fit into RAM. This switch overrides the directory specified in the SILK_TMPDIR
environment variable, which overrides the directory specified in the TMPDIR variable, which overrides
the default, /tmp.
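The precedence just described can be written as a short Python sketch. The function name is hypothetical, and treating an empty value the same as an unset one is an assumption that the text above does not confirm.

```python
def resolve_temp_dir(switch_value=None, env=None):
    """Pick the temporary directory using the documented precedence."""
    env = {} if env is None else env
    if switch_value:                 # --temp-directory wins when given
        return switch_value
    if env.get("SILK_TMPDIR"):       # then SILK_TMPDIR
        return env["SILK_TMPDIR"]
    if env.get("TMPDIR"):            # then TMPDIR
        return env["TMPDIR"]
    return "/tmp"                    # the default
```

For example, with both variables set and no switch, the SILK_TMPDIR value is the one returned.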
--sort-buffer-size=SIZE
Set the maximum size of the buffer used for sorting the records, in bytes. A larger buffer means fewer
temporary files need to be created, reducing the I/O wait times. When this switch is not specified, the
default maximum for this buffer is near 2GB. The SIZE may be given as an ordinary integer, or as a
real number followed by a suffix K, M or G, which represents the numerical value multiplied by 1,024
(kilo), 1,048,576 (mega), and 1,073,741,824 (giga), respectively. For example, 1.5K represents 1,536
bytes, or one and one-half kilobytes. (This value does not represent the absolute maximum amount
of RAM that rwsort will allocate, since additional buffers will be allocated for reading the input and
writing the output.) The sort buffer is not used when the --presorted-input switch is specified.
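The SIZE syntax can be illustrated with a few lines of Python. This sketch follows the description above and accepts only the suffixes K, M, and G; the function name is hypothetical, and the handling of other input (lower-case suffixes, for instance) is not taken from SiLK.

```python
_MULTIPLIERS = {"K": 1024, "M": 1024 ** 2, "G": 1024 ** 3}

def parse_size(text):
    """Convert a SIZE such as '1.5K' or '4096' into a number of bytes."""
    text = text.strip()
    if not text:
        raise ValueError("empty SIZE")
    suffix = text[-1]
    if suffix in _MULTIPLIERS:
        # A real number followed by a suffix.
        return int(float(text[:-1]) * _MULTIPLIERS[suffix])
    # An ordinary integer.
    return int(text)
```

Here parse_size("1.5K") gives 1536, matching the example in the text.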
--note-add=TEXT
Add the specified TEXT to the header of the output file as an annotation. This switch may be repeated
to add multiple annotations to a file. To view the annotations, use the rwfileinfo(1) tool.
--note-file-add=FILENAME
Open FILENAME and add the contents of that file to the header of the output file as an annotation.
This switch may be repeated to add multiple annotations. Currently the application makes no effort
to ensure that FILENAME contains text; be careful that you do not attempt to add a SiLK data file
as an annotation.
--compression-method=COMP_METHOD
Specify how to compress the output. When this switch is not given, output to the standard output or to
named pipes is not compressed, and output to files is compressed using the default chosen when SiLK
was compiled. The valid values for COMP_METHOD are determined by which external libraries were
found when SiLK was compiled. To see the available compression methods and the default method,
use the --help or --version switch. SiLK can support the following COMP_METHOD values when
the required libraries are available.
none
Do not compress the output using an external library.
zlib
Use the zlib(3) library for compressing the output, and always compress the output regardless
of the destination. Using zlib produces the smallest output files at the cost of speed.
lzo1x
Use the lzo1x algorithm from the LZO real time compression library for compression, and always
compress the output regardless of the destination. This compression provides good compression
with less memory and CPU overhead.
best
Use lzo1x if available, otherwise use zlib. Only compress the output when writing to a file.
--print-filenames
Print to the standard error the names of input files as they are opened.
--output-path=PATH
Write the sorted SiLK Flow records to the file at PATH. This switch must not name an existing regular
file. When the standard output is not a terminal and this switch is not provided or its argument is
stdout, the sorted records are written to the standard output.
--site-config-file=FILENAME
Read the SiLK site configuration from the named file FILENAME. When this switch is not provided,
rwsort searches for the site configuration file in the locations specified in the FILES section.
--input-pipe=PATH
Read the SiLK Flow records to be sorted from the named pipe at PATH. If PATH is stdin or -, records
are read from the standard input. Use of this switch is not required, since rwsort will automatically
read data from the standard input when no file names are specified on the command line. This switch
is deprecated and will be removed in the SiLK 4.0 release.
--xargs
--xargs=FILENAME
Causes rwsort to read file names from FILENAME or from the standard input if FILENAME is not
provided. The input should have one file name per line. rwsort will open each file in turn and read
records from it, as if the files had been listed on the command line.
--help
Print the available options and exit. Specifying switches that add new fields or additional switches
before --help will allow the output to include descriptions of those fields or switches.
--help-fields
Print the description and alias(es) of each field and exit. Specifying switches that add new fields before
--help-fields will allow the output to include descriptions of those fields.
--version
Print the version number and information about how SiLK was configured, then exit the application.
--pmap-file=MAPNAME:PATH
--pmap-file=PATH
Instruct rwsort to load the mapping file located at PATH and create the src-MAPNAME and dst-MAPNAME
fields. When MAPNAME is provided explicitly, it will be used to refer to the fields
specific to that prefix map. If MAPNAME is not provided, rwsort will check the prefix map file to see
if a map-name was specified when the file was created. If no map-name is available, rwsort creates the
fields sval and dval. Multiple --pmap-file switches are supported as long as each uses a unique value
for map-name. The --pmap-file switch(es) must precede the --fields switch. For more information,
see pmapfilter(3).
--python-file=PATH
When the SiLK Python plug-in is used, rwsort reads the Python code from the file PATH to define
additional fields that can be used as part of the sort key. This file should call register_field() for
each field it wishes to define. For details and examples, see the silkpython(3) and pysilk(3) manual
pages.
LIMITATIONS
When the temporary files and the final output are stored on the same file volume, rwsort will require
approximately twice as much free disk space as the size of data to be sorted.
When the temporary files and the final output are on different volumes, rwsort will require between 1 and
1.5 times as much free space on the temporary volume as the size of the data to be sorted.
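As a rough planning aid, the two rules of thumb above can be expressed as a Python sketch. The function name is hypothetical, and the figures are the approximations given in this section, not guarantees.

```python
def sort_space_estimate(data_bytes, same_volume):
    """Return a (low, high) estimate of the free space a sort may need.

    same_volume=True:  total space on the shared volume (about 2x the data).
    same_volume=False: space on the temporary volume only (1x to 1.5x).
    """
    if same_volume:
        return (2 * data_bytes, 2 * data_bytes)
    return (data_bytes, int(1.5 * data_bytes))
```

For 10 GB of input, this suggests about 20 GB on a shared volume, or 10 to 15 GB on a separate temporary volume.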
EXAMPLES
In the following examples, the dollar sign ($) represents the shell prompt. The text after the dollar sign
represents the command line.
To sort the records in infile.rw based primarily on destination port and secondarily on source IP and write
the binary output to outfile.rw, run:
$ rwsort --fields=dport,sip --output-path=outfile.rw infile.rw
The silkpython(3) manual page provides examples that use PySiLK to create arbitrary fields to use as part
of the key for rwsort.
ENVIRONMENT
SILK_TMPDIR
When set and --temp-directory is not specified, rwsort writes the temporary files it creates to this
directory. SILK_TMPDIR overrides the value of TMPDIR.
TMPDIR
When set and SILK_TMPDIR is not set, rwsort writes the temporary files it creates to this directory.
PYTHONPATH
This environment variable is used by Python to locate modules. When --python-file is specified,
rwsort loads Python which in turn loads the PySiLK module, which comprises several files
(silk/pysilk_nl.so, silk/__init__.py, etc). If this silk/ directory is located outside Python’s normal search
path (for example, in the SiLK installation tree), it may be necessary to set or modify the
PYTHONPATH environment variable to include the parent directory of silk/ so that Python can find the PySiLK
module.
SILK_PYTHON_TRACEBACK
When set, Python plug-ins will output traceback information on Python errors to the standard error.
SILK_COUNTRY_CODES
This environment variable allows the user to specify the country code mapping file that rwsort uses
when computing the scc and dcc fields. The value may be a complete path or a file relative to the
SILK_PATH. See the FILES section for standard locations of this file.
SILK_ADDRESS_TYPES
This environment variable allows the user to specify the address type mapping file that rwsort uses
when computing the sType and dType fields. The value may be a complete path or a file relative to
the SILK_PATH. See the FILES section for standard locations of this file.
SILK_CLOBBER
The SiLK tools normally refuse to overwrite existing files. Setting SILK_CLOBBER to a non-empty
value removes this restriction.
SILK_CONFIG_FILE
This environment variable is used as the value for the --site-config-file when that switch is not
provided.
SILK_DATA_ROOTDIR
This environment variable specifies the root directory of data repository. As described in the FILES
section, rwsort may use this environment variable when searching for the SiLK site configuration file.
SILK_PATH
This environment variable gives the root of the install tree. When searching for configuration files and
plug-ins, rwsort may use this environment variable. See the FILES section for details.
SILK_PLUGIN_DEBUG
When set to 1, rwsort prints status messages to the standard error as it attempts to find and open each
of its plug-ins. In addition, when an attempt to register a field fails, the application prints a message
specifying the additional function(s) that must be defined to register the field in the application. Be
aware that the output can be rather verbose.
SILK_TEMPFILE_DEBUG
When set to 1, rwsort prints debugging messages to the standard error as it creates, re-opens, and
removes temporary files.
FILES
${SILK_ADDRESS_TYPES}
${SILK_PATH}/share/silk/address_types.pmap
${SILK_PATH}/share/address_types.pmap
/usr/local/share/silk/address_types.pmap
/usr/local/share/address_types.pmap
Possible locations for the address types mapping file required by the sType and dType fields.
${SILK_CONFIG_FILE}
${SILK_DATA_ROOTDIR}/silk.conf
/data/silk.conf
${SILK_PATH}/share/silk/silk.conf
${SILK_PATH}/share/silk.conf
/usr/local/share/silk/silk.conf
/usr/local/share/silk.conf
Possible locations for the SiLK site configuration file which are checked when the --site-config-file
switch is not provided.
${SILK_COUNTRY_CODES}
${SILK_PATH}/share/silk/country_codes.pmap
${SILK_PATH}/share/country_codes.pmap
/usr/local/share/silk/country_codes.pmap
/usr/local/share/country_codes.pmap
Possible locations for the country code mapping file required by the scc and dcc fields.
${SILK_PATH}/lib64/silk/
${SILK_PATH}/lib64/
${SILK_PATH}/lib/silk/
${SILK_PATH}/lib/
/usr/local/lib64/silk/
/usr/local/lib64/
/usr/local/lib/silk/
/usr/local/lib/
Directories that rwsort checks when attempting to load a plug-in.
${SILK_TMPDIR}/
${TMPDIR}/
/tmp/
Directory in which to create temporary files.
SEE ALSO
rwcut(1), rwfileinfo(1), rwstats(1), rwuniq(1), addrtype(3), ccfilter(3), pmapfilter(3), pysilk(3),
silkpython(3), silk-plugin(3), sensor.conf(5), rwflowpack(8), silk(7), yaf(1), dlopen(3), zlib(3)
NOTES
If an output path is not specified, rwsort will write to the standard output unless it is connected to a
terminal, in which case an error is printed and rwsort exits.
If an input pipe or a set of input files are not specified, rwsort will read records from the standard input
unless it is connected to a terminal, in which case an error is printed and rwsort exits.
Note that rwsort produces binary output. Use rwcut(1) to view the records.
Do not spend the resources to sort the data if you are going to be passing it to an aggregation tool like
rwtotal or rwaddrcount, which have their own internal data structures that will ignore the sorted data.
Both rwuniq(1) and rwstats(1) can take advantage of previously sorted data, but you must explicitly
inform them that the input is sorted by providing the --presorted-input switch.
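As a sketch of the last note, the following Python helper only builds (and does not run) a shell pipeline in which rwsort feeds rwuniq, with --presorted-input given to rwuniq and the same --fields value used for both tools. The helper name and the file name in the usage note are hypothetical.

```python
import shlex

def presorted_pipeline(fields, input_file):
    """Return a shell pipeline string: rwsort feeding rwuniq on sorted input."""
    sort_cmd = ["rwsort", "--fields=" + fields, input_file]
    # rwuniq must be told explicitly that its input is already sorted.
    uniq_cmd = ["rwuniq", "--fields=" + fields, "--presorted-input"]
    return shlex.join(sort_cmd) + " | " + shlex.join(uniq_cmd)
```

Calling presorted_pipeline("sip,dip", "infile.rw") produces a command line that could be pasted after the shell prompt.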
rwsplit
Divide a SiLK file into a (sampled) collection of subfiles
SYNOPSIS
rwsplit --basename=BASENAME
{ --ip-limit=LIMIT | --flow-limit=LIMIT
| --packet-limit=LIMIT | --byte-limit=LIMIT }
[--seed=NUMBER] [--sample-ratio=SAMPLE_RATIO]
[--file-ratio=FILE_RATIO] [--max-outputs=MAX_OUTPUTS]
[--note-add=TEXT] [--note-file-add=FILE]
[--compression-method=COMP_METHOD]
[--print-filenames] [--site-config-file=FILENAME]
[--xargs[=FILE] | FILE [FILES...]]
rwsplit --help
rwsplit --version
DESCRIPTION
rwsplit reads SiLK Flow records from the standard input or from files named on the command line and
writes the flows into a set of subfiles based on the splitting criterion. In its simplest form, rwsplit partitions
the file, meaning that each input flow will appear in one (and only one) of the subfiles.
In addition to splitting the file, rwsplit can generate files containing sample flows. Sampling is specified by
using the --sample-ratio and --file-ratio switches.
rwsplit reads SiLK Flow records from the files named on the command line or from the standard input
when no file names are specified and --xargs is not present. To read the standard input in addition to the
named files, use - or stdin as a file name. If an input file name ends in .gz, the file will be uncompressed as
it is read. When the --xargs switch is provided, rwsplit will read the names of the files to process from the
named text file, or from the standard input if no file name argument is provided to the switch. The input
to --xargs must contain one file name per line.
If you wish to use the size of the output files as the splitting criterion, use the --flow-limit switch. The
parameter to this switch should be the size of the desired output files divided by the record size. The record
size can be determined by rwfileinfo(1). When the output files are compressed (see the description of
--compression-method below), you should assume about a 50% compression ratio.
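The arithmetic above can be sketched in Python. The function name is hypothetical, the record size used in the usage note is a made-up value (use the one reported by rwfileinfo), and the sketch ignores the space taken by the file header.

```python
def flow_limit_for_file_size(target_bytes, record_bytes,
                             compressed=False, compression_ratio=0.5):
    """Estimate the --flow-limit value that yields files near target_bytes."""
    if record_bytes <= 0:
        raise ValueError("record size must be positive")
    # When the output is compressed, each record is assumed to occupy
    # about compression_ratio times its uncompressed size on disk.
    effective = record_bytes * compression_ratio if compressed else record_bytes
    return max(1, int(target_bytes // effective))
```

With a hypothetical 32-byte record and a 1 MiB target, this gives 32768 flows per file uncompressed, or 65536 when a 50% compression ratio is assumed.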
OPTIONS
Option names may be abbreviated if the abbreviation is unique or is an exact match for an option. A
parameter to an option may be specified as --arg=param or --arg param, though the first form is required
for options that take optional parameters.
The splitting criterion is defined using one of the limit specifiers; one and only one must be specified. They
are:
--ip-limit=LIMIT
Close the current subfile and begin a new subfile when the count of unique source and destination IPs
in the current subfile meets or exceeds LIMIT. The next-hop-IP does not count toward LIMIT.
--flow-limit=LIMIT
Close the current subfile and begin a new subfile when the number of SiLK Flow records in the current
subfile meets LIMIT.
--packet-limit=LIMIT
Close the current subfile and begin a new subfile when the sum of the packet counts across all SiLK
Flow records in the current subfile meets or exceeds LIMIT.
--byte-limit=LIMIT
Close the current subfile and begin a new subfile when the sum of the byte counts across all SiLK
Flow records in the current subfile meets or exceeds LIMIT. This switch does not specify the size of
the subfiles.
The other switches are:
--basename=BASENAME
Specifies the basename of the output files; this switch is required. The flows are written sequentially
to a set of subfiles whose names follow the format BASENAME.ORDER.rwf, where ORDER is an
8-digit zero-formatted sequence number (i.e., 00000000, 00000001, and so on). The sequence number
will begin at zero and increase by one for every file written, unless --file-ratio is specified,
--seed=NUMBER
Use NUMBER to seed the pseudo-random number generator for the --sample-ratio or --file-ratio
switch. This can be used to put the random number generator into a known state, which is useful for
testing.
--sample-ratio=SAMPLE_RATIO
Writes one flow record, chosen at random, from every SAMPLE_RATIO flows that are read.
--file-ratio=FILE_RATIO
Picks one subfile, chosen at random, out of every FILE_RATIO names generated, for writing to
disk.
--max-outputs=NUMBER
Limits the number of files that are written to disk to NUMBER.
--note-add=TEXT
Add the specified TEXT to the header of the output file as an annotation. This switch may be repeated
to add multiple annotations to a file. To view the annotations, use the rwfileinfo(1) tool.
--note-file-add=FILENAME
Open FILENAME and add the contents of that file to the header of the output file as an annotation.
This switch may be repeated to add multiple annotations. Currently the application makes no effort
to ensure that FILENAME contains text; be careful that you do not attempt to add a SiLK data file
as an annotation.
--compression-method=COMP_METHOD
Specify how to compress the output. When this switch is not given, the output files are compressed
using the default chosen when SiLK was compiled. The valid values for COMP_METHOD are determined
by which external libraries were found when SiLK was compiled. To see the available compression
methods and the default method, use the --help or --version switch. SiLK can support the following
COMP_METHOD values when the required libraries are available.
none
Do not compress the output using an external library.
zlib
Use the zlib(3) library for compressing the output. Using zlib produces the smallest output files
at the cost of speed.
lzo1x
Use the lzo1x algorithm from the LZO real time compression library for compression. This
compression provides good compression with less memory and CPU overhead.
best
Use lzo1x if available, otherwise use zlib.
--print-filenames
Print to the standard error the names of input files as they are opened.
--site-config-file=FILENAME
Read the SiLK site configuration from the named file FILENAME. When this switch is not provided,
rwsplit searches for the site configuration file in the locations specified in the FILES section.
--xargs
--xargs=FILENAME
Causes rwsplit to read file names from FILENAME or from the standard input if FILENAME is not
provided. The input should have one file name per line. rwsplit will open each file in turn and read
records from it, as if the files had been listed on the command line.
--help
Print the available options and exit.
--version
Print the version number and information about how SiLK was configured, then exit the application.
EXAMPLES
In the following examples, the dollar sign ($) represents the shell prompt. The text after the dollar sign
represents the command line. Lines have been wrapped for improved readability, and the back slash (\) is
used to indicate a wrapped line.
Assume a source file source.rwf; to split that file into files that each contain about 100 unique IP addresses:
$ rwsplit --basename=result --ip-limit=100 source.rwf
To split source.rwf into files that each contain 100 flows:
$ rwsplit --basename=result --flow-limit=100 source.rwf
The following causes rwsplit to sample 1 out of every 10 records from source.rwf; i.e., rwsplit will read
1000 flow records to produce each subfile:
$ rwsplit --basename=result --flow-limit=100 --sample-ratio=10 source.rwf
When --file-ratio is specified, the file names are generated as usual (e.g., base-00000000, base-00000001,
...); however, one of these names will be chosen randomly from each set of --file-ratio candidates, and only
that file will be written to disk.
$ rwsplit --basename=result --flow-limit=100 --file-ratio=5 source.rwf
$ ls
result-00000002.rwf
result-00000008.rwf
result-00000013.rwf
result-00000016.rwf
LIMITATIONS
rwsplit can take exactly 1 partitioning switch per invocation.
Partitioning is not exact: rwsplit keeps appending flow records to a file until it meets or exceeds the specified
LIMIT. For example, if you specify --ip-limit=100, then rwsplit will fill up the file until it has 100 IP
addresses in it; if the file has 99 addresses and a new record with 2 previously unseen addresses is received,
rwsplit will put this in the current file, resulting in a 101-address file. Similarly, if you specify
--byte-limit=2000, and rwsplit receives a 10kb flow record, that flow record will be placed in the current subfile.
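The overshoot can be reproduced with a small Python simulation. This is a sketch of the behavior described above, not rwsplit's code: records are modeled as (source IP, destination IP) pairs and the function name is hypothetical.

```python
def split_by_ip_limit(records, limit):
    """Group (sip, dip) records into subfiles by count of unique addresses.

    Returns a list of (records_in_subfile, unique_address_count) pairs.
    """
    subfiles = []
    current, seen = [], set()
    for sip, dip in records:
        # The record is always appended before the limit is checked,
        # so a subfile may end up holding more than `limit` addresses.
        current.append((sip, dip))
        seen.update((sip, dip))
        if len(seen) >= limit:
            subfiles.append((current, len(seen)))
            current, seen = [], set()
    if current:
        subfiles.append((current, len(seen)))
    return subfiles
```

Feeding this function records that bring a subfile to 99 addresses, followed by one record with two new addresses, closes that subfile at 101 addresses.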
The switches --sample-ratio, --file-ratio, and --max-outputs are processed in that order. So, when you
specify
$ rwsplit --sample-ratio=10 --ip-limit=100 \
        --file-ratio=10 --max-outputs=20
rwsplit will pick 1 out of every 10 flow records, write those records to a file until it has 100 IPs per file, pick 1 out
of every 10 files to write, and write up to 20 files. If there are 1000 records, each with 2 unique IPs in them,
then rwsplit will write at most 1 file (the sampled records hold 200 unique IP addresses, enough for two
candidate files, but rwsplit may not pick either of those files from the set to write).
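The arithmetic behind this example can be checked with a short sketch. It assumes, for illustration only, that every address is unique, that exactly 1 of every --sample-ratio records is kept, and that at most one file is written per set of --file-ratio candidate names; the function names are hypothetical.

```python
def candidate_files(n_records, ips_per_record, sample_ratio, ip_limit):
    """Number of complete candidate subfiles after sampling (idealized)."""
    sampled = n_records // sample_ratio
    unique_ips = sampled * ips_per_record
    return unique_ips // ip_limit


def max_files_written(candidates, file_ratio, max_outputs):
    """Upper bound on files written: one per (possibly partial) set of
    file_ratio candidates, capped by max_outputs."""
    sets = -(-candidates // file_ratio)  # ceiling division
    return min(sets, max_outputs)


candidates = candidate_files(1000, 2, sample_ratio=10, ip_limit=100)
print(candidates, max_files_written(candidates, file_ratio=10, max_outputs=20))
```

The result is an upper bound only: the name chosen at random from the set of 10 may correspond to a candidate that is never filled, in which case nothing is written.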
ENVIRONMENT
SILK_CLOBBER
The SiLK tools normally refuse to overwrite existing files. Setting SILK_CLOBBER to a non-empty
value removes this restriction.
SILK_CONFIG_FILE
This environment variable is used as the value for the --site-config-file when that switch is not
provided.
SILK_DATA_ROOTDIR
This environment variable specifies the root directory of the data repository. As described in the FILES
section, rwsplit may use this environment variable when searching for the SiLK site configuration file.
SILK_PATH
This environment variable gives the root of the install tree. When searching for configuration files,
rwsplit may use this environment variable. See the FILES section for details.
FILES
${SILK_CONFIG_FILE}
${SILK_DATA_ROOTDIR}/silk.conf
/data/silk.conf
${SILK_PATH}/share/silk/silk.conf
${SILK_PATH}/share/silk.conf
/usr/local/share/silk/silk.conf
/usr/local/share/silk.conf
Possible locations for the SiLK site configuration file which are checked when the --site-config-file
switch is not provided.
SEE ALSO
rwfileinfo(1), silk(7), zlib(3)
rwstats
Print top-N or bottom-N lists or summarize data by protocol
SYNOPSIS
rwstats --fields=KEY [--values=VALUES]
{--count=N | --threshold=N | --percentage=N}
[{--top | --bottom}] [--presorted-input] [--no-percents]
[--ipv6-policy={ignore,asv4,mix,force,only}]
[{--bin-time | --bin-time=SECONDS}]
[--timestamp-format=FORMAT] [--epoch-time]
[--ip-format=FORMAT] [--integer-ips] [--zero-pad-ips]
[--integer-sensors] [--integer-tcp-flags]
[--no-titles] [--no-columns] [--column-separator=CHAR]
[--no-final-delimiter] [{--delimited | --delimited=CHAR}]
[--print-filenames] [--copy-input=PATH] [--output-path=PATH]
[--pager=PAGER_PROG] [--temp-directory=DIR_PATH]
[{--legacy-timestamps | --legacy-timestamps={1,0}}]
[--site-config-file=FILENAME]
[--plugin=PLUGIN [--plugin=PLUGIN ...]]
[--python-file=PATH [--python-file=PATH ...]]
[--pmap-file=MAPNAME:PATH [--pmap-file=MAPNAME:PATH ...]]
[--pmap-column-width=NUM]
{[--xargs] | [--xargs=FILENAME] | [FILE [FILE ...]]}
rwstats {--overall-stats | --detail-proto-stats=PROTO[,PROTO]}
[--no-titles] [--no-columns] [--column-separator=CHAR]
[--no-final-delimiter] [{--delimited | --delimited=CHAR}]
[--print-filenames] [--copy-input=PATH] [--output-path=PATH]
[--pager=PAGER_PROG]
{[--xargs] | [--xargs=FILENAME] | [FILE [FILE ...]]}
rwstats [--pmap-file=MAPNAME:PATH [--pmap-file=MAPNAME:PATH ...]]
[--plugin=PLUGIN ...] [--python-file=PATH ...] --help
rwstats [--pmap-file=MAPNAME:PATH [--pmap-file=MAPNAME:PATH ...]]
[--plugin=PLUGIN ...] [--python-file=PATH ...] --help-fields
rwstats --legacy-help
rwstats --version
DESCRIPTION
rwstats has two modes of operation: it can compute a Top-N or Bottom-N list, or it can summarize data
for a list of protocols.
In either mode, rwstats reads SiLK Flow records from the files named on the command line or from the
standard input when no file names are specified and --xargs is not present. To read the standard input in
addition to the named files, use - or stdin as a file name. If an input file name ends in .gz, the file will be
uncompressed as it is read. When the --xargs switch is provided, rwstats will read the names of the files
to process from the named text file, or from the standard input if no file name argument is provided to the
switch. The input to --xargs must contain one file name per line.
TOP-N DESCRIPTION
rwstats reads SiLK Flow records and groups them by a key composed of user-specified attributes of the
flows. For each group (or bin), a collection of aggregate values is computed; these values are typically related
to the volume of the bin, such as the sum of the bytes fields for all records that match the key. Once all the
SiLK Flow records are read, the bins are sorted by the primary aggregate value, and rwstats prints the bins
that had the largest values (giving a top-N list) or the smallest values (giving a bottom-N list). The number
of bins printed can be specified as a fixed value (e.g., print 10 bins), as a threshold (print bins whose byte
count is less than 400), or as a percentage of the total volume across all bins (print bins that contain
at least 10% of all the packets).
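The binning step can be pictured with a minimal sketch. This is an illustration of the idea, not rwstats's implementation; flow records are modeled as hypothetical Python dictionaries, and the aggregate is the default record count.

```python
from collections import Counter


def top_n(records, key_fields, n, bottom=False):
    """Bin records by the key fields, count records per bin, and return
    the n bins with the largest (or smallest) counts."""
    bins = Counter()
    for rec in records:
        key = tuple(rec[field] for field in key_fields)
        bins[key] += 1  # default aggregate: number of flow records
    ordered = sorted(bins.items(), key=lambda kv: kv[1], reverse=not bottom)
    return ordered[:n]


# Made-up flow records for illustration.
flows = [
    {"sIP": "10.1.1.1", "dPort": 80},
    {"sIP": "10.1.1.1", "dPort": 443},
    {"sIP": "10.1.1.2", "dPort": 80},
    {"sIP": "10.1.1.1", "dPort": 80},
    {"sIP": "10.1.1.3", "dPort": 53},
    {"sIP": "10.1.1.2", "dPort": 53},
]
print(top_n(flows, ["sIP"], 2))
print(top_n(flows, ["sIP"], 1, bottom=True))
```

Passing more than one name in key_fields forms a compound key, in the same way that several fields in --fields define one bin per unique combination.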
The user must provide the --fields switch to select the flow attribute(s) (or field(s)) that comprise the key
for each bin. The available fields are similar to those supported by rwcut(1); see the description of the
--fields switch in the OPTIONS section below for the details. The list of fields can be extended by loading
PySiLK files (see silkpython(3)) or plug-ins (silk-plugin(3)). The fields will be printed in the order in
which they occur in the --fields switch. The size of the key is limited to 256 octets. A larger key will more
quickly use the available memory, leading to slower performance.
The aggregate value(s) to compute for each bin are also chosen by the user. As with the key fields, the
user can extend the list of aggregate fields by using PySiLK or plug-ins. The preferred way to specify the
aggregate fields is to use the --values switch; the aggregate fields will be printed in the order they occur
in the --values switch. If the user does not select any aggregate value(s), rwstats defaults to computing
the number of flow records for each bin. As with the key fields, requesting more aggregate values slows
performance.
The --presorted-input switch may allow rwstats to process data more efficiently by causing rwstats to
assume the input has been previously sorted with the rwsort(1) command. With this switch, rwstats
does not need large amounts of memory during the binning stage because it does not bin each flow; instead,
it keeps a running summation for the bin. When the key changes, the bin’s primary aggregate value is
compared with those of the current Top-N (or Bottom-N) to see if the new bin is closer to the top (or
bottom). For the output to be meaningful, rwsort and rwstats must be invoked with the same --fields
value. When multiple input files are specified and --presorted-input is given, rwstats will merge-sort the
flow records from the input files. rwstats will usually run faster if you do not include the --presorted-input
switch when counting distinct IP addresses, even when reading sorted input. Finally, you may get unusual
results with --presorted-input when the --fields switch contains multiple time-related key fields (sTime,
duration, eTime), or when the time-related key is not the final key listed in --fields; see the NOTES section
for details.
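A rough sketch of why sorted input needs little memory follows. It assumes records arrive as hypothetical (key, bytes) pairs already sorted by key, and it uses a small heap to stand in for the current Top-N; rwstats's internal data structures may differ.

```python
import heapq
from itertools import groupby
from operator import itemgetter


def top_n_presorted(sorted_records, n):
    """Compute a Top-N by bytes from (key, bytes) pairs sorted by key.
    Only one running sum and at most n finished bins are held at a time."""
    heap = []  # min-heap of (total, key)
    for key, group in groupby(sorted_records, key=itemgetter(0)):
        total = sum(nbytes for _, nbytes in group)  # running sum for this bin
        if len(heap) < n:
            heapq.heappush(heap, (total, key))
        elif total > heap[0][0]:
            # The finished bin beats the weakest member of the current Top-N.
            heapq.heapreplace(heap, (total, key))
    return sorted(heap, reverse=True)


# Made-up records, sorted by key as rwsort would leave them.
records = [("10.1.1.1", 100), ("10.1.1.1", 50), ("10.1.1.2", 500),
           ("10.1.1.3", 20), ("10.1.1.3", 30), ("10.1.1.4", 400)]
print(top_n_presorted(records, 2))
```

If the input were not sorted by the same key, a key could appear in several separate runs and be treated as several bins, which is why the rwsort and rwstats --fields values must match.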
rwstats attempts to keep all key and aggregate value data in the computer’s memory. If rwstats runs
out of memory, the current key and aggregate value data is written to a temporary file. Once all input has
been processed, the data from the temporary files is merged to produce the final output. By default, these
temporary files are stored in the /tmp directory. Because these files can be large, it is strongly recommended
that /tmp not be used as the temporary directory. To modify the temporary directory used by rwstats,
provide the --temp-directory switch, set the SILK_TMPDIR environment variable, or set the TMPDIR
environment variable.
rwstats may run out of memory when computing distinct IP counts, causing the counts for some bins to be
smaller than the actual number of distinct IPs. When this occurs, a single warning is printed to the standard
error noting that rwstats has run out of memory, processing continues, and rwstats exits with status 16.
rwstats may also run out of memory if the requested Top-N is too large.
PROTOCOL STATISTICS DESCRIPTION
Alternatively, rwstats can provide statistics for each of bytes, packets, and bytes-per-packet, giving minima,
maxima, quartiles, and interval flow-counts across all flows or across a list of protocols specified by the user.
OPTIONS
Option names may be abbreviated if the abbreviation is unique or is an exact match for an option. A
parameter to an option may be specified as --arg=param or --arg param, though the first form is required
for options that take optional parameters.
TOP-N INVOCATION
To compute a Top-N or Bottom-N list, the key field(s) must be specified. Normally the --fields switch is
used to specify the key field(s), but for backward compatibility the --fields switch is not required.
--fields=KEY
KEY contains the list of flow attributes (a.k.a. fields or columns) that make up the key into which
flows are binned. The columns will be displayed in the order the fields are specified. Each field may
be specified once only. KEY is a comma separated list of field-names, field-integers, and ranges of
field-integers; a range is specified by separating the start and end of the range with a hyphen (-).
Field-names are case insensitive. Example:
--fields=stime,10,1-5
There is no default value for the --fields switch.
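To make the KEY syntax concrete, the following sketch expands a --fields argument into a list of field integers. It is only an illustration: it covers the built-in fields numbered 1 through 12 in this section, the function name is hypothetical, and rwstats's own parser accepts more (plug-in fields, prefix-map fields, and so on).

```python
# Field names and integers for the built-in fields 1-12 listed in this section.
FIELD_IDS = {
    "sip": 1, "dip": 2, "sport": 3, "dport": 4, "protocol": 5,
    "packets": 6, "pkts": 6, "bytes": 7, "flags": 8, "stime": 9,
    "duration": 10, "etime": 11, "sensor": 12,
}


def parse_fields(spec):
    """Expand a comma separated list of names, integers, and integer
    ranges into field integers, rejecting repeated fields."""
    ids = []
    for item in spec.split(","):
        item = item.strip()
        if "-" in item and item.replace("-", "").isdigit():
            start, end = (int(part) for part in item.split("-"))
            new = list(range(start, end + 1))
        elif item.isdigit():
            new = [int(item)]
        else:
            new = [FIELD_IDS[item.lower()]]  # names are case insensitive
        for field_id in new:
            if field_id in ids:
                raise ValueError("field %d specified more than once" % field_id)
            ids.append(field_id)
    return ids


print(parse_fields("stime,10,1-5"))
```

The order of the result mirrors the order in which the columns would be displayed.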
The complete list of built-in fields that the SiLK tool suite supports follows, though note that not all
fields are present in all SiLK file formats; when a field is not present, its value is 0.
sIP,1
source IP address
dIP,2
destination IP address
sPort,3
source port for TCP and UDP, or equivalent
dPort,4
destination port for TCP and UDP, or equivalent. See note at iType.
protocol,5
IP protocol
packets,pkts,6
packet count
bytes,7
byte count
flags,8
bit-wise OR of TCP flags over all packets
sTime,9
starting time of flow (seconds resolution). When the time-related fields sTime,duration,eTime
are all in use, rwstats will ignore the final time field when binning the records.
duration,10
duration of flow (seconds resolution). See note at sTime,9.
eTime,11
end time of flow (seconds resolution). See note at sTime,9.
sensor,12
name or ID of the sensor where the flow was collected
class,20
class assigned to the flow by rwflowpack(8). Binning by class and/or type equates to binning
by the integer value used internally to represent the class/type pair. When --fields contains
class but not type, rwstats’s output will have multiple rows with the same value(s) for the key
field(s).
type,21
type assigned to the flow by rwflowpack(8). See note on previous entry.
iType
the ICMP type value for ICMP or ICMPv6 flows and empty (numerically zero) for non-ICMP
flows. Internally, SiLK stores the ICMP type and code in the dPort field. To avoid getting very
odd results, either do not use the dPort field when your key includes ICMP field(s) or be certain
to include the protocol field as part of your key. This field was added in SiLK 3.8.1.
iCode
the ICMP code value for ICMP or ICMPv6 flows and empty for non-ICMP flows. See note at
iType.
icmpTypeCode,25
equivalent to iType,iCode when used in --fields. This field may not be mixed with iType or
iCode, and this field is deprecated as of SiLK 3.8.1. As of SiLK 3.8.1, icmpTypeCode may no
longer be used as the argument to the Distinct: value field; the dPort field will provide an
equivalent result as long as the input is limited to ICMP flow records.
Many SiLK file formats do not store the following fields and their values will always be 0; they are
listed here for completeness:
in,13
router SNMP input interface or vlanId if packing tools were configured to capture it (see sensor.conf(5))
out,14
router SNMP output interface or postVlanId
nhIP,15
router next hop IP
SiLK can store flows generated by enhanced collection software that provides more information than
NetFlow v5. These flows may support some or all of these additional fields; for flows without this
additional information, the field’s value is always 0.
initialFlags,26
TCP flags on first packet in the flow
sessionFlags,27
bit-wise OR of TCP flags over all packets except the first in the flow
attributes,28
flow attributes set by the flow generator:
S
all the packets in this flow record are exactly the same size
F
flow generator saw additional packets in this flow following a packet with a FIN flag (excluding
ACK packets)
T
flow generator prematurely created a record for a long-running connection due to a timeout.
(When the flow generator yaf(1) is run with the --silk switch, it will prematurely create a
flow and mark it with T if the byte count of the flow cannot be stored in a 32-bit value.)
C
flow generator created this flow as a continuation of long-running connection, where the
previous flow for this connection met a timeout (or a byte threshold in the case of yaf ).
Consider a long-running ssh session that exceeds the flow generator’s active timeout. (This is the
active timeout since the flow generator creates a flow for a connection that still has activity). The
flow generator will create multiple flow records for this ssh session, each spanning some portion of
the total session. The first flow record will be marked with a T indicating that it hit the timeout.
The second through next-to-last records will be marked with TC indicating that this flow both
timed out and is a continuation of a flow that timed out. The final flow will be marked with a C,
indicating that it was created as a continuation of an active flow.
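The marking pattern for the ssh example can be written down as a tiny sketch. The function name is hypothetical, and the single-record case (a connection that never reaches the active timeout carries neither marker) is an assumption drawn from the definitions of T and C above.

```python
def timeout_markers(n_records):
    """Return the T/C attribute markers for the n_records flow records
    that one long-running connection is split into."""
    if n_records < 1:
        raise ValueError("a connection produces at least one flow record")
    if n_records == 1:
        return [""]  # assumed: no timeout was hit, so neither T nor C is set
    # First record timed out; middle records timed out and are continuations;
    # the final record is a continuation only.
    return ["T"] + ["TC"] * (n_records - 2) + ["C"]


print(timeout_markers(4))
```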
application,29
guess as to the content of the flow. Some software that generates flow records from packet data,
such as yaf, will inspect the contents of the packets that make up a flow and use traffic signatures
to label the content of the flow. SiLK calls this label the application; yaf refers to it as the
appLabel. The application is the port number that is traditionally used for that type of traffic
(see the /etc/services file on most UNIX systems). For example, traffic that the flow generator
recognizes as FTP will have a value of 21, even if that traffic is being routed through the standard
HTTP/web port (80).
The following fields provide a way to label the IPs or ports on a record. These fields require external
files to provide the mapping from the IP or port to the label:
sType,16
for the source IP address, the value 0 if the address is non-routable, 1 if it is internal, or 2
if it is routable and external. Uses the mapping file specified by the SILK_ADDRESS_TYPES
environment variable, or the address_types.pmap mapping file, as described in addrtype(3).
dType,17
as sType for the destination IP address
scc,18
for the source IP address, a two-letter country code abbreviation denoting the country where
that IP address is located. Uses the mapping file specified by the SILK_COUNTRY_CODES
environment variable, or the country_codes.pmap mapping file, as described in ccfilter(3). The
abbreviations are those used by the Root-Zone Whois Index (see for example
http://www.iana.org/cctld/cctld-whois.htm) or the following special codes: -- N/A (e.g. private and experimental
reserved addresses); a1 anonymous proxy; a2 satellite provider; o1 other
dcc,19
as scc for the destination IP
src-MAPNAME
label determined by passing the source IP or the protocol/source-port to the user-defined mapping
defined in the prefix map associated with MAPNAME. See the description of the --pmap-file
switch below and the pmapfilter(3) manual page.
dst-MAPNAME
as src-MAPNAME for the destination IP or protocol/destination-port.
sval
dval
These are deprecated field names created by pmapfilter that correspond to src-MAPNAME
and dst-MAPNAME, respectively. These fields are available when a prefix map is used that is
not associated with a MAPNAME.
Finally, the list of built-in fields may be augmented by the run-time loading of PySiLK code or plug-ins
written in C (also called shared object files or dynamic libraries), as described by the --python-file
and --plugin switches.
--values=VALUES
When computing a Top-N or Bottom-N, all flows that have the same key field(s) will be binned together.
For each bin, one or more aggregate values are computed as specified by VALUES, a comma separated
list of names. Names are case insensitive. The first entry in VALUES is the primary value, and it is
used as the basis to compute the Top-N or Bottom-N. If the --values switch is not specified (and no
legacy switch that sets values is specified), rwstats counts the number of flow records for each bin.
The aggregate fields are printed in the order they occur in VALUES. The names of the built-in value
fields follow. This list can be augmented through the use of PySiLK and plug-ins.
Records
Count the number of flow records that mapped to each bin.
Packets
Sum the number of packets across all records that mapped to each bin.
Bytes
Sum the number of bytes across all records that mapped to each bin.
sIP-Distinct
Count the number of distinct source IP addresses that were seen for each bin.
dIP-Distinct
Count the number of distinct destination IP addresses that were seen for each bin.
Distinct:KEY_FIELD
Count the number of distinct values for KEY_FIELD, where KEY_FIELD is any field that can
be used as an argument to --fields except for icmpTypeCode. For example, Distinct:sPort will
count the number of distinct source ports for each bin. When this aggregate value field is used,
the specified KEY_FIELD cannot be present in the argument to --fields.
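A distinct count differs from the summed values in that each bin must remember which values it has already seen. The sketch below illustrates this with one set per bin; it is a simplified model with hypothetical dictionary records and a single key field, not a description of rwstats's internals.

```python
from collections import defaultdict


def distinct_per_bin(records, key_field, distinct_field):
    """Count the distinct values of distinct_field seen in each bin."""
    if key_field == distinct_field:
        # Mirrors the rule that the Distinct: field may not also be a key field.
        raise ValueError("distinct field cannot also be the key field")
    seen = defaultdict(set)
    for rec in records:
        seen[rec[key_field]].add(rec[distinct_field])
    return {key: len(values) for key, values in seen.items()}


# Made-up flow records for illustration.
flows = [
    {"dIP": "10.1.1.1", "sPort": 1025},
    {"dIP": "10.1.1.1", "sPort": 1026},
    {"dIP": "10.1.1.1", "sPort": 1025},
    {"dIP": "10.1.1.2", "sPort": 53},
]
print(distinct_per_bin(flows, "dIP", "sPort"))
```

Because every bin keeps its own set, distinct counts use more memory than sums, which is consistent with the memory warnings given for distinct IP counts.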
--plugin=PLUGIN
Augment the list of key fields and/or aggregate value fields by using run-time loading of the plug-in
(shared object) whose path is PLUGIN. The switch may be repeated to load multiple plug-ins. The
creation of plug-ins is described in the silk-plugin(3) manual page. When PLUGIN does not contain
a slash (/), rwstats will attempt to find a file named PLUGIN in the directories listed in the FILES
section. If rwstats finds the file, it uses that path. If PLUGIN contains a slash or if rwstats does
not find the file, rwstats relies on your operating system’s dlopen(3) call to find the file. When the
SILK_PLUGIN_DEBUG environment variable is non-empty, rwstats prints status messages to the
standard error as it attempts to find and open each of its plug-ins.
--pmap-file=MAPNAME:PATH
--pmap-file=PATH
Instruct rwstats to load the mapping file located at PATH and create the src-MAPNAME and
dst-MAPNAME fields. When MAPNAME is provided explicitly, it will be used to refer to the fields
specific to that prefix map. If MAPNAME is not provided, rwstats will check the prefix map file
to see if a map-name was specified when the file was created. If no map-name is available, rwstats
creates the fields sval and dval. Multiple --pmap-file switches are supported as long as each uses a
unique value for map-name. The --pmap-file switch(es) must precede the --fields switch. For more
information, see pmapfilter(3).
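The naming rule for the prefix-map fields can be summarized in a few lines. In this sketch the function name and the embedded_map_name parameter (standing in for a map-name stored in the prefix map file) are hypothetical, and a PATH that itself contains a colon is not handled.

```python
def pmap_field_names(switch_arg, embedded_map_name=None):
    """Return the (source, destination) field names created for one
    --pmap-file argument of the form MAPNAME:PATH or PATH."""
    mapname, sep, _path = switch_arg.partition(":")
    if not sep:
        # Given as --pmap-file=PATH: fall back to the name stored in the file.
        mapname = embedded_map_name
    if mapname:
        return ("src-" + mapname, "dst-" + mapname)
    return ("sval", "dval")


print(pmap_field_names("service:ports.pmap"))
print(pmap_field_names("ports.pmap"))
```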
--pmap-column-width=NUM
When printing a label associated with a prefix map, this switch gives the maximum number of characters to use when displaying the textual value of the field.
--python-file=PATH
When the SiLK Python plug-in is used, rwstats reads the Python code from the file PATH to define
additional fields that can be used as part of the key or as an aggregate value. This file should call
register_field() for each field it wishes to define. For details and examples, see the silkpython(3)
and pysilk(3) manual pages.
To determine the value of N for a Top-N (or Bottom-N) list, one of the following switches must be specified.
The primary value may limit which switch may be specified.
--count=N
Print the N bins with the largest (or smallest) values. This limit is always allowed.
--threshold=N
Print the bins where the primary value is greater-than (or less-than) the value N. This limit is not
allowed when the primary value comes from a plug-in. If the threshold causes the Top-N or Bottom-N to become large enough that rwstats runs out of memory, rwstats will compute the Top-N or
Bottom-N using the amount of memory it was able to allocate.
--percentage=N
Print the bins where the primary value is greater-than (or less-than) N percent of the sum of the
primary values across all bins. To use this switch, the primary value must be Bytes, Packets, or
Records, and the --presorted-input switch must not be present. If the percentage causes the Top-N
or Bottom-N to become large enough that rwstats runs out of memory, rwstats will compute the
Top-N or Bottom-N using the amount of memory it was able to allocate.
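The three limits differ only in how the cut-off is chosen. The sketch below shows the threshold and percentage cases over a dictionary of already-computed bins; the data is made up, and the strict comparison follows the "greater-than (or less-than)" wording above, so the treatment of values exactly equal to the limit is an assumption.

```python
def select_by_threshold(bins, threshold, bottom=False):
    """Keep bins whose primary value is greater than (or, for a Bottom-N,
    less than) threshold, ordered from the top (or the bottom)."""
    keep = {key: value for key, value in bins.items()
            if (value < threshold if bottom else value > threshold)}
    return sorted(keep.items(), key=lambda kv: kv[1], reverse=not bottom)


def select_by_percentage(bins, percent, bottom=False):
    """Convert the percentage into a threshold on the sum over all bins."""
    total = sum(bins.values())
    return select_by_threshold(bins, total * percent / 100.0, bottom)


# Made-up record counts keyed by source port (1000 records in total).
bins = {80: 500, 53: 300, 0: 150, 25: 40, 22: 10}
print(select_by_threshold(bins, 200))
print(select_by_percentage(bins, 5))
print(select_by_threshold(bins, 50, bottom=True))
```

The percentage case needs the total over all bins before any bin can be judged, which fits the restriction that --percentage cannot be combined with --presorted-input.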
To determine whether to compute the Top-N or the Bottom-N, specify one of the following switches. If
neither switch is given, --top is assumed:
--top
Print the top N keys and their values. This is the default.
--bottom
Print the bottom N keys and their values.
PROTOCOL STATISTICS INVOCATION
The following switches will compute and print, for each of bytes, packets, and bytes per packet, the minimum
value, the maximum value, quartiles, and a count of the number of flows that fall into each of ten
intervals. These switches cannot be combined with the switches that produce Top-N or Bottom-N
lists.
--overall-stats
Print intervals and quartiles across all flows that were read by rwstats.
--detail-proto-stats=PROTO[,PROTO...]
Print intervals and quartiles for each individual protocol listed as an argument. The argument should
be a comma separated list of protocols or ranges of protocols: 1-6,17. Specifying this option implies
--overall-stats.
MISCELLANEOUS SWITCHES
The following switches are available when rwstats is running in either mode, though many are only applicable
to the Top-N mode.
--presorted-input
Cause rwstats to assume that it is reading sorted input; i.e., that rwstats’s input file(s) were generated
by rwsort(1) using the exact same value for the --fields switch. This option allows rwstats to process
an endless stream of records. When multiple input files are specified, rwstats will merge-sort the flow
records from the input files. When using --presorted-input and computing a Top-N or Bottom-N,
the --percentage limit cannot be used. See the NOTES section for issues that may occur when using
--presorted-input.
--no-percents
For the Top-N invocation, do not print the percent-of-total and cumulative-percentage columns. These
columns will contain a question mark when the primary value is not one of Bytes, Packets, or Records,
and this switch allows you to suppress them.
--ipv6-policy=POLICY
Determine how IPv4 and IPv6 flows are handled when SiLK has been compiled with IPv6 support.
When the switch is not provided, the SILK_IPV6_POLICY environment variable is checked for a policy.
If it is also unset or contains an invalid policy, the POLICY is mix. When SiLK has not been compiled
with IPv6 support, IPv6 flows are always ignored, regardless of the value passed to this switch or in
the SILK_IPV6_POLICY variable. The supported values for POLICY are:
ignore
Ignore any flow record marked as IPv6, regardless of the IP addresses it contains.
asv4
Convert IPv6 flow records that contain addresses in the ::ffff:0:0/96 prefix to IPv4 and ignore all
other IPv6 flow records.
mix
Process the input as a mixture of IPv4 and IPv6 flow records. When an IP address is used as
part of the key or value, this policy is equivalent to force.
force
Convert IPv4 flow records to IPv6, mapping the IPv4 addresses into the ::ffff:0:0/96 prefix.
only
Process only flow records that are marked as IPv6 and ignore IPv4 flow records in the input.
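The effect of each policy on a single address can be sketched with Python's ipaddress module. This is a simplification: one address stands in for a whole flow record, the address family stands in for the record being "marked as IPv6", and the function name is hypothetical.

```python
import ipaddress

V4_MAPPED = ipaddress.ip_network("::ffff:0:0/96")


def apply_policy(addr, policy):
    """Return the address as the policy would present it, or None when
    the record would be ignored (simplified one-address model)."""
    ip = ipaddress.ip_address(addr)
    is_v6 = ip.version == 6
    if policy == "ignore":
        return None if is_v6 else ip
    if policy == "asv4":
        if not is_v6:
            return ip
        if ip in V4_MAPPED:
            # Keep only the low 32 bits of the IPv4-mapped address.
            return ipaddress.IPv4Address(int(ip) & 0xFFFFFFFF)
        return None
    if policy == "force":
        if is_v6:
            return ip
        # Map the IPv4 address into the ::ffff:0:0/96 prefix.
        return ipaddress.IPv6Address((0xFFFF << 32) | int(ip))
    if policy == "only":
        return ip if is_v6 else None
    if policy == "mix":
        return ip
    raise ValueError("unknown policy: %s" % policy)


print(apply_policy("::ffff:10.1.1.1", "asv4"))
print(apply_policy("2001:db8::1", "asv4"))
```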
--bin-time
--bin-time=SECONDS
Adjust the key fields 'sTime' and 'eTime' to appear on SECONDS-second boundaries (the floor of the
time is used). When no value is provided to the switch, 60-second time bins are used.
--timestamp-format=FORMAT
Specify how timestamps will be printed. When this switch is not specified, timestamps are printed in
the default format, and the timezone is UTC unless SiLK was compiled with local timezone support.
FORMAT is a comma-separated list of a format and/or a timezone. The format is one of:
default
Print the timestamps as YYYY/MM/DDThh:mm:ss.
iso
Print the timestamps as YYYY-MM-DD hh:mm:ss.
m/d/y
Print the timestamps as MM/DD/YYYY hh:mm:ss.
epoch
Print the timestamps as the number of seconds since 00:00:00 UTC on 1970-01-01.
When a timezone is specified, it is used regardless of the default timezone support compiled into SiLK.
The timezone is one of:
utc
Use Coordinated Universal Time to print timestamps.
local
Use the TZ environment variable or the local timezone.
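For readers who post-process rwstats output, the four formats correspond to the following strftime patterns. This sketch only mirrors the layouts described above using Python's datetime module with the utc timezone; the function name is hypothetical and fractional seconds are not considered.

```python
from datetime import datetime, timezone

# strftime patterns matching the layouts described for each format name.
FORMATS = {
    "default": "%Y/%m/%dT%H:%M:%S",
    "iso": "%Y-%m-%d %H:%M:%S",
    "m/d/y": "%m/%d/%Y %H:%M:%S",
}


def format_timestamp(epoch_seconds, fmt="default"):
    """Render integer epoch seconds in one of the named formats (UTC)."""
    if fmt == "epoch":
        return str(epoch_seconds)
    moment = datetime.fromtimestamp(epoch_seconds, tz=timezone.utc)
    return moment.strftime(FORMATS[fmt])


for name in ("default", "iso", "m/d/y", "epoch"):
    print(name, format_timestamp(1418860800, name))
```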
--epoch-time
Print timestamps as epoch time (number of seconds since midnight GMT on 1970-01-01). This switch
is equivalent to --timestamp-format=epoch, it is deprecated as of SiLK 3.0.0, and it will be removed
in the SiLK 4.0 release.
--ip-format=FORMAT
Specify how IP addresses will be printed. When this switch is not specified, IPs are printed in the
canonical format. The FORMAT is one of:
canonical
Print IP addresses in their canonical form: dotted quad for IPv4 (127.0.0.1) and hexadectet for
IPv6 (2001:db8::1). Note that IPv6 addresses in ::ffff:0:0/96 and some IPv6 addresses in ::/96
will be printed as a mixture of IPv6 and IPv4.
zero-padded
Print IP addresses in their canonical form, but add zeros to the output so it fully fills the width
of the column. The addresses 127.0.0.1 and 2001:db8::1 are printed as 127.000.000.001 and
2001:0db8:0000:0000:0000:0000:0000:0001, respectively. When the --ipv6-policy is force,
the output for 127.0.0.1 becomes 0000:0000:0000:0000:0000:ffff:7f00:0001.
decimal
Print IP addresses as integers in decimal format. The addresses 127.0.0.1 and 2001:db8::1 are
printed as 2130706433 and 42540766411282592856903984951653826561, respectively.
hexadecimal
Print IP addresses as integers in hexadecimal format. The addresses 127.0.0.1 and 2001:db8::1
are printed as 7f000001 and 20010db8000000000000000000000001, respectively.
force-ipv6
Print all IP addresses in the canonical form for IPv6 without using any IPv4 notation. Any IPv4
address is mapped into the ::ffff:0:0/96 netblock. The addresses 127.0.0.1 and 2001:db8::1 are
printed as ::ffff:7f00:1 and 2001:db8::1, respectively.
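The sample values quoted for each format can be reproduced with Python's ipaddress module. The sketch below is an illustration for checking or converting such output, not SiLK code; the function name is hypothetical, and column padding is ignored.

```python
import ipaddress


def format_ip(addr, fmt="canonical"):
    """Render an address in one of the named formats (illustrative only)."""
    ip = ipaddress.ip_address(addr)
    value = int(ip)
    if fmt == "canonical":
        return str(ip)
    if fmt == "decimal":
        return str(value)
    if fmt == "hexadecimal":
        return "%x" % value
    if fmt == "zero-padded":
        if ip.version == 4:
            return ".".join("%03d" % octet for octet in ip.packed)
        digits = "%032x" % value
        return ":".join(digits[i:i + 4] for i in range(0, 32, 4))
    if fmt == "force-ipv6":
        if ip.version == 6:
            return str(ip)  # rendering of IPv6 input is left to Python's str()
        # Map IPv4 into ::ffff:0:0/96 and print the low 32 bits as two hex groups.
        return "::ffff:%x:%x" % (value >> 16, value & 0xFFFF)
    raise ValueError("unknown format: %s" % fmt)


for name in ("canonical", "zero-padded", "decimal", "hexadecimal", "force-ipv6"):
    print(name, format_ip("127.0.0.1", name), format_ip("2001:db8::1", name))
```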
--integer-ips
Print IP addresses as integers. This switch is equivalent to --ip-format=decimal, it is deprecated as
of SiLK 3.7.0, and it will be removed in the SiLK 4.0 release.
--zero-pad-ips
Print IP addresses as fully-expanded, zero-padded values in their canonical form. This switch is
equivalent to --ip-format=zero-padded, it is deprecated as of SiLK 3.7.0, and it will be removed in
the SiLK 4.0 release.
--integer-sensors
Print the integer ID of the sensor rather than its name.
--integer-tcp-flags
Print the TCP flag fields (flags, initialFlags, sessionFlags) as an integer value. Typically, the characters
F,S,R,P,A,U,E,C are used to represent the TCP flags.
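The integer form is the bit-wise OR of the standard TCP header flag bits. The sketch below converts between the letters and the integer, assuming the usual bit assignments (FIN=1, SYN=2, RST=4, PSH=8, ACK=16, URG=32, ECE=64, CWR=128); the function names are hypothetical.

```python
# Standard TCP header flag bits, keyed by the letters used in SiLK output.
TCP_FLAG_BITS = {"F": 0x01, "S": 0x02, "R": 0x04, "P": 0x08,
                 "A": 0x10, "U": 0x20, "E": 0x40, "C": 0x80}


def flags_to_int(letters):
    """Combine flag letters into their integer value; spaces are skipped."""
    value = 0
    for letter in letters:
        if letter != " ":
            value |= TCP_FLAG_BITS[letter.upper()]
    return value


def int_to_flags(value):
    """Return the letters for the bits set in value, in F,S,R,P,A,U,E,C order."""
    return "".join(letter for letter, bit in TCP_FLAG_BITS.items() if value & bit)


print(flags_to_int("SA"), int_to_flags(18))
```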
--no-titles
Disable section and column titles. By default, titles are printed.
--no-columns
Disable fixed-width columnar output.
--column-separator=C
Use specified character between columns and after the final column. When this switch is not specified,
the default of ’|’ is used.
--no-final-delimiter
Do not print the column separator after the final column. Normally a delimiter is printed.
--delimited
--delimited=C
Run as if --no-columns --no-final-delimiter --column-sep=C had been specified. That is, disable
fixed-width columnar output; if character C is provided, it is used as the delimiter between columns
instead of the default ’|’.
--print-filenames
Print to the standard error the names of input files as they are opened.
--copy-input=PATH
Copy all binary input to the specified file or named pipe. PATH can be stdout to print flows to
the standard output as long as the --output-path switch has been used to redirect rwstats’s ASCII
output.
--output-path=PATH
Determine where the output of rwstats (ASCII text) is written. If this option is not given, output is
written to the standard output.
--pager=PAGER_PROG
When output is to a terminal, invoke the program PAGER_PROG to view the output one screen full
at a time. This switch overrides the SILK_PAGER environment variable, which in turn overrides the
PAGER variable. If the value of the pager is determined to be the empty string, no paging will be
performed and all output will be printed to the terminal.
--temp-directory=DIR_PATH
Specify the name of the directory in which to store data files temporarily when the memory is not large
enough to store all the bins and their aggregate values. This switch overrides the directory specified
in the SILK_TMPDIR environment variable, which overrides the directory specified in the TMPDIR
variable, which overrides the default, /tmp.
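The precedence described here is a simple first-match search. A minimal sketch, in which the function name is hypothetical and an empty value is treated the same as an unset one (an assumption):

```python
import os


def resolve_temp_directory(switch_value=None, environ=None):
    """Pick the temporary directory: switch, then SILK_TMPDIR, then
    TMPDIR, then the default /tmp."""
    environ = os.environ if environ is None else environ
    candidates = (switch_value, environ.get("SILK_TMPDIR"), environ.get("TMPDIR"))
    for candidate in candidates:
        if candidate:
            return candidate
    return "/tmp"


print(resolve_temp_directory(environ={"TMPDIR": "/var/tmp"}))
```

Passing the environment as a parameter keeps the sketch easy to exercise without modifying the real process environment.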
--site-config-file=FILENAME
Read the SiLK site configuration from the named file FILENAME. When this switch is not provided,
rwstats searches for the site configuration file in the locations specified in the FILES section.
--legacy-timestamps
--legacy-timestamps=NUM
When NUM is not specified or is 1, this switch is equivalent to --timestamp-format=m/d/y.
Otherwise, the switch has no effect. This switch is deprecated as of SiLK 3.0.0, and it will be removed
in the SiLK 4.0 release.
--xargs
--xargs=FILENAME
Causes rwstats to read file names from FILENAME or from the standard input if FILENAME is not
provided. The input should have one file name per line. rwstats will open each file in turn and read
records from it, as if the files had been listed on the command line.
--help
Print the available options and exit. Specifying switches that add new fields, values, or additional
switches before --help will allow the output to include descriptions of those fields or switches.
--help-fields
Print the description and alias(es) of each field and value and exit. Specifying switches that add new
fields before --help-fields will allow the output to include descriptions of those fields.
--legacy-help
Print help, including legacy switches. See the LEGACY SWITCHES section below for these switches.
--version
Print the version number and information about how SiLK was configured, then exit the application.
LEGACY SWITCHES
Use of the following switches has been discouraged since SiLK 2.0.0. As of SiLK 3.8.1, the switches are
deprecated and they will be removed in SiLK 4.0. For each switch, use the replacement indicated.
--sip
Use: --fields=sip
--sip=CIDR
Use the most significant CIDR bits of the source address as the key. Using this switch with IPv6 data
will cause an error. The user should use rwnetmask(1) to mask the data prior to processing it with
rwstats.
--dip
Use: --fields=dip
--dip=CIDR
Use the most significant CIDR bits of the destination address as the key. Using this switch with IPv6
data will cause an error. The user should use rwnetmask to mask the data prior to processing it with
rwstats.
--sport
Use: --fields=sport
--dport
Use: --fields=dport
--protocol
Use: --fields=protocol
--icmp
Use: --fields=iType,iCode
--flows
Use: --values=records
--packets
Use: --values=packets
--bytes
Use: --values=bytes
EXAMPLES
In the following examples, the dollar sign ($) represents the shell prompt. The text after the dollar sign
represents the command line. Lines have been wrapped for improved readability, and the back slash (\) is
used to indicate a wrapped line.
Print the top talkers (based on number of flow records, limit to the top four):
$ rwstats --fields=sip --count=4 data.rw
INPUT: 549092 Records for 12990 Bins and 549092 Total Records
OUTPUT: Top 4 Bins by Records
            sIP|   Records|  %Records|   cumul_%|
       10.1.1.1|     36604|  6.666278|  6.666278|
       10.1.1.2|     13897|  2.530906|  9.197184|
       10.1.1.3|     12739|  2.320012| 11.517196|
       10.1.1.4|     11807|  2.150277| 13.667473|
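The %Records and cumul_% columns above can be reproduced from the bin counts and the total on the INPUT line. The following is a minimal sketch of that arithmetic in plain Python, not rwstats code; the function name percent_columns is an assumption made for illustration.

```python
def percent_columns(counts, total):
    """Return (count, percent, cumulative percent) rows for bins that are
    already ordered as they will be printed."""
    rows = []
    cumulative = 0.0
    for count in counts:
        pct = 100.0 * count / total   # share of the total held by this bin
        cumulative += pct             # running sum shown in cumul_%
        rows.append((count, pct, cumulative))
    return rows


# Counts and total copied from the example output above.
rows = percent_columns([36604, 13897, 12739, 11807], 549092)
for count, pct, cumul in rows:
    print(f"{count:10d}|{pct:10.6f}|{cumul:10.6f}|")
```

Because only four of the 12990 bins are printed, the final cumul_% stays well below 100.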
Print the seven hosts that received the most packets:
$ rwstats --fields=dip --values=packets --count=7 data.rw
INPUT: 549092 Records for 44654 Bins and 6620587 Total Packets
OUTPUT: Top 7 Bins by Packets
            dIP|   Packets|  %Packets|   cumul_%|
       10.1.1.1|    217574|  3.286325|  3.286325|
       10.1.1.2|    138177|  2.087081|  5.373407|
       10.1.1.3|    121892|  1.841106|  7.214512|
       10.1.1.4|     97073|  1.466230|  8.680742|
       10.1.1.5|     82284|  1.242851|  9.923593|
       10.1.1.6|     80051|  1.209123| 11.132715|
       10.1.1.7|     73602|  1.111714| 12.244430|
Print the IP pairs that shared 100,000,000 bytes or more:
$ rwstats --fields=sip,dip --values=bytes --threshold=100000000 data.rw
INPUT: 549092 Records for 107136 Bins and 3410300252 Total Bytes
OUTPUT: Top 5 Bins by Bytes (threshold 100000000)
            sIP|            dIP|     Bytes|    %Bytes|   cumul_%|
       10.1.1.1|       10.1.1.2| 307478707|  9.016177|  9.016177|
       10.1.1.3|       10.1.1.4| 172164463|  5.048367| 14.064544|
       10.1.1.5|       10.1.1.6| 142059589|  4.165604| 18.230147|
       10.1.1.7|       10.1.1.8| 119388394|  3.500818| 21.730965|
       10.1.1.9|      10.1.1.10| 108268824|  3.174759| 24.905725|
Print the ports that were the source of at least 5% of all records:
$ rwstats --fields=sport --percentage=5 data.rw
INPUT: 549092 Records for 56799 Bins and 549092 Total Records
OUTPUT: Top 3 Bins by Records (5% == 27454)
sPort|   Records|  %Records|   cumul_%|
   80|     86677| 15.785515| 15.785515|
   53|     64681| 11.779629| 27.565144|
    0|     47760|  8.697996| 36.263140|
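The OUTPUT line above reports "5% == 27454", which matches 5% of the 549092 input records with the fraction dropped. The sketch below models that selection in plain Python. The integer truncation and the "greater than or equal" comparison are assumptions inferred from the example, and the bin for port 443 is a hypothetical value added so that one bin falls below the cutoff.

```python
def bins_meeting_percentage(bins, total, percentage):
    """Return (threshold, selected bins), largest bin first."""
    # Assumption: the cutoff is the integer part of total * percentage / 100.
    threshold = total * percentage // 100
    selected = [(key, value) for key, value in bins.items() if value >= threshold]
    selected.sort(key=lambda item: item[1], reverse=True)
    return threshold, selected


# The first three bins come from the example; port 443 is hypothetical.
bins = {80: 86677, 53: 64681, 0: 47760, 443: 20000}
threshold, selected = bins_meeting_percentage(bins, 549092, 5)
print(threshold, selected)
```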
Print the destination ports that saw the least number of records (limit to the bottom eight):
$ rwstats --fields=dport --bottom --count=8 data.rw
INPUT: 549092 Records for 44772 Bins and 549092 Total Records
OUTPUT: Bottom 8 Bins by Records
dPort|   Records|  %Records|   cumul_%|
19417|         1|  0.000182|  0.000182|
12110|         1|  0.000182|  0.000364|
34777|         1|  0.000182|  0.000546|
 8999|         1|  0.000182|  0.000728|
36404|         1|  0.000182|  0.000911|
16682|         1|  0.000182|  0.001093|
27420|         1|  0.000182|  0.001275|
14162|         1|  0.000182|  0.001457|
Print the source-destination port pairs that shared more than 500,000 packets (there were none):
$ rwstats --fields=sport,dport --values=packets \
--top --threshold=500000 data.rw
INPUT: 366309 Records for 130307 Bins and 5597540 Total Packets
OUTPUT: No bins above threshold of 500000
Print the source-destination port pairs that shared more than 50,000 packets:
$ rwstats --fields=sport,dport --values=packets \
--top --threshold=50000 data.rw
INPUT: 366309 Records for 130307 Bins and 5597540 Total Packets
OUTPUT: Top 3 Bins by Packets (threshold 50000)
sPort|dPort|   Packets|  %Packets|   cumul_%|
 6699| 3607|    138177|  2.468531|  2.468531|
   80| 1179|     59774|  1.067862|  3.536393|
   80| 9659|     50319|  0.898949|  4.435342|
Print the protocols from least to most active (based on number of records):
$ rwstats --fields=protocol --bottom --count=10 data.rw
INPUT: 545262 Records for 3 Bins and 545262 Total Records
OUTPUT: Bottom 10 Bins by Records
protocol|   Records|  %Records|   cumul_%|
       1|     46319|  8.494815|  8.494815|
      17|    132634| 24.324820| 32.819635|
       6|    366309| 67.180365|100.000000|
Print the packet and byte counts for the pair of /16s that shared the most packets (use rwnetmask(1) on
the input to rwstats; limit result to top ten):
$ rwstats --fields=sip,dip --values=packets,bytes \
--count=10 --no-percent
INPUT: 250928 Records for 230 Bins and 72279154 Total Packets
OUTPUT: Top 10 Bins by Packets
            sIP|            dIP|   Packets|      Bytes|
     10.255.0.0|    192.168.0.0|   2711524| 2207297227|
     10.253.0.0|    192.168.0.0|   2690120| 2288595669|
     10.254.0.0|    192.168.0.0|   2593074| 2141263178|
     10.252.0.0|    192.168.0.0|   2553388| 2117294828|
     10.250.0.0|    192.168.0.0|   2312661| 1982654956|
     10.251.0.0|    192.168.0.0|   2218194| 1785263601|
     10.249.0.0|    192.168.0.0|   2196041| 1934938137|
     10.248.0.0|    192.168.0.0|   2160037| 1804446929|
     10.247.0.0|    192.168.0.0|   2000379| 1579214987|
     10.246.0.0|    192.168.0.0|   1878143| 1578321728|
Print the interval breakdowns for flow records, packets, and bytes across all protocols, and for protocols 6
(TCP) and 17 (UDP):
$ rwstats --detail-proto-stats=6,17 data.rw
FLOW STATISTICS--ALL PROTOCOLS: 549092 records
*BYTES min 28; max 88906238
quartiles LQ 122.06478 Med 420.30930 UQ 876.21920 UQ-LQ 754.15442
interval_max|count<=max|%_of_input|   cumul_%|
          40|     35107|   6.393646|  6.393646|
          60|     35008|   6.375616| 12.769263|
         100|     49500|   9.014883| 21.784145|
         150|     40014|   7.287303| 29.071449|
         256|     65444|  11.918586| 40.990034|
        1000|    224016|  40.797535| 81.787569|
       10000|     75708|  13.787853| 95.575423|
      100000|     21981|   4.003154| 99.578577|
     1000000|      1901|   0.346208| 99.924785|
  4294967295|       413|   0.075215|100.000000|
*PACKETS min 1; max 70023
quartiles LQ 1.76962 Med 3.68119 UQ 7.61567 UQ-LQ 5.84605
interval_max|count<=max|%_of_input|   cumul_%|
           3|    232716|  42.381969| 42.381969|
           4|     61407|  11.183372| 53.565341|
          10|    195310|  35.569631| 89.134972|
          20|     33310|   6.066379| 95.201351|
          50|     17686|   3.220954| 98.422304|
         100|      4854|   0.884005| 99.306309|
         500|      2760|   0.502648| 99.808957|
        1000|       373|   0.067930| 99.876888|
       10000|       637|   0.116010| 99.992897|
  4294967295|        39|   0.007103|100.000000|
*BYTES/PACKET min 28; max 1500
quartiles LQ 57.98319 Med 90.71150 UQ 164.77250 UQ-LQ 106.78932
interval_max|count<=max|%_of_input|   cumul_%|
          40|     42568|   7.752435|  7.752435|
          44|     15173|   2.763289| 10.515724|
          60|     91003|  16.573361| 27.089085|
         100|    163850|  29.840173| 56.929258|
         200|    153190|  27.898786| 84.828043|
         400|     39761|   7.241227| 92.069271|
         600|     12810|   2.332942| 94.402213|
         800|      7954|   1.448573| 95.850786|
        1500|     22783|   4.149214|100.000000|
  4294967295|         0|   0.000000|100.000000|
FLOW STATISTICS--PROTOCOL 6: 366309/549092 records
*BYTES min 40; max 88906238
quartiles LQ 310.47331 Med 656.53661 UQ 1089.75344 UQ-LQ 779.28013
interval_max|count<=max|%_of_proto|   cumul_%|
          40|     29774|   8.128110|  8.128110|
          60|     11453|   3.126595| 11.254706|
         100|      6915|   1.887751| 13.142456|
         150|     16369|   4.468632| 17.611088|
         256|     12651|   3.453642| 21.064730|
        1000|    196881|  53.747246| 74.811976|
       10000|     68989|  18.833553| 93.645529|
      100000|     21099|   5.759891| 99.405420|
     1000000|      1784|   0.487021| 99.892441|
  4294967295|       394|   0.107559|100.000000|
*PACKETS min 1; max 70023
quartiles LQ 3.39682 Med 5.85903 UQ 8.80427 UQ-LQ 5.40745
interval_max|count<=max|%_of_proto|   cumul_%|
           3|     69358|  18.934288| 18.934288|
           4|     55993|  15.285729| 34.220016|
          10|    186559|  50.929407| 85.149423|
          20|     30947|   8.448332| 93.597755|
          50|     16186|   4.418674| 98.016429|
         100|      4204|   1.147665| 99.164094|
         500|      2178|   0.594580| 99.758674|
        1000|       315|   0.085993| 99.844667|
       10000|       537|   0.146598| 99.991264|
  4294967295|        32|   0.008736|100.000000|
*BYTES/PACKET min 40; max 1500
quartiles LQ 60.19817 Med 96.78616 UQ 175.08044 UQ-LQ 114.88228
interval_max|count<=max|%_of_proto|   cumul_%|
          40|     36559|   9.980372|  9.980372|
          44|     14929|   4.075521| 14.055893|
          60|     39593|  10.808634| 24.864527|
         100|    100117|  27.331297| 52.195824|
         200|    111258|  30.372718| 82.568542|
         400|     26020|   7.103293| 89.671834|
         600|      8600|   2.347745| 92.019579|
         800|      7726|   2.109148| 94.128727|
        1500|     21507|   5.871273|100.000000|
  4294967295|         0|   0.000000|100.000000|
FLOW STATISTICS--PROTOCOL 17: 132634/549092 records
*BYTES min 32; max 2115559
quartiles LQ 66.53665 Med 150.61551 UQ 242.44095 UQ-LQ 175.90430
interval_max|count<=max|%_of_proto|   cumul_%|
          20|         0|   0.000000|  0.000000|
          40|      5195|   3.916794|  3.916794|
          80|     42150|  31.779182| 35.695975|
         130|     11528|   8.691587| 44.387563|
         256|     45497|  34.302667| 78.690230|
        1000|     23401|  17.643289| 96.333519|
       10000|      4447|   3.352836| 99.686355|
      100000|       389|   0.293288| 99.979643|
     1000000|        23|   0.017341| 99.996984|
  4294967295|         4|   0.003016|100.000000|
*PACKETS min 1; max 8839
quartiles LQ 0.84383 Med 1.68768 UQ 2.53149 UQ-LQ 1.68766
interval_max|count<=max|%_of_proto|   cumul_%|
           3|    117884|  88.879171| 88.879171|
           4|      4452|   3.356605| 92.235777|
          10|      6678|   5.034908| 97.270685|
          20|      1766|   1.331484| 98.602168|
          50|      1055|   0.795422| 99.397590|
         100|       368|   0.277455| 99.675046|
         500|       353|   0.266146| 99.941192|
        1000|        33|   0.024880| 99.966072|
       10000|        45|   0.033928|100.000000|
  4294967295|         0|   0.000000|100.000000|
*BYTES/PACKET min 32; max 1415
quartiles LQ 63.23827 Med 91.27180 UQ 158.10219 UQ-LQ 94.86392
interval_max|count<=max|%_of_proto|   cumul_%|
          20|         0|   0.000000|  0.000000|
          24|         0|   0.000000|  0.000000|
          40|      5671|   4.275676|  4.275676|
         100|     70970|  53.508150| 57.783826|
         200|     39298|  29.628904| 87.412730|
         400|     12175|   9.179396| 96.592126|
         600|      4130|   3.113832| 99.705958|
         800|       160|   0.120633| 99.826590|
        1500|       230|   0.173410|100.000000|
  4294967295|         0|   0.000000|100.000000|
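Each interval table above counts the records whose value is no greater than interval_max and greater than the previous interval_max. The sketch below shows one way to compute such counts in plain Python; it is an illustration under that reading of the output, not the rwstats implementation. The interval list is copied from the *BYTES table for all protocols, and the sample values are hypothetical.

```python
from bisect import bisect_left


def interval_counts(values, interval_maxes):
    """Count values per interval; interval_maxes must be sorted ascending.
    A value lands in the first interval whose maximum is >= the value."""
    counts = [0] * len(interval_maxes)
    for value in values:
        index = bisect_left(interval_maxes, value)
        # Values above the last maximum are folded into the final interval.
        counts[min(index, len(counts) - 1)] += 1
    return counts


# Interval maxima from the *BYTES table; the byte counts are made up.
maxes = [40, 60, 100, 150, 256, 1000, 10000, 100000, 1000000, 4294967295]
sample = [28, 40, 41, 60, 500, 1500, 88906238]
counts = interval_counts(sample, maxes)
print(counts)
```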
The silkpython(3) manual page provides examples that use PySiLK to create arbitrary fields to use as part
of the key for rwstats.
ENVIRONMENT
SILK_IPV6_POLICY
This environment variable is used as the value for the --ipv6-policy when that switch is not provided.
SILK_PAGER
When set to a non-empty string, rwstats automatically invokes this program to display its output a
screen at a time. If set to an empty string, rwstats does not automatically page its output.
PAGER
When set and SILK_PAGER is not set, rwstats automatically invokes this program to display its
output a screen at a time.
SILK_TMPDIR
When set and --temp-directory is not specified, rwstats writes the temporary files it creates to this
directory. SILK_TMPDIR overrides the value of TMPDIR.
TMPDIR
When set and SILK_TMPDIR is not set, rwstats writes the temporary files it creates to this directory.
PYTHONPATH
This environment variable is used by Python to locate modules. When --python-file is specified,
rwstats loads Python which in turn loads the PySiLK module which is comprised of several files
(silk/pysilk_nl.so, silk/__init__.py, etc). If this silk/ directory is located outside Python’s normal search
path (for example, in the SiLK installation tree), it may be necessary to set or modify the PYTHONPATH environment variable to include the parent directory of silk/ so that Python can find the PySiLK
module.
SILK_PYTHON_TRACEBACK
When set, Python plug-ins will output traceback information on Python errors to the standard error.
SILK_COUNTRY_CODES
This environment variable allows the user to specify the country code mapping file that rwstats uses
when computing the scc and dcc fields. The value may be a complete path or a file relative to the
SILK_PATH. See the FILES section for standard locations of this file.
SILK_ADDRESS_TYPES
This environment variable allows the user to specify the address type mapping file that rwstats uses
when computing the sType and dType fields. The value may be a complete path or a file relative to
the SILK_PATH. See the FILES section for standard locations of this file.
SILK_CLOBBER
The SiLK tools normally refuse to overwrite existing files. Setting SILK_CLOBBER to a non-empty
value removes this restriction.
SILK_CONFIG_FILE
This environment variable is used as the value for the --site-config-file when that switch is not
provided.
SILK_DATA_ROOTDIR
This environment variable specifies the root directory of the data repository. As described in the FILES
section, rwstats may use this environment variable when searching for the SiLK site configuration file.
SILK_PATH
This environment variable gives the root of the install tree. When searching for configuration files and
plug-ins, rwstats may use this environment variable. See the FILES section for details.
TZ
When a SiLK installation is built to use the local timezone (to determine if this is the case, check the
Timezone support value in the output from rwstats --version), the value of the TZ environment
variable determines the timezone in which rwstats displays timestamps. If the TZ environment variable is not set, the default timezone is used. Setting TZ to 0 or the empty string causes timestamps to
be displayed in UTC. The value of the TZ environment variable is ignored when the SiLK installation
uses utc. For system information on the TZ variable, see tzset(3).
SILK_PLUGIN_DEBUG
When set to 1, rwstats prints status messages to the standard error as it attempts to find and open
each of its plug-ins. In addition, when an attempt to register a field fails, rwstats prints a message
specifying the additional function(s) that must be defined to register the field in rwstats. Be aware
that the output can be rather verbose.
SILK_TEMPFILE_DEBUG
When set to 1, rwstats prints debugging messages to the standard error as it creates, re-opens, and
removes temporary files.
SILK_UNIQUE_DEBUG
When set to 1, the binning engine used by rwstats prints debugging messages to the standard error.
FILES
${SILK_ADDRESS_TYPES}
${SILK_PATH}/share/silk/address_types.pmap
${SILK_PATH}/share/address_types.pmap
/usr/local/share/silk/address_types.pmap
/usr/local/share/address_types.pmap
Possible locations for the address types mapping file required by the sType and dType fields.
${SILK_CONFIG_FILE}
${SILK_DATA_ROOTDIR}/silk.conf
/data/silk.conf
${SILK_PATH}/share/silk/silk.conf
${SILK_PATH}/share/silk.conf
/usr/local/share/silk/silk.conf
/usr/local/share/silk.conf
Possible locations for the SiLK site configuration file which are checked when the --site-config-file
switch is not provided.
${SILK_COUNTRY_CODES}
${SILK_PATH}/share/silk/country_codes.pmap
${SILK_PATH}/share/country_codes.pmap
/usr/local/share/silk/country_codes.pmap
/usr/local/share/country_codes.pmap
Possible locations for the country code mapping file required by the scc and dcc fields.
${SILK_PATH}/lib64/silk/
${SILK_PATH}/lib64/
${SILK_PATH}/lib/silk/
${SILK_PATH}/lib/
/usr/local/lib64/silk/
/usr/local/lib64/
/usr/local/lib/silk/
/usr/local/lib/
Directories that rwstats checks when attempting to load a plug-in.
${SILK_TMPDIR}/
${TMPDIR}/
/tmp/
Directory in which to create temporary files.
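The order in which the temporary directory is chosen (the --temp-directory switch, then SILK_TMPDIR, then TMPDIR, then /tmp) can be sketched as follows. This is an illustration in plain Python, not rwstats code; the function name resolve_temp_dir is hypothetical, and treating an empty value as unset is an assumption.

```python
import os


def resolve_temp_dir(switch_value=None, environ=None):
    """Return the directory that would hold temporary files."""
    environ = os.environ if environ is None else environ
    if switch_value:                      # --temp-directory wins when given
        return switch_value
    for name in ("SILK_TMPDIR", "TMPDIR"):
        value = environ.get(name)
        if value:                         # assumption: empty counts as unset
            return value
    return "/tmp"                         # final fallback listed in FILES


print(resolve_temp_dir(environ={"TMPDIR": "/var/tmp"}))
```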
NOTES
rwstats functionally replaces the combination of the following, where N is one more than the number of fields
passed to rwuniq(1):
rwuniq --fields=... | sort -r -t '|' -k N | head -10
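The pipeline above amounts to a top-N selection over the bins. A minimal sketch of the same idea in plain Python follows; it is an analogy rather than rwstats code, and the bin contents are hypothetical.

```python
import heapq


def top_n(bins, n):
    """Return the n (key, value) pairs with the largest values."""
    return heapq.nlargest(n, bins.items(), key=lambda item: item[1])


# Hypothetical record counts keyed by source port.
bins = {80: 900, 53: 700, 443: 650, 22: 40, 25: 10}
print(top_n(bins, 3))
```

Selecting with a heap avoids sorting every bin when only the first few are wanted.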
When the --bin-time switch is given and the three time fields (starting-time (sTime), ending-time (eTime),
and duration (duration)) are present in the key, the duration field’s value will be modified to be the difference
between the ending and starting times.
When the three time-related key fields (sTime,duration,eTime) are all in use, rwstats will ignore the final
time field when binning the records, but the field will appear in the output. Due to truncation of the
milliseconds values, rwstats will generate different numbers of bins depending on the order in which those
three values appear in the --fields switch.
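The effect of truncating milliseconds can be seen with two records that agree on the truncated start time and duration but not on the truncated end time. The sketch below is a simplified model in plain Python, assuming times are held in milliseconds and truncated to whole seconds before binning; it is not the rwstats binning code, and the record values are hypothetical.

```python
def bin_keys(records_ms, fields):
    """Build the set of keys for records given as (stime_ms, duration_ms).
    'fields' names the two time fields that take part in binning."""
    keys = set()
    for stime_ms, duration_ms in records_ms:
        values = {
            "sTime": stime_ms // 1000,
            "duration": duration_ms // 1000,
            "eTime": (stime_ms + duration_ms) // 1000,
        }
        keys.add(tuple(values[name] for name in fields))
    return keys


# Two hypothetical records: 10.600s + 0.600s and 10.100s + 0.300s.
records = [(10600, 600), (10100, 300)]
by_duration = bin_keys(records, ("sTime", "duration"))
by_etime = bin_keys(records, ("sTime", "eTime"))
print(len(by_duration), len(by_etime))
```

Under these assumptions the first ordering yields one bin and the second yields two, which illustrates why the order of the time fields matters.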
When computing distinct counts over a field, the field may not be part of the key; that is, you cannot have
--fields=sip --values=sip-distinct.
Using the --presorted-input switch sometimes introduces more issues than it solves, and --presorted-input is less necessary now that rwstats can use temporary files while processing input.
When using the --presorted-input switch, it is highly recommended that you use no more than one time-related key field (sTime, duration, eTime) in the --fields switch and that the time-related key appear last
in --fields. The issue is caused by rwsort considering the millisecond values on the times when sorting,
while rwstats truncates the millisecond value.
When computing distinct IP counts, rwstats will typically run faster if you do not use the --presorted-input switch, even if the data was previously sorted.
rwstats may run out of memory when computing distinct IP counts, causing the counts for some bins to be
smaller than the actual number of distinct IPs. When this occurs, a single warning is printed to the standard
error noting that rwstats has run out of memory, processing continues, and rwstats exits with status 16.
rwstats’s strength is its ability to build arbitrary keys and aggregate fields. For maps of a single key to a
single value, see also rwbag(1).
SEE ALSO
rwcut(1), rwnetmask(1), rwsort(1), rwuniq(1), rwbag(1), addrtype(3), ccfilter(3), pmapfilter(3), pysilk(3), silkpython(3), silk-plugin(3), sensor.conf(5), rwflowpack(8), silk(7), yaf(1),
dlopen(3)
rwswapbytes(1)
rwswapbytes
Convert the byte order of a SiLK Flow file
SYNOPSIS
rwswapbytes
{--big-endian|--little-endian|--native-endian|--swap-endian}
[--note-add=TEXT] [--note-file-add=FILE]
INPUT_FILE OUTPUT_FILE
rwswapbytes --help
rwswapbytes --version
DESCRIPTION
Change the byte order of INPUT_FILE as specified by the option and write the result to OUTPUT_FILE.
rwswapbytes will read the input from the standard input if you use the string stdin for INPUT_FILE;
rwswapbytes will write the output to the standard output if you use the string stdout for OUTPUT_FILE.
rwswapbytes knows how to read and write compressed (gzip(1)ped) files.
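For readers unfamiliar with byte order, the sketch below shows how a single 32-bit integer is laid out in big-endian and little-endian form, and that swapping reverses the bytes. It uses Python's struct module on one hypothetical value; it does not read or write SiLK Flow files and says nothing about their header or record layout.

```python
import struct
import sys

value = 0x0A010203                    # hypothetical 32-bit field value

big = struct.pack(">I", value)        # big-endian (network byte order)
little = struct.pack("<I", value)     # little-endian
native = struct.pack("=I", value)     # this machine's byte order


def swap32(raw):
    """Unconditionally reverse the byte order of a 4-byte value."""
    return raw[::-1]


print(big.hex(), little.hex(), sys.byteorder)
```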
OPTIONS
Option names may be abbreviated if the abbreviation is unique or is an exact match for an option. A
parameter to an option may be specified as --arg=param or --arg param, though the first form is required
for options that take optional parameters.
One of these switches must be provided:
--big-endian
Write the output-file in big-endian (network byte-order) format.
--little-endian
Write the output-file in little-endian format.
--native-endian
Write the output-file in this machine’s native format.
--swap-endian
Unconditionally swap the byte-order of the input-file.
--help
Print the available options and exit.
--version
Print the version number and information about how SiLK was configured, then exit the application.
These switches are optional:
--note-add=TEXT
Add the specified TEXT to the header of the output file as an annotation. This switch may be repeated
to add multiple annotations to a file. To view the annotations, use the rwfileinfo(1) tool.
--note-file-add=FILENAME
Open FILENAME and add the contents of that file to the header of the output file as an annotation.
This switch may be repeated to add multiple annotations. Currently the application makes no effort
to ensure that FILENAME contains text; be careful that you do not attempt to add a SiLK data file
as an annotation.
SEE ALSO
rwfileinfo(1), silk(7), gzip(1)
rwtotal(1)
rwtotal
Count how much traffic matched specific keys
SYNOPSIS
rwtotal {--sip-first-8 | --sip-first-16 | --sip-first-24 |
--sip-last-8 | --sip-last-16 | --dip-first-8 |
--dip-first-16 | --dip-first-24 | --dip-last-8 |
--dip-last-16 | --sport | --dport | --proto | --packets |
--bytes | --duration | --icmp-code}
[--summation] [--min-bytes=COUNT] [--max-bytes=COUNT]
[--min-packets=COUNT] [--max-packets=COUNT]
[--min-records=COUNT] [--max-records=COUNT] [--skip-zeroes]
[--no-titles] [--no-columns] [--column-separator=CHAR]
[--no-final-delimiter] [{--delimited | --delimited=CHAR}]
[--print-filenames] [--copy-input=PATH] [--output-path=PATH]
[--pager=PAGER_PROG] [--site-config-file=FILENAME]
{[--xargs] | [--xargs=FILENAME] | [FILE [FILE ...]]}
rwtotal --help
rwtotal --version
DESCRIPTION
rwtotal reads SiLK Flow records, bins those records by the user-specified key, computes the volume
per bin (record count and sums of packets and bytes), and prints the bins and their volumes.
rwtotal reads SiLK Flow records from the files named on the command line or from the standard input
when no file names are specified and --xargs is not present. To read the standard input in addition to the
named files, use - or stdin as a file name. If an input file name ends in .gz, the file will be uncompressed as
it is read. When the --xargs switch is provided, rwtotal will read the names of the files to process from the
named text file, or from the standard input if no file name argument is provided to the switch. The input
to --xargs must contain one file name per line.
By default, rwtotal prints a bin for every possible key, even when the volume for that bin is zero. Use the
--skip-zeroes switch to suppress the printing of these empty bins.
Use the --summation switch to include a row giving the volume for all flow records.
The maximum key value that rwtotal supports is 16,777,215. When the key field is --bytes or --packets,
rwtotal will create a bin for all unique values up to 16,777,214. The final bin (16,777,215) will consist of all
values greater than 16,777,214.
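The overflow rule described above can be sketched in a few lines of plain Python. This is an illustration, not rwtotal code: the names MAX_KEY, bin_index, and count_by_key are hypothetical, and a dictionary stands in for the fixed-size array that rwtotal uses.

```python
MAX_KEY = 16_777_215          # largest key rwtotal supports (2**24 - 1)


def bin_index(value):
    """Map a byte or packet count to its bin; large values share the last bin."""
    return min(value, MAX_KEY)


def count_by_key(values):
    """Count records per bin, using a dict in place of rwtotal's array."""
    totals = {}
    for value in values:
        key = bin_index(value)
        totals[key] = totals.get(key, 0) + 1
    return totals


# Hypothetical byte counts; the last two exceed 16,777,214.
totals = count_by_key([40, 40, 16_777_214, 16_777_215, 88_906_238])
print(totals)
```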
OPTIONS
Option names may be abbreviated if the abbreviation is unique or is an exact match for an option. A
parameter to an option may be specified as --arg=param or --arg param, though the first form is required
for options that take optional parameters.
One and only one of the following counting keys is required:
--sip-first-8
Key on the first 8 bits of the source IP address
--sip-first-16
Key on the first 16 bits of the source IP address
--sip-first-24
Key on the first 24 bits of the source IP address
--sip-last-8
Key on the last 8 bits of the source IP address
--sip-last-16
Key on the last 16 bits of the source IP address
--dip-first-8
Key on the first 8 bits of the destination IP address
--dip-first-16
Key on the first 16 bits of the destination IP address
--dip-first-24
Key on the first 24 bits of the destination IP address
--dip-last-8
Key on the last 8 bits of the destination IP address
--dip-last-16
Key on the last 16 bits of the destination IP address
--sport
Key on the source port.
--dport
Key on the destination port.
--proto
Key on the protocol.
--packets
Key on the number of packets in the record
--bytes
Key on the number of bytes in the record
--duration
Key on the duration of the record.
--icmp-code
Key on the ICMP type and code. This switch will assume that all incoming records are ICMP.
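The address-based keys above select either the leading or the trailing bits of an IPv4 address. The following sketch computes such keys with Python's ipaddress module; it illustrates the bit arithmetic only, and the helper names ip_key and dotted are assumptions rather than part of SiLK.

```python
import ipaddress


def ip_key(address, which, bits):
    """Return the first or last 'bits' bits of an IPv4 address as an integer."""
    value = int(ipaddress.IPv4Address(address))
    if which == "first":
        return value >> (32 - bits)
    if which == "last":
        return value & ((1 << bits) - 1)
    raise ValueError("which must be 'first' or 'last'")


def dotted(key, bits):
    """Render a key as dot-separated octets, most significant octet first."""
    octets = [(key >> shift) & 0xFF for shift in range(bits - 8, -1, -8)]
    return ".".join(str(octet) for octet in octets)


addr = "192.168.18.146"       # example address, chosen for illustration
print(ip_key(addr, "first", 8), dotted(ip_key(addr, "last", 16), 16))
```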
The following options affect the output:
--summation
Print as the final row a total of the values in each column.
--min-bytes=COUNT
Disable printing of bins with fewer than COUNT bytes. By default, all bins are printed.
--max-bytes=COUNT
Disable printing of bins with more than COUNT bytes. By default, all bins are printed.
--min-packets=COUNT
Disable printing of bins with fewer than COUNT packets. By default, all bins are printed.
--max-packets=COUNT
Disable printing of bins with more than COUNT packets. By default, all bins are printed.
--min-records=COUNT
Disable printing of bins with fewer than COUNT flow records. By default, all bins are printed.
--max-records=COUNT
Disable printing of bins with more than COUNT flow records. By default, all bins are printed.
--skip-zeroes
Disable printing of bins with no traffic. By default, all bins are printed.
--no-titles
Turn off column titles. By default, titles are printed.
--no-columns
Disable fixed-width columnar output.
--column-separator=C
Use specified character between columns and after the final column. When this switch is not specified,
the default of ’|’ is used.
--no-final-delimiter
Do not print the column separator after the final column. Normally a delimiter is printed.
--delimited
--delimited=C
Run as if --no-columns --no-final-delimiter --column-sep=C had been specified. That is, disable
fixed-width columnar output; if character C is provided, it is used as the delimiter between columns
instead of the default ’|’.
--print-filenames
Print to the standard error the names of input files as they are opened.
--copy-input=PATH
Copy all binary input to the specified file or named pipe. PATH can be stdout to print flows to
the standard output as long as the --output-path switch has been used to redirect rwtotal’s ASCII
output.
--output-path=PATH
Determine where the output of rwtotal (ASCII text) is written. If this option is not given, output is
written to the standard output.
--pager=PAGER_PROG
When output is to a terminal, invoke the program PAGER_PROG to view the output one screen full
at a time. This switch overrides the SILK_PAGER environment variable, which in turn overrides the
PAGER variable. If the value of the pager is determined to be the empty string, no paging will be
performed and all output will be printed to the terminal.
--site-config-file=FILENAME
Read the SiLK site configuration from the named file FILENAME. When this switch is not provided,
rwtotal searches for the site configuration file in the locations specified in the FILES section.
--xargs
--xargs=FILENAME
Causes rwtotal to read file names from FILENAME or from the standard input if FILENAME is not
provided. The input should have one file name per line. rwtotal will open each file in turn and read
records from it, as if the files had been listed on the command line.
--help
Print the available options and exit.
--version
Print the version number and information about how SiLK was configured, then exit the application.
EXAMPLES
In the following examples, the dollar sign ($) represents the shell prompt. The text after the dollar sign
represents the command line. Lines have been wrapped for improved readability, and the backslash (\) is
used to indicate a wrapped line.
Group by the protocol
Group all incoming data for the first hour of March 1, 2003 by protocol.
$ rwfilter --start-date=2003/03/01:00 --end-date=2003/03/01:00 \
      --all-destination=stdout \
  | rwtotal --proto --skip-zero
protocol|        Records|               Bytes|        Packets|
       1|          15622|            10695328|         147084|
       6|         330726|        120536195111|      144254362|
      17|         155528|            24500079|         155528|
To get the same result with rwuniq(1), use:
$ rwfilter ... --pass=stdout \
  | rwuniq --fields=proto --values=records,bytes,packets \
      --sort-output
pro|        Records|               Bytes|        Packets|
  1|          15622|            10695328|         147084|
  6|         330726|        120536195111|      144254362|
 17|         155528|            24500079|         155528|
Group by the source Class A addresses
$ rwfilter --start-date=2003/03/01:00 --end-date=2003/03/01:00 \
      --all-destination=stdout \
  | rwtotal --sip-first-8 --skip-zero
sIP_First8|        Records|               Bytes|        Packets|
        10|         173164|         59950837766|       72201390|
       172|          77764|            17553593|          77764|
       192|         250948|         60602999159|       72277820|
Use rwnetmask(1) and rwuniq(1) to get a similar result:
$ rwfilter ... --pass=stdout \
  | rwnetmask --4sip-prefix=8 \
  | rwuniq --fields=sip --values=records,bytes,packets \
      --sort-output --ipv6-policy=ignore
            sIP|        Records|               Bytes|        Packets|
       10.0.0.0|         173164|         59950837766|       72201390|
      172.0.0.0|          77764|            17553593|          77764|
      192.0.0.0|         250948|         60602999159|       72277820|
Group by the final two IPv4 octets
$ rwfilter --start-date=2003/03/01:00 --end-date=2003/03/01:00 \
      --proto=6 --pass=stdout --daddress=192.168.x.x \
  | rwtotal --dip-last-16 --skip-zero | head -5
dIP_Last16|        Records|               Bytes|        Packets|
     0. 38|              6|             4862678|           4016|
     1. 14|              1|               32844|            452|
    18.146|              1|                4226|             12|
     21. 4|              6|             5462032|           4521|
One way to accomplish this with rwuniq is to create a new field using PySiLK (see pysilk(3)) and the
PySiLK plug-in capability (see silkpython(3)). The invocation is:
$ rwfilter ... --pass=stdout \
  | rwuniq --python=/tmp/dip16.py --fields=dip-last-16 \
      --values=flows,bytes,packets --sort-output | head -5
    dip-last-16|        Records|               Bytes|        Packets|
       0.0.0.38|              6|             4862678|           4016|
       0.0.1.14|              1|               32844|            452|
     0.0.18.146|              1|                4226|             12|
       0.0.21.4|              6|             5462032|           4521|
where the definition of the dip-last-16 field is given in the file /tmp/dip16.py:
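import silk
# Mask that keeps only the low 16 bits of an IPv4 address.
mask = silk.IPAddr("0.0.255.255")
def mask_dip(r):
    # Return the record's destination address with the high 16 bits cleared.
    return r.dip.mask(mask)
# register_ipv4_field() is made available by the silkpython(3) plug-in environment.
register_ipv4_field("dip-last-16", mask_dip)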
ENVIRONMENT
SILK_PAGER
When set to a non-empty string, rwtotal automatically invokes this program to display its output a
screen at a time. If set to an empty string, rwtotal does not automatically page its output.
PAGER
When set and SILK_PAGER is not set, rwtotal automatically invokes this program to display its
output a screen at a time.
SILK_CLOBBER
The SiLK tools normally refuse to overwrite existing files. Setting SILK_CLOBBER to a non-empty
value removes this restriction.
SILK_CONFIG_FILE
This environment variable is used as the value for the --site-config-file when that switch is not
provided.
SILK_DATA_ROOTDIR
This environment variable specifies the root directory of the data repository. As described in the FILES
section, rwtotal may use this environment variable when searching for the SiLK site configuration file.
SILK_PATH
This environment variable gives the root of the install tree. When searching for configuration files,
rwtotal may use this environment variable. See the FILES section for details.
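The precedence among the --pager switch, SILK_PAGER, and PAGER can be sketched as below. This is a plain-Python illustration of the rules stated in this manual page, not rwtotal code; the function name resolve_pager is hypothetical, and the check that output goes to a terminal is left out.

```python
import os


def resolve_pager(switch_value=None, environ=None):
    """Return the pager program to run, or None when no paging is done."""
    environ = os.environ if environ is None else environ
    if switch_value is not None:          # --pager overrides the environment
        return switch_value or None       # an empty string disables paging
    silk_pager = environ.get("SILK_PAGER")
    if silk_pager is not None:            # SILK_PAGER overrides PAGER
        return silk_pager or None
    return environ.get("PAGER") or None


print(resolve_pager(environ={"SILK_PAGER": "", "PAGER": "less"}))
```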
FILES
${SILK_CONFIG_FILE}
${SILK_DATA_ROOTDIR}/silk.conf
/data/silk.conf
${SILK_PATH}/share/silk/silk.conf
${SILK_PATH}/share/silk.conf
/usr/local/share/silk/silk.conf
/usr/local/share/silk.conf
Possible locations for the SiLK site configuration file which are checked when the --site-config-file
switch is not provided.
SEE ALSO
rwaddrcount(1), rwnetmask(1), rwstats(1), rwuniq(1), pysilk(3), silkpython(3), silk(7)
BUGS
rwtotal replicates some functionality in rwuniq(1) (most notably when rwuniq checks by port or protocol),
but the implementations differ: rwtotal uses an array instead of a hash-table, so access is faster, the output
is always sorted, and the output includes keys with a value of zero. The use of an array prevents rwtotal
from using the complete IP address the way rwuniq does, but it also ensures that rwtotal will not run out
of memory.
When used in an IPv6 environment, rwtotal will process every record as long as the IP address is not part
of the key. When aggregating by the IP address, rwtotal converts IPv6 flow records that contain addresses
in the ::ffff:0:0/96 prefix to IPv4 and processes them. IPv6 records having addresses outside of that prefix
are silently ignored. rwtotal will not be modified to support IPv6 addresses; instead, users should use
rwuniq(1) (maybe combined with rwnetmask(1)).
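The handling of IPv6 records described above depends on whether an address lies in the ::ffff:0:0/96 prefix. The sketch below shows that test with Python's ipaddress module; it illustrates the address conversion only and is not rwtotal code. The function name to_ipv4_or_none and the sample addresses are chosen for illustration.

```python
import ipaddress


def to_ipv4_or_none(address):
    """Return the IPv4 form of an address, or None when an IPv6 address
    lies outside the ::ffff:0:0/96 (IPv4-mapped) prefix."""
    ip = ipaddress.ip_address(address)
    if ip.version == 4:
        return ip
    return ip.ipv4_mapped     # None unless the address is IPv4-mapped


print(to_ipv4_or_none("::ffff:10.1.1.1"), to_ipv4_or_none("2001:db8::1"))
```

A record whose key address maps to None would correspond to one that rwtotal skips when aggregating by IP address.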
rwtotal is also similar to rwaddrcount(1) and rwstats(1).
rwtuc(1)
rwtuc
Text Utility Converter - rwcut output to SiLK flows
SYNOPSIS
rwtuc [--fields=FIELDS] [--column-separator=CHAR]
[--output-path=FILEPATH] [--bad-input-lines=FILEPATH]
[--verbose] [--stop-on-error] [--no-titles] [--note-add=TEXT]
[--note-file-add=FILE] [--compression-method=COMP_METHOD]
[--site-config-file=FILENAME] [--saddress=IPADDR]
[--daddress=IPADDR] [--sport=NUM] [--dport=NUM]
[--protocol=NUM] [--packets=NUM] [--bytes=NUM]
[--flags-all=TCPFLAGS] [--stime=TIME] [--duration=NUM]
[--etime=TIME] [--sensor=SID] [--input-index=NUM]
[--output-index=NUM] [--next-hop-ip=IPADDR]
[--flags-initial=TCPFLAGS] [--flags-session=TCPFLAGS]
[--attributes=ATTR] [--application=NUM] [--class=NAME]
[--type=NAME] [--stime+msec=TIME] [--etime+msec=TIME]
[--duration+msec=NUM] [--icmp-type=NUM] [--icmp-code=NUM]
[FILES]
rwtuc --help
rwtuc --version
DESCRIPTION
rwtuc reads text files that have a format similar to that produced by rwcut(1) and attempts to create a
SiLK Flow record for each line of input.
The fields which make up a single record should be separated by the pipe character (’|’); use the --column-separator switch to change this delimiter. Note that the space character will not work as a delimiter since
several fields (e.g., time, TCP-flags) may contain embedded spaces.
The fields to be read from each line can be specified with the --fields switch; if the switch is not provided,
rwtuc treats the first line as a title and attempts to determine the fields from the title strings.
When --fields is specified, rwtuc still checks whether the first line contains title strings, and rwtuc skips
the line if it determines it does. Specify the --no-titles switch to force rwtuc to treat the first line as record
values to be parsed.
Command line switches exist which force a field to have a fixed value. These switches cause rwtuc to
override the value read from the input file (if any) for those fields. See the Fixed Values section below for
details.
The textual input is read from the files named on the command line; if no files are specified, rwtuc attempts
to read the text from the standard input if it is not connected to a terminal. To force rwtuc to read input
from the terminal, specify stdin or - as the input stream.
When the --output-path switch is not provided, output is sent to the standard output when it is not
connected to a terminal.
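As a rough illustration of the input format described above, the following Python sketch writes pipe-delimited text with a title line. The helper name format_rows and the sample values are hypothetical, and the sketch assumes that rwtuc accepts the full field names from the --fields list as title strings.

```python
# Hypothetical helper: build delimited text that rwtuc could read.
# The row below is made-up sample data, not a real flow record.

def format_rows(titles, rows, sep="|"):
    """Return text with one title line followed by one line per row."""
    lines = [sep.join(titles)]
    for row in rows:
        if len(row) != len(titles):
            raise ValueError("row has %d values, expected %d"
                             % (len(row), len(titles)))
        lines.append(sep.join(str(v) for v in row))
    return "\n".join(lines) + "\n"

titles = ["sIP", "dIP", "sPort", "dPort", "protocol",
          "packets", "bytes", "sTime", "duration"]
rows = [
    ("192.0.2.1", "198.51.100.7", 49152, 80, 6, 10, 1500,
     "2014/12/18T10:00:00", 2.5),
]
text = format_rows(titles, rows)
print(text, end="")
```

The printed text could then be piped into rwtuc, with --output-path naming the SiLK file to create.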
By default, lines that cannot be parsed are silently ignored (unless rwtuc is attempting to determine the
fields from the title line). When the --verbose switch is specified, problems parsing an input line will be
reported to the standard error, and rwtuc will continue to process the input. The --stop-on-error switch
is similar to the --verbose switch, except processing stops after the first error. Input lines that cause parse
errors can be copied to another output stream with the --bad-input-lines switch. Each bad line will have
the source file name and line number prepended to it, separated from each other and the source line by
colons (’:’).
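The FILE:LINE:TEXT layout of the bad-input-lines stream can be split apart again with a few lines of Python. This is only a sketch: the function name parse_bad_line and the sample line are hypothetical, and it assumes the source file name itself contains no colon.

```python
def parse_bad_line(line):
    """Split 'FILE:LINENO:TEXT' into (file, lineno, text).

    Assumes the file name contains no colon. The original input line may
    contain colons (for example in timestamps), so split at most twice.
    """
    filename, lineno, text = line.rstrip("\n").split(":", 2)
    return filename, int(lineno), text

# Made-up example of a line copied to the --bad-input-lines stream
sample = "cut.txt:17:192.0.2.1|not-a-port|2014/12/18T10:00:00\n"
print(parse_bad_line(sample))
```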
Field Constraints
Due to the way SiLK Flow records are stored, certain field combinations cannot be supported, certain fields
must appear together, and some fields may only be used on certain occasions:
• Only two of the three time-related values (start time, duration, end time) may be specified. When all
three are specified, the end time is ignored. This affects the sTime,9, duration,10, and eTime,11
fields and the --stime, --duration, and --etime switches.
• Both ICMP type and ICMP code must be present when one is present. These may be set by a
combination of the iType and iCode fields and the --icmp-type and --icmp-code switches. These
values are ignored unless either the protocol is ICMP (1) or the record contains IPv6 addresses and the
protocol is ICMPv6 (58). The ICMP type and code are encoded in the destination port field (dPort,4
or --dport), and they overwrite the port value for ICMP and ICMPv6 flow records.
• Both initial TCP flags and session TCP flags must be present when one is present. These may be set
by a combination of the initialFlags,26 and sessionFlags,27 fields and the --flags-initial and
--flags-session switches. These fields are set to 0 for non-TCP flow records. When either field has a
non-zero value, any value in the (ALL) TCP flags field (flags,8 or --flags-all) is overwritten for TCP
flow records.
• If the silk.conf(5) file defines more than one class, both class and type must be present for the values
to have any effect on the SiLK flow record. These may be set by a combination of the class and
type fields and the --class and --type switches. If silk.conf defines a single class, that class is used
by default. The class and type must map to a valid pair; use rwsiteinfo --fields=class,type to see
the list of valid class/type pairs for your site.
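The rule for the three time-related values can be sketched as follows. This is an illustration of the constraint, not rwtuc's actual code: the function name resolve_times is hypothetical, times are plain numbers of seconds, and the sketch only covers the cases where at least two values are supplied.

```python
def resolve_times(stime=None, duration=None, etime=None):
    """Return (stime, duration) from whichever values are given.

    When all three values are present, etime is ignored, as stated in
    the Field Constraints list.
    """
    if stime is not None and duration is not None:
        return stime, duration              # etime, if any, is ignored
    if stime is not None and etime is not None:
        return stime, etime - stime         # derive the duration
    if duration is not None and etime is not None:
        return etime - duration, duration   # derive the start time
    # Assumption: this sketch does not model records with fewer values.
    raise ValueError("this sketch needs at least two time values")

print(resolve_times(stime=100, duration=5, etime=999))
print(resolve_times(stime=100, etime=130))
print(resolve_times(duration=10, etime=130))
```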
OPTIONS
Option names may be abbreviated if the abbreviation is unique or is an exact match for an option. A
parameter to an option may be specified as --arg=param or --arg param, though the first form is required
for options that take optional parameters.
--fields=FIELDS
FIELDS contains the list of fields (columns) to parse. FIELDS is a comma separated list of field-names, field-integers, and ranges of field-integers; a range is specified by separating the start and end
of the range with a hyphen (-). Field-names are case insensitive.
A field is ignored when the fixed value switch that corresponds to that field is given on the command
line (see Fixed Values).
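To make the FIELDS syntax concrete, the Python sketch below expands a list of names, integers, and integer ranges into individual entries. The function name expand_fields is hypothetical, and the sketch does not translate names to integers or check for repeated fields.

```python
def expand_fields(spec):
    """Expand a FIELDS string such as 'sIP,3-5,ignore' into a list.

    Integers and integer ranges become ints; names are lower-cased
    because field names are case insensitive.
    """
    out = []
    for part in spec.split(","):
        part = part.strip()
        if "-" in part and part.replace("-", "").isdigit():
            start, end = (int(x) for x in part.split("-"))
            out.extend(range(start, end + 1))   # ranges are inclusive
        elif part.isdigit():
            out.append(int(part))
        else:
            out.append(part.lower())
    return out

print(expand_fields("1-15,20-29"))
print(expand_fields("sIP,3-5,ignore"))
```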
The field names and their descriptions are:
ignore
a field that rwtuc is to skip
sIP,1
source IP address in the canonical form: dotted-quad for IPv4 or hex-encoded for IPv6 (when
SiLK has been compiled with IPv6 support). Integers from 0 to 4294967295 will be treated as an
IPv4 address.
dIP,2
destination IP address in the same format as sIP,1
sPort,3
source port as an integer from 0 to 65535 inclusive
dPort,4
destination port as an integer from 0 to 65535 inclusive (cf. Field Constraints)
protocol,5
IP protocol as an integer from 0 to 255 inclusive
packets,pkts,6
packet count as an integer from 1 to 4294967295 inclusive
bytes,7
byte count as an integer from 1 to 4294967295 inclusive
flags,8
bitwise OR of TCP flags over all packets; the string may contain F, S, R, P, A, U, E, C in upper- or
lowercase (cf. Field Constraints)
sTime,9
starting time of the flow, in the form YYYY/MM/DD[:hh[:mm[:ss[.sss]]]]. A T may be used in
place of : to separate the day and hour fields. A floating point value between 536870912 and
4294967295 is also allowed and will be treated as seconds since the UNIX epoch.
duration,10
duration of flow as a floating point value from 0.0 to 4294967.295
eTime,11
end time of flow in the same form as sTime,9 (cf. Field Constraints)
sensor,12
router sensor name or ID as given in silk.conf
class
class of router at collection point as given in silk.conf (cf. Field Constraints)
type
type of router at collection point as given in silk.conf (cf. Field Constraints)
Many of our packed files do not store the following fields and their values will always be 0, but they
are listed here for completeness:
in,13
router SNMP input interface or vlanId if packing tools were configured to capture it (see sensor.conf(5)); an integer from 0 to 65535
out,14
router SNMP output interface or postVlanId; an integer from 0 to 65535
nhIP,15
router next hop IP address in the same format as sIP,1
SiLK can store flows generated by enhanced collection software that provides more information than
NetFlow v5. These flows may support some or all of these additional fields; for flows without this
additional information, the field’s value is always 0.
initialFlags,26
TCP flags on first packet in the flow; same form as the flags,8 field (cf. Field Constraints)
sessionFlags,27
bitwise OR of TCP flags over all packets except the first in the flow; same form as the flags,8
field (cf. Field Constraints)
attribute,28
flow attributes set by the flow generator:
S
all the packets in this flow record are exactly the same size
F
flow generator saw additional packets in this flow following a packet with a FIN flag (excluding
ACK packets)
T
flow generator prematurely created a record for a long-running connection due to a timeout.
(When the flow generator yaf(1) is run with the --silk switch, it will prematurely create a
flow and mark it with T if the byte count of the flow cannot be stored in a 32-bit value.)
C
flow generator created this flow as a continuation of a long-running connection, where the
previous flow for this connection met a timeout (or a byte threshold in the case of yaf).
Consider a long-running ssh session that exceeds the flow generator’s active timeout. (This is the
active timeout since the flow generator creates a flow for a connection that still has activity). The
flow generator will create multiple flow records for this ssh session, each spanning some portion of
the total session. The first flow record will be marked with a T indicating that it hit the timeout.
The second through next-to-last records will be marked with TC indicating that this flow both
timed out and is a continuation of a flow that timed out. The final flow will be marked with a C,
indicating that it was created as a continuation of an active flow.
application,29
guess as to the content of the flow, as an integer from 0 to 65535. Some software that generates
flow records from packet data, such as yaf, will inspect the contents of the packets that make up a
flow and use traffic signatures to label the content of the flow. SiLK calls this label the application;
yaf refers to it as the appLabel. The application is the port number that is traditionally used for
that type of traffic (see the /etc/services file on most UNIX systems). For example, traffic that
the flow generator recognizes as FTP will have a value of 21, even if that traffic is being routed
through the standard HTTP/web port (80).
iType
ICMP type as an integer from 0 to 255 inclusive (cf. Field Constraints)
iCode
ICMP code as an integer from 0 to 255 inclusive (cf. Field Constraints)
Fields may not be specified more than once.
--column-separator=CHAR
Expect the character CHAR to be used as the delimiter between columns instead of the default ’|’.
--output-path=FILEPATH
Write the SiLK Flow records to FILEPATH. The strings stdout and stderr may be used for the
standard output and standard error, respectively, as long as they are not connected to a terminal.
--bad-input-lines=FILEPATH
Copy any lines which could not be parsed to FILEPATH. The strings stdout and stderr may be
used for the standard output and standard error, respectively. Each bad line will be prepended by the
source input file, a colon, the line number, and a colon. On exit, rwtuc will remove FILEPATH if all
input lines were successfully parsed.
--verbose
If an input line fails to parse, print a message describing the error to the standard error, and continue
to process the input.
--stop-on-error
If an input line fails to parse, print a message describing the error to the standard error and exit. The
output file will contain any records successfully created prior to reading the bad input line.
--no-titles
Treat the first line of input as record values to be parsed. When this switch is not provided, rwtuc
skips the first line of input if it determines that the line contains the names of fields (titles). rwtuc
exits with an error when --no-titles is given but --fields is not.
--note-add=TEXT
Add the specified TEXT to the header of the output file as an annotation. This switch may be repeated
to add multiple annotations to a file. To view the annotations, use the rwfileinfo(1) tool.
--note-file-add=FILENAME
Open FILENAME and add the contents of that file to the header of the output file as an annotation.
This switch may be repeated to add multiple annotations. Currently the application makes no effort
to ensure that FILENAME contains text; be careful that you do not attempt to add a SiLK data file
as an annotation.
--compression-method=COMP_METHOD
Specify how to compress the output. When this switch is not given, output to the standard output or to
named pipes is not compressed, and output to files is compressed using the default chosen when SiLK
was compiled. The valid values for COMP_METHOD are determined by which external libraries were
found when SiLK was compiled. To see the available compression methods and the default method,
use the --help or --version switch. SiLK can support the following COMP_METHOD values when
the required libraries are available.
none
Do not compress the output using an external library.
zlib
Use the zlib(3) library for compressing the output, and always compress the output regardless
of the destination. Using zlib produces the smallest output files at the cost of speed.
lzo1x
Use the lzo1x algorithm from the LZO real time compression library for compression, and always
compress the output regardless of the destination. This compression provides good compression
with less memory and CPU overhead.
best
Use lzo1x if available, otherwise use zlib. Only compress the output when writing to a file.
--site-config-file=FILENAME
Read the SiLK site configuration from the named file FILENAME. When this switch is not provided,
rwtuc searches for the site configuration file in the locations specified in the FILES section.
--help
Print the available options and exit.
--version
Print the version number and information about how SiLK was configured, then exit the application.
Fixed Values
The following switches can be used to set fields to fixed values. A value specified using one of these switches
overrides the field when it appears in the input, causing that column of input to be completely ignored.
--saddress=IPADDR
Set the source address field to IPADDR for all records. IPADDR can be in canonical notation or an
unsigned integer.
--daddress=IPADDR
Set the destination address field to IPADDR for all records. IPADDR can be in canonical notation or
an unsigned integer.
--sport=NUM
Set the source port field to NUM for all records; a value between 0 and 65535.
--dport=NUM
Set the destination port field to NUM for all records; a value between 0 and 65535. (cf. Field
Constraints)
--protocol=NUM
Set the protocol field to NUM for all records; a value between 0 and 255.
--packets=NUM
Set the packets field to NUM for all records; the value must be non-zero.
--bytes=NUM
Set the bytes field to NUM for all records; the value must be non-zero.
--flags-all=TCPFLAGS
Set the TCP flags field to TCPFLAGS for all records. (cf. Field Constraints)
--stime=TIME
Set the start time field to TIME for all records.
--duration=NUM
Set the duration field to NUM for all records.
--etime=TIME
Set the end time field to TIME for all records. (cf. Field Constraints)
--sensor=SID
Set the sensor field to SID for all records. This can either be a sensor name or sensor ID.
--input-index=NUM
Set the SNMP input index field to NUM for all records; a value between 0 and 65535.
--output-index=NUM
Set the SNMP output index field to NUM for all records; a value between 0 and 65535.
--next-hop-ip=IPADDR
Set the next-hop-ip field to IPADDR for all records. IPADDR can be in canonical notation or an
unsigned integer.
--flags-initial=TCPFLAGS
Set the initial TCP flags field to TCPFLAGS for all records. (cf. Field Constraints)
--flags-session=TCPFLAGS
Set the session TCP flags field to TCPFLAGS for all records. (cf. Field Constraints)
--attributes=ATTR
Set the attributes field to ATTR for all records.
--application=NUM
Set the application field to NUM for all records; a value between 0 and 65535.
--class=NAME
Set the class field to NAME for all records. (cf. Field Constraints)
--type=NAME
Set the type field to NAME for all records. (cf. Field Constraints)
--icmp-type=NUM
Set the ICMP type field to NUM for all ICMP or ICMPv6 flow records; a value between 0 and 255.
(cf. Field Constraints)
--icmp-code=NUM
Set the ICMP code field to NUM for all ICMP or ICMPv6 flow records; a value between 0 and 255.
(cf. Field Constraints)
--stime+msec=TIME
An alias for --stime. This switch is deprecated as of SiLK 3.6.0, and it will be removed in the SiLK
4.0 release.
--etime+msec=TIME
An alias for --etime. This switch is deprecated as of SiLK 3.6.0, and it will be removed in the SiLK
4.0 release.
--duration+msec=NUM
An alias for --duration. This switch is deprecated as of SiLK 3.6.0, and it will be removed in the SiLK 4.0
release.
EXAMPLES
In the following examples, the dollar sign ($) represents the shell prompt. The text after the dollar sign
represents the command line. Lines have been wrapped for improved readability, and the backslash (\) is
used to indicate a wrapped line.
Using rwtuc to parse the output of rwcut(1) should produce the same output:
$ rwcut data.rw > cut.txt
$ md5 < cut.txt
7e3d693cd2cba2510803935274e1debd
$ rwtuc < cut.txt | rwcut | md5
7e3d693cd2cba2510803935274e1debd
To swap the source IP and port with the destination IP and port in flows.rw and save the result in reverse.rw :
$ rwcut --fields=dip,dport,sip,sport,5-15,20-29 flows.rw \
| rwtuc --fields=1-15,20-29 --output-path=reverse.rw
rwtuc can be used to obfuscate the flow data in myflows.rw to produce obflows.rw. Pipe the output from
rwcut into a script that manipulates the IP addresses, then pipe that into rwtuc. Using the sed(1) script
in priv.sed, the invocation is:
$ rwcut --fields=1-10,13-15,26-29 myflows.rw \
| sed -f priv.sed \
| rwtuc --sensor=1 > obflows.rw
If the first line of input appears to contain titles, rwtuc will ignore it. In the first invocation below, rwtuc
treats SP as an abbreviation for sPort and ignores the line. Use the --no-titles switch to force rwtuc to
parse the line:
$ echo 'SP' | rwtuc --fields=flags | rwcut --fields=flags
   flags|
$
$ echo 'SP' | rwtuc --fields=flags --no-titles | rwcut --fields=flags
   flags|
 S P    |
$
By default, rwtuc silently ignores lines that it cannot parse. Use the --verbose flag to see error messages:
$ echo sport | rwtuc --fields=flags --no-titles --verbose >/dev/null
rwtuc: stdin:1: Invalid flags 'sport': Unexpected character 'o'
ENVIRONMENT
SILK_CLOBBER
The SiLK tools normally refuse to overwrite existing files. Setting SILK_CLOBBER to a non-empty
value removes this restriction.
SILK_CONFIG_FILE
This environment variable is used as the value for the --site-config-file when that switch is not
provided.
SILK_DATA_ROOTDIR
This environment variable specifies the root directory of the data repository. As described in the FILES
section, rwtuc may use this environment variable when searching for the SiLK site configuration file.
SILK_PATH
This environment variable gives the root of the install tree. When searching for configuration files,
rwtuc may use this environment variable. See the FILES section for details.
TZ
When a SiLK installation is built to use the local timezone (to determine if this is the case, check
the Timezone support value in the output from rwtuc --version), the value of the TZ environment
variable determines the timezone in which rwtuc parses timestamps. If the TZ environment variable
is not set, the default timezone is used. Setting TZ to 0 or the empty string causes timestamps to be
parsed as UTC. The value of the TZ environment variable is ignored when the SiLK installation uses
utc. For system information on the TZ variable, see tzset(3).
FILES
${SILK_CONFIG_FILE}
${SILK_DATA_ROOTDIR}/silk.conf
/data/silk.conf
${SILK_PATH}/share/silk/silk.conf
${SILK_PATH}/share/silk.conf
/usr/local/share/silk/silk.conf
/usr/local/share/silk.conf
Possible locations for the SiLK site configuration file which are checked when the --site-config-file
switch is not provided.
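The search for the site configuration file can be pictured with the Python sketch below. It is an illustration only: the function names are hypothetical, and it assumes the locations are tried in the order listed above and that a location depending on an unset environment variable is skipped.

```python
import os
import posixpath

def candidate_config_paths(environ):
    """List the silk.conf locations named in FILES, in the order shown."""
    paths = []
    if environ.get("SILK_CONFIG_FILE"):
        paths.append(environ["SILK_CONFIG_FILE"])
    if environ.get("SILK_DATA_ROOTDIR"):
        paths.append(posixpath.join(environ["SILK_DATA_ROOTDIR"], "silk.conf"))
    paths.append("/data/silk.conf")
    if environ.get("SILK_PATH"):
        root = environ["SILK_PATH"]
        paths.append(posixpath.join(root, "share/silk/silk.conf"))
        paths.append(posixpath.join(root, "share/silk.conf"))
    paths.append("/usr/local/share/silk/silk.conf")
    paths.append("/usr/local/share/silk.conf")
    return paths

def find_config(environ=os.environ, exists=os.path.isfile):
    """Return the first candidate that exists, or None (assumed order)."""
    for path in candidate_config_paths(environ):
        if exists(path):
            return path
    return None

print(candidate_config_paths({"SILK_PATH": "/opt/silk"}))
```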
SEE ALSO
rwcut(1), rwfileinfo(1), rwsiteinfo(1), sensor.conf(5), silk(7), yaf(1), sed(1), zlib(3)
rwuniq
Bin SiLK Flow records by a key and print each bin’s volume
SYNOPSIS
rwuniq --fields=KEY [--values=VALUES]
[--all-counts] [{--bytes | --bytes=MIN | --bytes=MIN-MAX}]
[{--packets | --packets=MIN | --packets=MIN-MAX}]
[{--flows | --flows=MIN | --flows=MIN-MAX}]
[--stime] [--etime]
[{--sip-distinct | --sip-distinct=MIN | --sip-distinct=MIN-MAX}]
[{--dip-distinct | --dip-distinct=MIN | --dip-distinct=MIN-MAX}]
[--presorted-input] [--sort-output]
[{--bin-time | --bin-time=SECONDS}]
[--timestamp-format=FORMAT] [--epoch-time]
[--ip-format=FORMAT] [--integer-ips] [--zero-pad-ips]
[--integer-sensors] [--integer-tcp-flags]
[--no-titles] [--no-columns] [--column-separator=CHAR]
[--no-final-delimiter] [{--delimited | --delimited=CHAR}]
[--print-filenames] [--copy-input=PATH] [--output-path=PATH]
[--pager=PAGER_PROG] [--temp-directory=DIR_PATH]
[{--legacy-timestamps | --legacy-timestamps={1,0}}]
[--ipv6-policy={ignore,asv4,mix,force,only}]
[--site-config-file=FILENAME]
[--plugin=PLUGIN [--plugin=PLUGIN ...]]
[--python-file=PATH [--python-file=PATH ...]]
[--pmap-file=MAPNAME:PATH [--pmap-file=MAPNAME:PATH ...]]
[--pmap-column-width=NUM]
{[--xargs] | [--xargs=FILENAME] | [FILE [FILE ...]]}
rwuniq [--pmap-file=MAPNAME:PATH [--pmap-file=MAPNAME:PATH ...]]
[--plugin=PLUGIN ...] [--python-file=PATH ...] --help
rwuniq [--pmap-file=MAPNAME:PATH [--pmap-file=MAPNAME:PATH ...]]
[--plugin=PLUGIN ...] [--python-file=PATH ...] --help-fields
rwuniq --version
DESCRIPTION
rwuniq reads SiLK Flow records and groups them by a key composed of user-specified attributes of the
flows. For each group (or bin), a collection of user-specified aggregate values is computed; these values are
typically related to the volume of the bin, such as the sum of the bytes fields for all records that match the
key. Once all the SiLK Flow records are read, the key fields and the aggregate values are printed. For some
of the built-in aggregate values, it is possible to limit the output to the bins where the aggregate value meets
a user-specified minimum and/or maximum.
There is no need to sort the input to rwuniq since rwuniq normally rearranges the records as they are
read. To have rwuniq sort its output, use the --sort-output switch.
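The binning described above can be sketched in Python. The records are made-up tuples rather than SiLK Flow records, the function name bin_records is hypothetical, and treating the MIN-MAX range as inclusive is an assumption of the sketch.

```python
from collections import defaultdict

# Made-up records: (sIP, dPort, packets, bytes)
records = [
    ("192.0.2.1", 80, 10, 1500),
    ("192.0.2.2", 443, 4, 300),
    ("192.0.2.1", 80, 6, 900),
]

def bin_records(records, key_func):
    """Group records by key and accumulate [records, packets, bytes]."""
    bins = defaultdict(lambda: [0, 0, 0])
    for rec in records:
        agg = bins[key_func(rec)]
        agg[0] += 1          # number of flow records in the bin
        agg[1] += rec[2]     # packet sum
        agg[2] += rec[3]     # byte sum
    return dict(bins)

# Key is (sIP, dPort), similar in spirit to --fields=sip,dport
bins = bin_records(records, key_func=lambda r: (r[0], r[1]))

# Keep only bins whose byte sum lies in the range 1000-5000
selected = {k: v for k, v in bins.items() if 1000 <= v[2] <= 5000}
for key in sorted(selected):
    print(key, selected[key])
```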
rwuniq reads SiLK Flow records from the files named on the command line or from the standard input
when no file names are specified and --xargs is not present. To read the standard input in addition to the
named files, use - or stdin as a file name. If an input file name ends in .gz, the file will be uncompressed as
it is read. When the --xargs switch is provided, rwuniq will read the names of the files to process from the
named text file, or from the standard input if no file name argument is provided to the switch. The input
to --xargs must contain one file name per line.
The user must provide the --fields switch to select the flow attribute(s) (or field(s)) that comprise the key
for each bin. The available fields are similar to those supported by rwcut(1); see the description of the
--fields switch in the OPTIONS section below for the details. The list of fields can be extended by loading
PySiLK files (see silkpython(3)) or plug-ins (silk-plugin(3)). The fields will be printed in the order in
which they occur in the --fields switch. The size of the key is limited to 256 octets. A larger key will more
quickly use the available memory, leading to slower performance.
The aggregate value(s) to compute for each bin are also chosen by the user. As with the key fields, the
user can extend the list of aggregate fields by using PySiLK or plug-ins. The preferred way to specify the
aggregate fields is to use the --values switch; the aggregate fields will be printed in the order they occur
in the --values switch. The thresholding switches (e.g., --bytes) can also be used to specify the aggregate
values to compute. Aggregate values that are only specified with thresholding switches will be printed after
those that appear in --values, in the following order for backward compatibility: bytes, packets, flows,
stime, etime, sip-distinct, dip-distinct. If the user does not select any aggregate value(s), rwuniq defaults to
computing the number of flow records for each bin and printing all bins. As with the key fields, requesting
more aggregate values slows performance.
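The column ordering described above can be sketched as follows. The pairing of thresholding switches with value names (for example, flows with Records and stime with sTime-Earliest) is an assumption of this sketch, as is the function name output_values.

```python
# (thresholding switch, value name) pairs in the backward-compatible
# order given in the text. The pairing itself is an assumption.
SWITCH_TO_VALUE = [
    ("bytes", "bytes"),
    ("packets", "packets"),
    ("flows", "records"),
    ("stime", "stime-earliest"),
    ("etime", "etime-latest"),
    ("sip-distinct", "sip-distinct"),
    ("dip-distinct", "dip-distinct"),
]

def output_values(values, threshold_switches):
    """Return aggregate columns: --values first, then threshold-only ones."""
    cols = [v.lower() for v in values]      # names are case insensitive
    for switch, value in SWITCH_TO_VALUE:
        if switch in threshold_switches and value not in cols:
            cols.append(value)
    if not cols:
        cols = ["records"]                  # default: count flow records
    return cols

print(output_values(["Packets"], {"flows", "bytes"}))
print(output_values([], set()))
```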
The --presorted-input switch may allow rwuniq to process data more efficiently by causing rwuniq to
assume the input has been previously sorted with the rwsort(1) command. With this switch, rwuniq does
not need large amounts of memory because it does not bin each flow; instead, it keeps a running summation
and outputs the bin whenever the key changes. For the output to be meaningful, rwsort and rwuniq must
be invoked with the same --fields value. When multiple input files are specified and --presorted-input is
given, rwuniq will merge-sort the flow records from the input files. rwuniq will usually run faster if you do
not include the --presorted-input switch when counting distinct IP addresses, even when reading sorted
input. Finally, you may get unusual results with --presorted-input when the --fields switch contains
multiple time-related key fields (sTime, duration, eTime), or when the time-related key is not the final key
listed in --fields; see the NOTES section for details.
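The running-summation idea behind --presorted-input can be illustrated with Python's itertools.groupby, which also emits a group whenever the key changes. This is a sketch with made-up (sIP, bytes) tuples and a hypothetical function name, not rwuniq's implementation.

```python
from itertools import groupby

def uniq_presorted(sorted_records, key_func):
    """Yield (key, record_count, byte_sum) from records sorted by key.

    Only one bin is held in memory at a time; a bin is emitted as soon
    as the key changes.
    """
    for key, group in groupby(sorted_records, key=key_func):
        count = 0
        total_bytes = 0
        for rec in group:
            count += 1
            total_bytes += rec[1]
        yield key, count, total_bytes

# Made-up (sIP, bytes) records, already sorted by sIP
records = [("192.0.2.1", 100), ("192.0.2.1", 250), ("192.0.2.9", 40)]
result = list(uniq_presorted(records, key_func=lambda r: r[0]))
print(result)

# Unsorted input yields the same key more than once, which is why the
# sort key and the bin key must agree.
unsorted = [("a", 1), ("b", 1), ("a", 1)]
print(list(uniq_presorted(unsorted, key_func=lambda r: r[0])))
```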
rwuniq attempts to keep all key and aggregate value data in the computer’s memory. If rwuniq runs out
of memory, the current key and aggregate value data is written to a temporary file. Once all input has
been processed, the data from the temporary files is merged to produce the final output. By default, these
temporary files are stored in the /tmp directory. Because these files can be large, it is strongly recommended
that /tmp not be used as the temporary directory. To modify the temporary directory used by rwuniq,
provide the --temp-directory switch, set the SILK_TMPDIR environment variable, or set the TMPDIR
environment variable.
rwuniq may run out of memory when computing distinct IP counts, causing the counts for some bins to be
smaller than the actual number of distinct IPs. When this occurs, a single warning is printed to the standard
error noting that rwuniq has run out of memory, processing continues, and rwuniq exits with status 16.
OPTIONS
Option names may be abbreviated if the abbreviation is unique or is an exact match for an option. A
parameter to an option may be specified as --arg=param or --arg param, though the first form is required
for options that take optional parameters.
The --fields switch is required. rwuniq will fail when it is not provided.
--fields=KEY
KEY contains the list of flow attributes (a.k.a. fields or columns) that make up the key into which
flows are binned. The columns will be displayed in the order the fields are specified. Each field may
be specified once only. KEY is a comma separated list of field-names, field-integers, and ranges of
field-integers; a range is specified by separating the start and end of the range with a hyphen (-).
Field-names are case insensitive. Example:
--fields=stime,10,1-5
There is no default value for the --fields switch; the switch must be specified.
The complete list of built-in fields that the SiLK tool suite supports follows, though note that not all
fields are present in all SiLK file formats; when a field is not present, its value is 0.
sIP,1
source IP address
dIP,2
destination IP address
sPort,3
source port for TCP and UDP, or equivalent
dPort,4
destination port for TCP and UDP, or equivalent. See note at iType.
protocol,5
IP protocol
packets,pkts,6
packet count
bytes,7
byte count
flags,8
bit-wise OR of TCP flags over all packets
sTime,9
starting time of flow (seconds resolution). When the time-related fields sTime,duration,eTime
are all in use, rwuniq will ignore the final time field when binning the records.
duration,10
duration of flow (seconds resolution). See note at sTime,9.
eTime,11
end time of flow (seconds resolution). See note at sTime,9.
sensor,12
name or ID of the sensor where the flow was collected
class,20
class assigned to the flow by rwflowpack(8). Binning by class and/or type equates to binning
by the integer value used internally to represent the class/type pair. When --fields contains
class but not type, rwuniq’s output will have multiple rows with the same value(s) for the key
field(s).
type,21
type assigned to the flow by rwflowpack(8). See note on previous entry.
iType
the ICMP type value for ICMP or ICMPv6 flows and empty (numerically zero) for non-ICMP
flows. Internally, SiLK stores the ICMP type and code in the dPort field. To avoid getting very
odd results, either do not use the dPort field when your key includes ICMP field(s) or be certain
to include the protocol field as part of your key. This field was introduced in SiLK 3.8.1.
iCode
the ICMP code value for ICMP or ICMPv6 flows and empty for non-ICMP flows. See note at
iType.
icmpTypeCode,25
equivalent to iType,iCode when used in --fields. This field may not be mixed with iType or
iCode, and this field is deprecated as of SiLK 3.8.1. As of SiLK 3.8.1, icmpTypeCode may no
longer be used as the argument to the Distinct: value field; the dPort field will provide an
equivalent result as long as the input is limited to ICMP flow records.
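The note at iType says that the ICMP type and code share the dPort field. One way to picture this is the Python sketch below. The layout, with the type in the high byte and the code in the low byte of the 16-bit value, is an assumption of this sketch, and the function names are hypothetical.

```python
def icmp_to_dport(icmp_type, icmp_code):
    """Pack ICMP type and code into one 16-bit value (assumed layout)."""
    if not (0 <= icmp_type <= 255 and 0 <= icmp_code <= 255):
        raise ValueError("type and code must each be in the range 0-255")
    return (icmp_type << 8) | icmp_code

def dport_to_icmp(dport):
    """Unpack a 16-bit value into (type, code) using the same layout."""
    return (dport >> 8) & 0xFF, dport & 0xFF

packed = icmp_to_dport(3, 1)        # type 3, code 1
print(packed, dport_to_icmp(packed))
```

Under this layout, a key containing dPort but not protocol would place an ICMP record and a TCP or UDP record with the same 16-bit value in the same bin, which is the sort of odd result the note warns about.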
Many SiLK file formats do not store the following fields and their values will always be 0; they are
listed here for completeness:
in,13
router SNMP input interface or vlanId if packing tools were configured to capture it (see sensor.conf(5))
out,14
router SNMP output interface or postVlanId
nhIP,15
router next hop IP
SiLK can store flows generated by enhanced collection software that provides more information than
NetFlow v5. These flows may support some or all of these additional fields; for flows without this
additional information, the field’s value is always 0.
initialFlags,26
TCP flags on first packet in the flow
sessionFlags,27
bit-wise OR of TCP flags over all packets except the first in the flow
attributes,28
flow attributes set by the flow generator:
S
all the packets in this flow record are exactly the same size
F
flow generator saw additional packets in this flow following a packet with a FIN flag (excluding
ACK packets)
T
flow generator prematurely created a record for a long-running connection due to a timeout.
(When the flow generator yaf(1) is run with the --silk switch, it will prematurely create a
flow and mark it with T if the byte count of the flow cannot be stored in a 32-bit value.)
C
flow generator created this flow as a continuation of a long-running connection, where the
previous flow for this connection met a timeout (or a byte threshold in the case of yaf).
Consider a long-running ssh session that exceeds the flow generator’s active timeout. (This is the
active timeout since the flow generator creates a flow for a connection that still has activity). The
flow generator will create multiple flow records for this ssh session, each spanning some portion of
the total session. The first flow record will be marked with a T indicating that it hit the timeout.
The second through next-to-last records will be marked with TC indicating that this flow both
timed out and is a continuation of a flow that timed out. The final flow will be marked with a C,
indicating that it was created as a continuation of an active flow.
application,29
guess as to the content of the flow. Some software that generates flow records from packet data,
such as yaf, will inspect the contents of the packets that make up a flow and use traffic signatures
to label the content of the flow. SiLK calls this label the application; yaf refers to it as the
appLabel. The application is the port number that is traditionally used for that type of traffic
(see the /etc/services file on most UNIX systems). For example, traffic that the flow generator
recognizes as FTP will have a value of 21, even if that traffic is being routed through the standard
HTTP/web port (80).
The following fields provide a way to label the IPs or ports on a record. These fields require external
files to provide the mapping from the IP or port to the label:
sType,16
for the source IP address, the value 0 if the address is non-routable, 1 if it is internal, or 2
if it is routable and external. Uses the mapping file specified by the SILK_ADDRESS_TYPES
environment variable, or the address_types.pmap mapping file, as described in addrtype(3).
dType,17
as sType for the destination IP address
scc,18
for the source IP address, a two-letter country code abbreviation denoting the country where
that IP address is located. Uses the mapping file specified by the SILK_COUNTRY_CODES
environment variable, or the country_codes.pmap mapping file, as described in ccfilter(3). The
abbreviations are those used by the Root-Zone Whois Index (see for example
http://www.iana.org/cctld/cctld-whois.htm) or the following special codes: -- N/A (e.g. private and experimental
reserved addresses); a1 anonymous proxy; a2 satellite provider; o1 other
dcc,19
as scc for the destination IP
src-MAPNAME
label determined by passing the source IP or the protocol/source-port to the user-defined mapping
defined in the prefix map associated with MAPNAME. See the description of the --pmap-file
switch below and the pmapfilter(3) manual page.
dst-MAPNAME
as src-MAPNAME for the destination IP or protocol/destination-port.
sval
dval
These are deprecated field names created by pmapfilter that correspond to src-MAPNAME
and dst-MAPNAME, respectively. These fields are available when a prefix map is used that is
not associated with a MAPNAME.
Finally, the list of built-in fields may be augmented by the run-time loading of PySiLK code or plug-ins
written in C (also called shared object files or dynamic libraries), as described by the --python-file
and --plugin switches.
--values=VALUES
Specify the aggregate values to compute for each bin as a comma separated list of names. Names
are case insensitive. When a thresholding switch specifies an aggregate value field that does not appear in
VALUES, that field is added to the end of VALUES. When neither the --values switch nor any thresholding
switch is specified, rwuniq counts the number of flow records for each bin. The aggregate fields are
printed in the order they occur in VALUES. The names of the built-in value fields follow. This list can
be augmented through the use of PySiLK and plug-ins.
Records
Count the number of flow records that mapped to each bin.
Packets
Sum the number of packets across all records that mapped to each bin.
Bytes
Sum the number of bytes across all records that mapped to each bin.
sTime-Earliest
eTime-Latest
sIP-Distinct
Count the number of distinct source IP addresses that were seen for each bin.
dIP-Distinct
Count the number of distinct destination IP addresses that were seen for each bin.
Distinct:KEY_FIELD
Count the number of distinct values for KEY_FIELD, where KEY_FIELD is any field that can
be used as an argument to --fields except icmpTypeCode. For example, Distinct:sPort will
count the number of distinct source ports for each bin. When this aggregate value field is used,
the specified KEY_FIELD cannot be present in the argument to --fields.
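The relationship between the key fields and the aggregate values can be illustrated with a small Python sketch. It bins a handful of made-up flow records by protocol and computes Records, Bytes, and Distinct:sPort for each bin. The record layout and the name bin_flows are assumptions made for this illustration; this is not how rwuniq is implemented.

```python
from collections import defaultdict


def bin_flows(flows, key_fields):
    """Group flow dicts by key_fields and compute a few aggregate values."""
    bins = defaultdict(lambda: {"records": 0, "bytes": 0, "sports": set()})
    for flow in flows:
        key = tuple(flow[name] for name in key_fields)
        agg = bins[key]
        agg["records"] += 1                # Records
        agg["bytes"] += flow["bytes"]      # Bytes
        agg["sports"].add(flow["sport"])   # Distinct:sPort
    return {
        key: {
            "Records": agg["records"],
            "Bytes": agg["bytes"],
            "Distinct:sPort": len(agg["sports"]),
        }
        for key, agg in bins.items()
    }


# Hypothetical records, not taken from any real data set.
flows = [
    {"proto": 6, "sport": 80, "bytes": 1500},
    {"proto": 6, "sport": 443, "bytes": 400},
    {"proto": 6, "sport": 80, "bytes": 100},
    {"proto": 17, "sport": 53, "bytes": 90},
]
result = bin_flows(flows, ["proto"])
```

Here the key is proto, so sport is free to be used as the distinct field, which mirrors the restriction stated above.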
--plugin=PLUGIN
Augment the list of key fields and/or aggregate value fields by using run-time loading of the plug-in
(shared object) whose path is PLUGIN. The switch may be repeated to load multiple plug-ins. The
creation of plug-ins is described in the silk-plugin(3) manual page. When PLUGIN does not contain
a slash (/), rwuniq will attempt to find a file named PLUGIN in the directories listed in the FILES
section. If rwuniq finds the file, it uses that path. If PLUGIN contains a slash or if rwuniq does
not find the file, rwuniq relies on your operating system’s dlopen(3) call to find the file. When
the SILK_PLUGIN_DEBUG environment variable is non-empty, rwuniq prints status messages to the
standard error as it attempts to find and open each of its plug-ins.
The next eight options will add the appropriate aggregate field to --values if the field is not present. The
options are processed in the order they appear here, regardless of the order they occur on the command line.
Use of these switches without a threshold value is deprecated.
--all-counts
Enable the next five sets of options with their default thresholds; i.e., all possible counts (except the
distinct counts) are computed and printed. This switch is deprecated.
--bytes
--bytes=MIN
--bytes=MIN-MAX
Cause rwuniq to total, for each unique key, the number of bytes in each flow record. When MIN is
provided, bins are printed only when they had at least MIN total bytes. When MAX is also provided,
bins are printed only when they had no more than MAX total bytes. A MIN of 0 is treated as 1.
When MIN is not provided, a default of 1 is used.
--packets
--packets=MIN
--packets=MIN-MAX
Cause rwuniq to sum, for each unique key, the number of packets in each flow record. When MIN
is provided, bins are printed only when they had at least MIN sum of packets. When MAX is also
provided, bins are printed only when they had no more than MAX sum of packets. A MIN of 0 is
treated as 1. When MIN is not provided, a default of 1 is used.
--flows
--flows=MIN
--flows=MIN-MAX
Cause rwuniq to sum the number of flow records in each uniquely keyed bin. When MIN is provided,
bins are printed only when they had at least MIN number of flows. When MAX is also provided, bins
are printed only when they had no more than MAX flows. A MIN of 0 is treated as 1. When MIN is
not provided, a default of 1 is used.
--stime
Cause rwuniq to keep track of the earliest time at which it saw a flow that matched each bin’s unique
key. This option does not support thresholds, and it is deprecated.
--etime
Cause rwuniq to keep track of the latest (most recent) time at which it saw a flow that matched each
bin’s unique key. This option does not support thresholds, and it is deprecated.
--sip-distinct
--sip-distinct=MIN
--sip-distinct=MIN-MAX
Cause rwuniq to count the number of distinct source IP addresses that were seen for each uniquely
keyed bin. When MIN is provided, bins are printed only when they had at least MIN distinct sources.
When MAX is also provided, bins are printed only when they had no more than MAX distinct sources.
A MIN of 0 is treated as 1. When MIN is not provided, a default of 1 is used. When this switch is
provided, the sIP field cannot be part of the key.
--dip-distinct
--dip-distinct=MIN
--dip-distinct=MIN-MAX
As --sip-distinct for destination IP addresses.
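The argument forms accepted by the thresholding switches above (no argument, MIN, or MIN-MAX) can be summarized with a short Python sketch. The function name parse_threshold and the use of None for "no upper bound" are assumptions made for this illustration; the sketch only restates the rules given in the switch descriptions.

```python
def parse_threshold(arg=None):
    """Return (minimum, maximum) for a switch argument of the form MIN or MIN-MAX.

    A maximum of None means that no upper bound was given.
    """
    if arg is None:
        return 1, None                  # no MIN provided: default of 1
    low, sep, high = arg.partition("-")
    minimum = max(int(low), 1)          # a MIN of 0 is treated as 1
    maximum = int(high) if sep else None
    return minimum, maximum
```

For instance, parse_threshold("1000") corresponds to a switch such as --flows=1000, and parse_threshold("1-3") to a range such as 1-3.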
Miscellaneous options:
--presorted-input
Cause rwuniq to assume that it is reading sorted input; i.e., that rwuniq’s input file(s) were generated
by rwsort(1) using the exact same value for the --fields switch. This option allows rwuniq to process
an endless stream of records. When multiple input files are specified, rwuniq will merge-sort the flow
records from the input files. See the NOTES section for issues that may occur when using --presorted-input.
--sort-output
Cause rwuniq to present the output in sorted numerical order. The key rwuniq uses for sorting is
the same key it uses to index each bin.
--bin-time
--bin-time=SECONDS
Adjust the key fields ’sTime’ and ’eTime’ to appear on SECONDS-second boundaries (the floor of the
time is used). When no value is provided to the switch, 60-second time bins are used. (When the
start-time is the only key field and time binning is desired, consider using rwcount(1) instead.)
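The "floor of the time" rule used by --bin-time can be written out as a one-line calculation. The following Python sketch assumes times expressed as whole seconds since the UNIX epoch; the name bin_time is hypothetical.

```python
def bin_time(epoch_seconds, bin_size=60):
    """Round a time down to the start of its bin_size-second bin."""
    return epoch_seconds - (epoch_seconds % bin_size)
```

With the default 60-second bins, a time of 125 seconds falls into the bin that starts at 120 seconds.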
--timestamp-format=FORMAT
Specify how timestamps will be printed. When this switch is not specified, timestamps are printed in
the default format, and the timezone is UTC unless SiLK was compiled with local timezone support.
FORMAT is a comma-separated list of a format and/or a timezone. The format is one of:
default
Print the timestamps as YYYY/MM/DDThh:mm:ss.
iso
Print the timestamps as YYYY-MM-DD hh:mm:ss.
m/d/y
Print the timestamps as MM/DD/YYYY hh:mm:ss.
epoch
Print the timestamps as the number of seconds since 00:00:00 UTC on 1970-01-01.
When a timezone is specified, it is used regardless of the default timezone support compiled into SiLK.
The timezone is one of:
utc
Use Coordinated Universal Time to print timestamps.
local
Use the TZ environment variable or the local timezone.
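To make the four format names concrete, the following Python sketch renders one timestamp in each of them. It models only the utc timezone choice, and the names FORMATS and format_timestamp are assumptions made for this illustration rather than part of SiLK.

```python
from datetime import datetime, timezone

# strftime patterns that mirror the layouts listed above.
FORMATS = {
    "default": "%Y/%m/%dT%H:%M:%S",
    "iso": "%Y-%m-%d %H:%M:%S",
    "m/d/y": "%m/%d/%Y %H:%M:%S",
}


def format_timestamp(epoch_seconds, fmt="default"):
    """Render epoch_seconds (UTC) in one of the named formats."""
    if fmt == "epoch":
        return str(epoch_seconds)
    moment = datetime.fromtimestamp(epoch_seconds, tz=timezone.utc)
    return moment.strftime(FORMATS[fmt])
```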
--epoch-time
Print timestamps as epoch time (number of seconds since midnight GMT on 1970-01-01). This switch
is equivalent to --timestamp-format=epoch, it is deprecated as of SiLK 3.0.0, and it will be removed
in the SiLK 4.0 release.
--ip-format=FORMAT
Specify how IP addresses will be printed. When this switch is not specified, IPs are printed in the
canonical format. The FORMAT is one of:
canonical
Print IP addresses in their canonical form: dotted quad for IPv4 (127.0.0.1) and hexadectet for
IPv6 (2001:db8::1). Note that IPv6 addresses in ::ffff:0:0/96 and some IPv6 addresses in ::/96
will be printed as a mixture of IPv6 and IPv4.
zero-padded
Print IP addresses in their canonical form, but add zeros to the output so it fully fills the width
of the column. The addresses 127.0.0.1 and 2001:db8::1 are printed as 127.000.000.001 and
2001:0db8:0000:0000:0000:0000:0000:0001, respectively. When the --ipv6-policy is force,
the output for 127.0.0.1 becomes 0000:0000:0000:0000:0000:ffff:7f00:0001.
decimal
Print IP addresses as integers in decimal format. The addresses 127.0.0.1 and 2001:db8::1 are
printed as 2130706433 and 42540766411282592856903984951653826561, respectively.
hexadecimal
Print IP addresses as integers in hexadecimal format. The addresses 127.0.0.1 and 2001:db8::1
are printed as 7f000001 and 20010db8000000000000000000000001, respectively.
force-ipv6
Print all IP addresses in the canonical form for IPv6 without using any IPv4 notation. Any IPv4
address is mapped into the ::ffff:0:0/96 netblock. The addresses 127.0.0.1 and 2001:db8::1 are
printed as ::ffff:7f00:1 and 2001:db8::1, respectively.
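The sample values quoted for each format can be reproduced with Python's standard ipaddress module. This is only a sketch of the documented output; the function names format_ip, hextets, and map_ipv4_to_ipv6 are hypothetical, and the forced IPv6 form is shown fully expanded rather than in the compressed notation that force-ipv6 prints.

```python
import ipaddress


def hextets(value):
    """Write a 128-bit integer as eight zero-padded hexadecimal groups."""
    digits = format(value, "032x")
    return ":".join(digits[i:i + 4] for i in range(0, 32, 4))


def format_ip(text, fmt="canonical"):
    """Render an address in one of the documented --ip-format styles."""
    ip = ipaddress.ip_address(text)
    if fmt == "canonical":
        return str(ip)
    if fmt == "decimal":
        return str(int(ip))
    if fmt == "hexadecimal":
        return format(int(ip), "x")
    if fmt == "zero-padded":
        if ip.version == 4:
            return ".".join(f"{octet:03d}" for octet in ip.packed)
        return hextets(int(ip))
    raise ValueError(f"unsupported format: {fmt}")


def map_ipv4_to_ipv6(text):
    """Map an IPv4 address into the ::ffff:0:0/96 netblock (as an integer)."""
    return (0xFFFF << 32) | int(ipaddress.IPv4Address(text))
```

Calling hextets(map_ipv4_to_ipv6("127.0.0.1")) gives the same string that the zero-padded description shows for 127.0.0.1 when the --ipv6-policy is force.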
--integer-ips
Print IP addresses as integers. This switch is equivalent to --ip-format=decimal, it is deprecated as
of SiLK 3.7.0, and it will be removed in the SiLK 4.0 release.
--zero-pad-ips
Print IP addresses as fully-expanded, zero-padded values in their canonical form. This switch is
equivalent to --ip-format=zero-padded, it is deprecated as of SiLK 3.7.0, and it will be removed in
the SiLK 4.0 release.
--integer-sensors
Print the integer ID of the sensor rather than its name.
--integer-tcp-flags
Print the TCP flag fields (flags, initialFlags, sessionFlags) as an integer value. Typically, the characters
F,S,R,P,A,U,E,C are used to represent the TCP flags.
--no-titles
Turn off column titles. By default, titles are printed.
--no-columns
Disable fixed-width columnar output.
--column-separator=C
Use specified character between columns and after the final column. When this switch is not specified,
the default of ’|’ is used.
--no-final-delimiter
Do not print the column separator after the final column. Normally a delimiter is printed.
--delimited
--delimited=C
Run as if --no-columns --no-final-delimiter --column-sep=C had been specified. That is, disable
fixed-width columnar output; if character C is provided, it is used as the delimiter between columns
instead of the default ’|’.
--print-filenames
Prints to the standard error the names of input files as they are opened.
--copy-input=PATH
Copy all binary input to the specified file or named pipe. PATH can be stdout to print flows to
the standard output as long as the --output-path switch has been used to redirect rwuniq’s ASCII
output.
--output-path=PATH
Determines where the output of rwuniq (ASCII text) is written. If this option is not given, output is
written to the standard output.
--pager=PAGER_PROG
When output is to a terminal, invoke the program PAGER_PROG to view the output one screen full
at a time. This switch overrides the SILK_PAGER environment variable, which in turn overrides the
PAGER variable. If the value of the pager is determined to be the empty string, no paging will be
performed and all output will be printed to the terminal.
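The order of precedence for the pager can be sketched in Python as follows. The function name choose_pager and the convention of returning None for "no paging" are assumptions made for this illustration; the sketch does not model the check that output is going to a terminal.

```python
def choose_pager(switch_value=None, environ=None):
    """Pick the pager program, or return None when no paging should occur."""
    environ = {} if environ is None else environ
    if switch_value is not None:
        pager = switch_value              # --pager overrides everything
    elif "SILK_PAGER" in environ:
        pager = environ["SILK_PAGER"]     # overrides PAGER, even when empty
    else:
        pager = environ.get("PAGER", "")
    return pager or None                  # an empty string disables paging
```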
--ipv6-policy=POLICY
Determine how IPv4 and IPv6 flows are handled when SiLK has been compiled with IPv6 support.
When the switch is not provided, the SILK_IPV6_POLICY environment variable is checked for a policy.
If it is also unset or contains an invalid policy, the POLICY is mix. When SiLK has not been compiled
with IPv6 support, IPv6 flows are always ignored, regardless of the value passed to this switch or in
the SILK_IPV6_POLICY variable. The supported values for POLICY are:
ignore
Ignore any flow record marked as IPv6, regardless of the IP addresses it contains.
asv4
Convert IPv6 flow records that contain addresses in the ::ffff:0:0/96 prefix to IPv4 and ignore all
other IPv6 flow records.
mix
Process the input as a mixture of IPv4 and IPv6 flow records. When an IP address is used as
part of the key or value, this policy is equivalent to force.
force
Convert IPv4 flow records to IPv6, mapping the IPv4 addresses into the ::ffff:0:0/96 prefix.
only
Process only flow records that are marked as IPv6 and ignore IPv4 flow records in the input.
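The five policies can be compared side by side with a small Python model. It is a sketch under stated assumptions: a record is reduced to its source and destination addresses, both addresses are taken to be of the same IP version, and a result of None stands for an ignored record. The names apply_policy and map_to_v6 are hypothetical.

```python
import ipaddress


def map_to_v6(addr):
    """Map an IPv4 address into the ::ffff:0:0/96 prefix."""
    return ipaddress.IPv6Address((0xFFFF << 32) | int(addr))


def apply_policy(sip, dip, policy="mix"):
    """Return the (sip, dip) pair to process, or None if the record is ignored."""
    addrs = (ipaddress.ip_address(sip), ipaddress.ip_address(dip))
    is_v6 = addrs[0].version == 6   # assumption: both addresses share a version
    if policy == "ignore":
        return None if is_v6 else addrs
    if policy == "only":
        return addrs if is_v6 else None
    if policy == "asv4":
        if not is_v6:
            return addrs
        # ipv4_mapped is None unless the address lies in ::ffff:0:0/96.
        mapped = tuple(addr.ipv4_mapped for addr in addrs)
        return mapped if all(m is not None for m in mapped) else None
    if policy == "force":
        return addrs if is_v6 else tuple(map_to_v6(addr) for addr in addrs)
    if policy == "mix":
        return addrs
    raise ValueError(f"unknown policy: {policy}")
```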
--temp-directory=DIR_PATH
Specify the name of the directory in which to store data files temporarily when the memory is not large
enough to store all the bins and their aggregate values. This switch overrides the directory specified
in the SILK_TMPDIR environment variable, which overrides the directory specified in the TMPDIR
variable, which overrides the default, /tmp.
--site-config-file=FILENAME
Read the SiLK site configuration from the named file FILENAME. When this switch is not provided,
rwuniq searches for the site configuration file in the locations specified in the FILES section.
--legacy-timestamps
--legacy-timestamps=NUM
When NUM is not specified or is 1, this switch is equivalent to --timestamp-format=m/d/y.
Otherwise, the switch has no effect. This switch is deprecated as of SiLK 3.0.0, and it will be removed
in the SiLK 4.0 release.
--xargs
--xargs=FILENAME
Causes rwuniq to read file names from FILENAME or from the standard input if FILENAME is not
provided. The input should have one file name per line. rwuniq will open each file in turn and read
records from it, as if the files had been listed on the command line.
--help
Print the available options and exit. Specifying switches that add new fields, values, or additional
switches before --help will allow the output to include descriptions of those fields or switches.
--help-fields
Print the description and alias(es) of each field and value and exit. Specifying switches that add new
fields before --help-fields will allow the output to include descriptions of those fields.
--version
Print the version number and information about how SiLK was configured, then exit the application.
--pmap-file=MAPNAME:PATH
--pmap-file=PATH
Instruct rwuniq to load the mapping file located at PATH and create the src-MAPNAME and dst-MAPNAME fields. When MAPNAME is provided explicitly, it will be used to refer to the fields
specific to that prefix map. If MAPNAME is not provided, rwuniq will check the prefix map file
to see if a map-name was specified when the file was created. If no map-name is available, rwuniq
creates the fields sval and dval. Multiple --pmap-file switches are supported as long as each uses a
unique value for map-name. The --pmap-file switch(es) must precede the --fields switch. For more
information, see pmapfilter(3).
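The way the field names follow from the --pmap-file argument can be sketched in Python. The function pmap_field_names is hypothetical, the map-name stored inside the file is passed in as a plain argument instead of being read from disk, and the sketch assumes that PATH itself contains no colon.

```python
def pmap_field_names(argument, name_in_file=None):
    """Return (path, (source_field, destination_field)) for a --pmap-file value.

    name_in_file stands in for the map-name recorded in the prefix map file.
    """
    mapname, sep, path = argument.partition(":")
    if not sep:
        # Only PATH was given; fall back to the name stored in the file.
        mapname, path = name_in_file, argument
    if mapname:
        return path, (f"src-{mapname}", f"dst-{mapname}")
    return path, ("sval", "dval")
```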
--pmap-column-width=NUM
When printing a label associated with a prefix map, this switch gives the maximum number of characters to use when displaying the textual value of the field.
--python-file=PATH
When the SiLK Python plug-in is used, rwuniq reads the Python code from the file PATH to define
additional fields that can be used as part of the key or as an aggregate value. This file should call
register_field() for each field it wishes to define. For details and examples, see the silkpython(3)
and pysilk(3) manual pages.
EXAMPLES
In these examples, the dollar sign ($) represents the shell prompt and a backslash (\) is used to continue a
line for better readability. Many examples assume previous rwfilter(1) commands have written data files
named data.rw and data-v6.rw.
Print the byte-, packet-, and record-counts for each protocol, sorting the results by protocol (to sort by the
volume, use rwstats(1)):
$ rwuniq --fields=proto --values=bytes,packets,records --sort data.rw
pro|               Bytes|   Packets|   Records|
  1|             5344836|     73473|      7801|
  6|         59945492930|  72127917|    165363|
 17|            17553593|     77764|     77764|
Print the number of records seen for each source port:
$ rwuniq --fields=sport data.rw | head
sPort|   Records|
29485|        45|
29055|        31|
26373|         7|
28149|        17|
28171|        21|
28413|        39|
25836|         3|
28376|         7|
23847|         1|
Print the number of records seen for each source port for ports having at least 1000 records:
$ rwuniq --fields=sport --flows=1000 data.rw
sPort|   Records|
   25|     15568|
   67|      7807|
   80|     27044|
   53|     62216|
   22|     27994|
 8080|      3946|
  443|      7917|
  123|      7741|
    0|      7801|
Print the source addresses that sent at least 10,000,000 bytes:
$ rwuniq --fields=sip --bytes=10000000 data-v6.rw
                                    sIP|               Bytes|
                   2001:db8:a:fd::90:bd|            14529210|
For source addresses that sent more than 10,000,000 bytes, print the number of unique destination hosts each
contacted:
$ rwuniq --fields=sip --values=bytes,distinct:dip data-v6.rw
                                    sIP|               Bytes|dIP-Distin|
                   2001:db8:a:fd::90:bd|            14529210|         2|
Print the number of bytes that host shared with each destination (first use rwfilter to limit the input to
that host):
$ rwfilter --saddr=2001:db8:a:fd::90:bd --pass=- data-v6.rw \
  | rwuniq --fields=dip --values=bytes
                                    dIP|               Bytes|
                  2001:db8:c0:a8::fa:5d|             7097847|
                   2001:db8:c0:a8::dd:6|             7431363|
Print the packet and byte counts for each source-destination IP pair, where the prefix length is 16 (use
rwnetmask(1) on the input to rwuniq):
$ rwnetmask --4sip-prefix=16 --4dip-prefix=16 data.rw \
  | rwuniq --fields=sip,dip --values=packet,byte | head
            sIP|            dIP|   Packets|               Bytes|
     10.139.0.0|    192.168.0.0|     33490|            22950353|
      10.40.0.0|    192.168.0.0|       258|               18544|
     10.204.0.0|    192.168.0.0|    353233|           288736424|
     10.106.0.0|    192.168.0.0|     13051|             3843693|
      10.71.0.0|    192.168.0.0|      4355|             1391194|
      10.98.0.0|    192.168.0.0|      7312|             7328359|
     10.114.0.0|    192.168.0.0|      2538|             4137927|
     10.168.0.0|    192.168.0.0|     92094|            86883062|
     10.176.0.0|    192.168.0.0|    122101|           116555051|
Print the source of TCP traffic with no more than 3 packets and which also appears at least 4 times (use
rwfilter on the input):
$ rwfilter --proto=6 --packets=1-3 --pass=- data.rw \
  | rwuniq --field=sip --flows=4 | head -5
            sIP|   Records|
 10.147.252.145|       256|
  10.103.144.78|       256|
 10.117.142.175|       256|
  10.41.221.170|       256|
The silkpython(3) manual page provides examples that use PySiLK to create arbitrary fields to use as part
of the key for rwuniq.
ENVIRONMENT
SILK_IPV6_POLICY
This environment variable is used as the value for the --ipv6-policy when that switch is not provided.
SILK_PAGER
When set to a non-empty string, rwuniq automatically invokes this program to display its output a
screen at a time. If set to an empty string, rwuniq does not automatically page its output.
PAGER
When set and SILK_PAGER is not set, rwuniq automatically invokes this program to display its
output a screen at a time.
SILK_TMPDIR
When set and --temp-directory is not specified, rwuniq writes the temporary files it creates to this
directory. SILK_TMPDIR overrides the value of TMPDIR.
TMPDIR
When set and SILK_TMPDIR is not set, rwuniq writes the temporary files it creates to this directory.
PYTHONPATH
This environment variable is used by Python to locate modules. When --python-file is specified,
rwuniq loads Python, which in turn loads the PySiLK module, which consists of several files
(silk/pysilk_nl.so, silk/__init__.py, etc). If this silk/ directory is located outside Python’s normal search
path (for example, in the SiLK installation tree), it may be necessary to set or modify the PYTHONPATH environment variable to include the parent directory of silk/ so that Python can find the PySiLK
module.
SILK_PYTHON_TRACEBACK
When set, Python plug-ins will output traceback information on Python errors to the standard error.
SILK_COUNTRY_CODES
This environment variable allows the user to specify the country code mapping file that rwuniq uses
when computing the scc and dcc fields. The value may be a complete path or a file relative to the
SILK_PATH. See the FILES section for standard locations of this file.
SILK_ADDRESS_TYPES
This environment variable allows the user to specify the address type mapping file that rwuniq uses
when computing the sType and dType fields. The value may be a complete path or a file relative to
the SILK_PATH. See the FILES section for standard locations of this file.
SILK_CLOBBER
The SiLK tools normally refuse to overwrite existing files. Setting SILK_CLOBBER to a non-empty
value removes this restriction.
SILK_CONFIG_FILE
This environment variable is used as the value for the --site-config-file when that switch is not
provided.
SILK_DATA_ROOTDIR
This environment variable specifies the root directory of the data repository. As described in the FILES
section, rwuniq may use this environment variable when searching for the SiLK site configuration file.
SILK_PATH
This environment variable gives the root of the install tree. When searching for configuration files and
plug-ins, rwuniq may use this environment variable. See the FILES section for details.
TZ
When a SiLK installation is built to use the local timezone (to determine if this is the case, check
the Timezone support value in the output from rwuniq --version), the value of the TZ environment
variable determines the timezone in which rwuniq displays timestamps. If the TZ environment variable
is not set, the default timezone is used. Setting TZ to 0 or the empty string causes timestamps to be
displayed in UTC. The value of the TZ environment variable is ignored when the SiLK installation
uses utc. For system information on the TZ variable, see tzset(3).
SILK_PLUGIN_DEBUG
When set to 1, rwuniq prints status messages to the standard error as it attempts to find and open
each of its plug-ins. In addition, when an attempt to register a field fails, rwuniq prints a message
specifying the additional function(s) that must be defined to register the field in rwuniq. Be aware
that the output can be rather verbose.
SILK_TEMPFILE_DEBUG
When set to 1, rwuniq prints debugging messages to the standard error as it creates, re-opens, and
removes temporary files.
SILK_UNIQUE_DEBUG
When set to 1, the binning engine used by rwuniq prints debugging messages to the standard error.
FILES
${SILK_ADDRESS_TYPES}
${SILK_PATH}/share/silk/address_types.pmap
${SILK_PATH}/share/address_types.pmap
/usr/local/share/silk/address_types.pmap
/usr/local/share/address_types.pmap
Possible locations for the address types mapping file required by the sType and dType fields.
${SILK_CONFIG_FILE}
${SILK_DATA_ROOTDIR}/silk.conf
/data/silk.conf
${SILK_PATH}/share/silk/silk.conf
${SILK_PATH}/share/silk.conf
/usr/local/share/silk/silk.conf
/usr/local/share/silk.conf
Possible locations for the SiLK site configuration file which are checked when the --site-config-file
switch is not provided.
${SILK_COUNTRY_CODES}
${SILK_PATH}/share/silk/country_codes.pmap
${SILK_PATH}/share/country_codes.pmap
/usr/local/share/silk/country_codes.pmap
/usr/local/share/country_codes.pmap
Possible locations for the country code mapping file required by the scc and dcc fields.
${SILK_PATH}/lib64/silk/
${SILK_PATH}/lib64/
${SILK_PATH}/lib/silk/
${SILK_PATH}/lib/
/usr/local/lib64/silk/
/usr/local/lib64/
/usr/local/lib/silk/
/usr/local/lib/
Directories that rwuniq checks when attempting to load a plug-in.
${SILK_TMPDIR}/
${TMPDIR}/
/tmp/
Directory in which to create temporary files.
NOTES
If multiple thresholds are given (e.g., --bytes=80 --flows=2), the values must meet all thresholds before
the record is printed. For example, if a given key saw a single 100-byte flow, the entry would not be printed
given the switches above.
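The rule that every threshold must be met can be checked with a few lines of Python. The function meets_all_thresholds and the dictionary layout are assumptions made for this illustration; each threshold is a (minimum, maximum) pair in which a maximum of None means no upper bound.

```python
def meets_all_thresholds(values, thresholds):
    """Return True only when every aggregate value lies within its threshold."""
    for name, (minimum, maximum) in thresholds.items():
        value = values[name]
        if value < minimum:
            return False
        if maximum is not None and value > maximum:
            return False
    return True


# The switches --bytes=80 --flows=2 from the paragraph above.
thresholds = {"bytes": (80, None), "flows": (2, None)}
single_flow = {"bytes": 100, "flows": 1}   # one 100-byte flow
two_flows = {"bytes": 100, "flows": 2}
```

The single 100-byte flow satisfies the byte threshold but not the flow threshold, so its bin is suppressed.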
rwuniq functionally replaces the combination of
rwcut | sort | uniq -c
To get a list of unique IP addresses in a data set without the counting or threshold abilities of rwuniq,
consider using the IPset tools for improved performance:
rwset --sip-set=stdout | rwsetcat --print-ips
For situations where the key and value are each a single field, the Bag tools usually provide better performance, especially when the key is one or two bytes:
rwbag --sport-bytes=stdout | rwbagcat
rwgroup(1) works similarly to rwuniq, except the data remains in the form of SiLK Flow records, and the
next-hop-IP field is modified to denote the records that form a bin.
rwstats(1) can do the same binning as rwuniq, and then sort the data by an aggregate field.
When the --bin-time switch is given and the three time fields (starting-time (sTime), ending-time (eTime),
and duration (duration)) are present in the key, the duration field’s value will be modified to be the difference
between the ending and starting times.
When the three time-related key fields (sTime,duration,eTime) are all in use, rwuniq will ignore the final
time field when binning the records, but the field will appear in the output. Due to truncation of the
milliseconds values, rwuniq will print a different number of rows depending on the order in which those
three values appear in the --fields switch.
rwuniq supports counting distinct source and/or destination IPs. To see the number of distinct sources for
each 10 minute bin, run:
rwuniq --fields=stime --values=sip-distinct --bin-time=600 --sort-output
When computing distinct counts over a field, the field may not be part of the key; that is, you cannot have
--fields=sip --values=sip-distinct.
Using the --presorted-input switch sometimes introduces more issues than it solves, and --presorted-input is less necessary now that rwuniq can use temporary files while processing input.
When using the --presorted-input switch, it is highly recommended that you use no more than one time-related key field (sTime, duration, eTime) in the --fields switch and that the time-related key appear last
in --fields. The issue is caused by rwsort considering the millisecond values on the times when sorting,
while rwuniq truncates the millisecond value. The result may be unsorted output and multiple rows in the
output that have the same values for the key fields:
$ rwsort --fields=stime,duration data.rw \
  | rwuniq --fields=stime,dur --presorted
              sTime|durat|   Records|
...
2009/02/12T00:00:57|    0|         2|
2009/02/12T00:00:57|   29|         2|
2009/02/12T00:00:57|    0|         2|
2009/02/12T00:00:57|   13|         2|
...
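The cause of the repeated rows can be reproduced outside SiLK. The following Python sketch sorts made-up (start time, duration) pairs by their millisecond values, truncates them to whole seconds, and then counts runs of adjacent equal keys, which is a simplified stand-in for processing presorted input. The data and the name count_presorted are hypothetical.

```python
from itertools import groupby


def count_presorted(records_ms):
    """Count adjacent records whose keys are equal after truncation to seconds.

    records_ms holds (stime_ms, duration_ms) tuples that are already sorted.
    """
    truncated = [(stime // 1000, dur // 1000) for stime, dur in records_ms]
    return [(key, len(list(group))) for key, group in groupby(truncated)]


# Sorted on the millisecond values, as a sort on stime,duration would do.
records = sorted([
    (57_100, 500),
    (57_200, 29_000),
    (57_300, 200),
    (57_300, 900),
])
rows = count_presorted(records)
```

After truncation the key sequence is (57, 0), (57, 29), (57, 0), (57, 0), so the key (57, 0) is reported in two separate rows.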
When computing distinct IP counts, rwuniq will typically run faster if you do not use the --presorted-input switch, even if the data was previously sorted.
rwuniq may run out of memory when computing distinct IP counts, causing the counts for some bins to be
smaller than the actual number of distinct IPs. When this occurs, a single warning is printed to the standard
error noting that rwuniq has run out of memory, processing continues, and rwuniq exits with status 16.
rwuniq’s strength is its ability to build arbitrary keys and aggregate fields. For a key of a single IP address,
see rwaddrcount(1) and rwbag(1); for a key made up of a single CIDR block (/8, /16, /24 only), a single
port, or a single protocol, use rwtotal(1) or rwbag(1).
SEE ALSO
rwfilter(1), rwbag(1), rwcut(1), rwset(1), rwsetcat(1), rwaddrcount(1), rwgroup(1), rwstats(1), rwnetmask(1), rwsort(1), rwtotal(1), rwcount(1), addrtype(3), ccfilter(3), pmapfilter(3), pysilk(3), silkpython(3), silk-plugin(3), sensor.conf(5), rwflowpack(8), silk(7), yaf(1),
dlopen(3)
silk_config
Print SiLK compiling and linking information
SYNOPSIS
silk_config [--silk-version] [--compiler] [--cflags] [--include]
[--libs] [--libsilk-libs] [--libsilk-thrd-libs]
[--libflowsource-libs] [--data-rootdir] [--python-site-dir]
silk_config --help
silk_config --version
DESCRIPTION
silk_config prints configuration information used to compile and link other files and programs against
the SiLK header files and libraries. silk_config will print the output value(s) selected by the user, or all
configuration information if no switches are provided.
This command has nothing to do with the SiLK Configuration file. See the silk.conf(5) manual page for
information on that file.
OPTIONS
Option names may be abbreviated if the abbreviation is unique or is an exact match for an option.
--silk-version
Print the version of SiLK as a simple string. The output from this switch is only the version number; the
output does not include the additional configuration information that the --version switch normally
prints.
--compiler
Print the compiler used to build SiLK.
--cflags
Print the include paths (that is, the -I switches) and any additional compiler flags to use when compiling
a file against the SiLK header files. To only print the include paths, use --include.
--include
Print the include paths to use when compiling a file against the SiLK header files. See also --cflags.
--libs
This switch is an alias for --libsilk-libs.
--libsilk-libs
Print the linker flags (that is, the -L and -l switches) to use when linking a program against libsilk.so.
--libsilk-thrd-libs
Print the linker flags to use when linking a program against libsilk-thrd.so. Few external programs will
need to use this library.
--libflowsource-libs
Print the linker flags to use when linking a program against libflowsource.so. It is highly unlikely
that an external program will need to use this library.
--data-rootdir
Print the compiled-in value of the default location of the SiLK data repository, ignoring any environment variable settings.
--python-site-dir
Print the name of the directory containing the silk subdirectory where the PySiLK module files were
installed. The user may need to set the PYTHONPATH environment variable to this location to be
able to use PySiLK. The value will be empty if PySiLK support is not available in this build of SiLK.
--help
Print the available options and exit.
--version
Print the version number and information about how SiLK was configured, then exit the application.
SEE ALSO
silk.conf(5), silk(7)
3
SiLK Libraries and Plug-Ins
The behavior of several SiLK tools can be augmented by built-in libraries or plug-ins loaded at run time;
this section describes those libraries and plug-ins.
addrtype
Labeling IPv4 addresses as internal or external
SYNOPSIS
rwfilter [--stype=ID] [--dtype=ID] ...
rwcut --fields=sType,dType ...
rwgroup --id-fields=sType,dType ...
rwsort --fields=sType,dType ...
rwstats --fields=sType,dType ...
rwuniq --fields=sType,dType ...
DESCRIPTION
The address type mapping file provides a way to map an IPv4 address to an integer denoting the IP as internal, external, or non-routable. With this mapping file, SiLK flow records can be partitioned (rwfilter(1)),
displayed (rwcut(1)), grouped (rwgroup(1)), sorted (rwsort(1)), and counted (rwstats(1) and rwuniq(1)) by the characteristic of the address.
The address type is a specialized form of the Prefix Map, pmapfilter(3), where the following labels are
assumed to exist and to have the indicated values:
0
denotes a (non-routable) IP address
1
denotes an IP address internal to the monitored network
2
denotes an IP address external to the monitored network
The SiLK tools look for the address type mapping file in a standard location as detailed in the FILES section
below. To provide an alternate location, specify that location in the SILK_ADDRESS_TYPES environment
variable.
Creating the prefix map file that maps IPs to one of these labels is described in the MAPPING FILE section
below.
OPTIONS
The address type utility provides the following options to the indicated applications.
rwfilter Switches
--stype=ID
When ID is 0, pass the record if its source address is non-routable. When ID is 1, pass the record if its
source address is internal. When ID is 2, pass the record if its source address is external (i.e., routable
and not internal). When ID is 3, pass the record if its source address is not internal (non-routable or
external).
--dtype=ID
As --stype for the destination IP address.
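The four ID values accepted by --stype and --dtype can be summarized as a small predicate. The following Python sketch is illustrative only; the function name type_passes is hypothetical and is not part of SiLK.

```python
# Address type labels as defined by the address type mapping file.
NON_ROUTABLE, INTERNAL, EXTERNAL = 0, 1, 2


def type_passes(addr_type: int, wanted_id: int) -> bool:
    """Return True if an address of type addr_type passes --stype/--dtype=wanted_id."""
    if wanted_id in (NON_ROUTABLE, INTERNAL, EXTERNAL):
        return addr_type == wanted_id
    if wanted_id == 3:
        # ID 3 means "not internal": non-routable or external.
        return addr_type != INTERNAL
    raise ValueError(f"unsupported ID: {wanted_id}")
```

With this predicate, an ID of 3 accepts addresses of type 0 and type 2 and rejects addresses of type 1.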
rwcut, rwgroup, rwsort, rwstats, and rwuniq Switches
--fields=FIELDS
FIELDS refers to a list of fields to use for the operation. The address type utility makes two additional
fields, sType (alias 16) and dType (17) available for display, grouping, sorting, and counting using the
rwcut(1), rwgroup(1), rwsort(1), rwstats(1), and rwuniq(1) tools:
sType,16
For the source IP address, prints 0 if the address is non-routable, 1 if it is internal, or 2 if it is
routable and external.
dType,17
as sType, except for the destination address
MAPPING FILE
To denote an address as non-routable, internal, or external at your site, you will need to create the
address_types.pmap file and either install it in the appropriate location (see the FILES section below) or set
the SILK_ADDRESS_TYPES environment variable to the file’s location.
The rwpmapbuild(1) tool creates a prefix map file from a text file. A template for the text file is available in
$SILK_PATH/share/silk/addrtype-templ.txt. The text file used to create address_types.pmap must include
the following section to ensure that IPs are mapped to the integer values that addrtype.so expects:
#   Numerical mappings of labels
label 0     non-routable
label 1     internal
label 2     external
#   Default to "external" for all un-defined ranges.
default     external
The remainder of the file can list CIDR blocks and a label for each block:
# RFC1918 space
10.0.0.0/8          non-routable
172.16.0.0/12       non-routable
192.168.0.0/16      non-routable
# My IP space (CMU)
128.2.0.0/16        internal
Once the text file is saved to disk, use rwpmapbuild to create address_types.pmap:
rwpmapbuild --input addresses.txt --output address_types.pmap
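To check a mapping text file before building the prefix map, the lookup it describes can be emulated in a few lines. The sketch below is not how rwpmapbuild or addrtype work internally; it assumes whitespace-separated fields, blocks that do not overlap, and the example labels shown above. The names parse_mapping and address_type are hypothetical.

```python
import ipaddress

# The example mapping from this section, with comments omitted.
MAPPING_TEXT = """\
label 0 non-routable
label 1 internal
label 2 external
default external
10.0.0.0/8 non-routable
172.16.0.0/12 non-routable
192.168.0.0/16 non-routable
128.2.0.0/16 internal
"""


def parse_mapping(text):
    """Parse label, default, and CIDR lines; return (labels, default, blocks)."""
    labels, default, blocks = {}, None, []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        fields = line.split()
        if fields[0] == "label":
            labels[fields[2]] = int(fields[1])
        elif fields[0] == "default":
            default = fields[1]
        else:
            blocks.append((ipaddress.ip_network(fields[0]), fields[1]))
    return labels, default, blocks


def address_type(ip, labels, default, blocks):
    """Return the integer type of ip; assumes the blocks do not overlap."""
    addr = ipaddress.ip_address(ip)
    for network, label in blocks:
        if addr in network:
            return labels[label]
    return labels[default]


labels, default, blocks = parse_mapping(MAPPING_TEXT)
print(address_type("10.1.2.3", labels, default, blocks))     # 0 (non-routable)
print(address_type("128.2.10.20", labels, default, blocks))  # 1 (internal)
print(address_type("8.8.8.8", labels, default, blocks))      # 2 (external, by default)
```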
ENVIRONMENT
SILK_ADDRESS_TYPES
This environment variable allows the user to specify the address type mapping file to use. The value
may be a complete path or a file relative to SILK_PATH. If the variable is not specified, the code looks
for a file named address_types.pmap as specified in the FILES section below.
SILK_PATH
This environment variable gives the root of the install tree. The SiLK applications check the directories $SILK_PATH/share/silk and $SILK_PATH/share for the address type mapping file, address_types.pmap.
FILES
The tools will look for the data file that maps IPs to labels in the following locations.
($SILK_ADDRESS_TYPES is the value of the SILK_ADDRESS_TYPES environment variable, if it is set.
$SILK_PATH is the value of the SILK_PATH environment variable, if it is set. The use of /usr/local/ assumes
the application is installed in the /usr/local/bin/ directory.)
$SILK_ADDRESS_TYPES
$SILK_PATH/share/silk/address_types.pmap
$SILK_PATH/share/address_types.pmap
/usr/local/share/silk/address_types.pmap
/usr/local/share/address_types.pmap
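The search order above can be reproduced when debugging which mapping file a tool is likely to use. The following Python sketch only lists the candidate locations in the documented order. It assumes that an unset variable is skipped, it does not model a SILK_ADDRESS_TYPES value that is relative to SILK_PATH, and the function names are hypothetical.

```python
import os
import posixpath


def candidate_paths(env, filename="address_types.pmap"):
    """List the locations from FILES in order, skipping unset variables."""
    paths = []
    if env.get("SILK_ADDRESS_TYPES"):
        paths.append(env["SILK_ADDRESS_TYPES"])
    if env.get("SILK_PATH"):
        root = env["SILK_PATH"]
        paths.append(posixpath.join(root, "share", "silk", filename))
        paths.append(posixpath.join(root, "share", filename))
    # Assumes the application is installed in /usr/local/bin/.
    paths.append(posixpath.join("/usr/local", "share", "silk", filename))
    paths.append(posixpath.join("/usr/local", "share", filename))
    return paths


def find_mapping_file(env=os.environ):
    """Return the first candidate that exists on disk, or None."""
    for path in candidate_paths(env):
        if os.path.isfile(path):
            return path
    return None
```

The same order applies to other mapping files that follow this layout, by passing a different filename and environment variable name.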
SEE ALSO
rwcut(1), rwfilter(1), rwgroup(1), rwpmapbuild(1), rwpmapcat(1), rwsort(1), rwstats(1), rwuniq(1), pmapfilter(3), silk(7)
ccfilter(3)
ccfilter
Mapping IPv4 addresses to country codes
SYNOPSIS
rwfilter [--scc=COUNTRY_CODES] [--dcc=COUNTRY_CODES] ...
rwcut --fields=scc,dcc ...
rwgroup --id-fields=scc,dcc ...
rwsort --fields=scc,dcc ...
rwstats --fields=scc,dcc ...
rwuniq --fields=scc,dcc ...
rwpmaplookup --country-codes ...
DESCRIPTION
The country code mapping file provides a mapping from an IPv4 address to the two-letter, lowercase abbreviation
of the country where that IP address is located. The mapping file allows the country code value of IP
addresses on a SiLK Flow record to be partitioned (rwfilter(1)), displayed (rwcut(1)), sorted (rwsort(1)),
grouped (rwgroup(1)), and counted (rwstats(1) and rwuniq(1)).
The rwpmaplookup(1) tool, when invoked with the --country-codes switch, accepts textual input and
prints the country code for the IPs, which provides a way to print country codes for the IPs in SiLK IPsets
or bags.
The abbreviations used by the country code utility are those used by the Root-Zone Whois Index (see for
example http://www.iana.org/cctld/cctld-whois.htm) or one of the following special codes:
--
N/A (e.g., private and experimental reserved addresses)
a1
anonymous proxy
a2
satellite provider
o1
other
The SiLK tools look for the country code mapping file in a standard location as detailed in the FILES section
below. To provide an alternate location, specify that location in the SILK_COUNTRY_CODES environment
variable.
Creating the Prefix Map (pmap) file that maps an IP to its country code requires the GeoIP Country(R) or
free GeoLite database created by MaxMind, available from http://www.maxmind.com, as described in the
MAPPING FILE section below.
OPTIONS
Country code support makes available two additional keys to the --fields switch in the rwcut(1), rwgroup(1), rwsort(1), rwstats(1), and rwuniq(1) tools:
scc,18
Print, sort, and/or count the flow records by the country code designation of the source IP address
dcc,19
As scc for the destination address
In rwfilter(1), the following switches are supported:
--scc=COUNTRY_CODE_LIST
Pass the record if the country code of its source IP address is in the specified COUNTRY_CODE_LIST.
--dcc=COUNTRY_CODE_LIST
As --scc for the destination IP address.
MAPPING FILE
To map from IP addresses to country codes you will need to create the country_codes.pmap data file and
install it in the appropriate location (see the FILES section below), or specify the path to the file in the
SILK_COUNTRY_CODES environment variable.
The prefix map data file is based on the GeoIP Country(R) or free GeoLite database created by MaxMind
and available from http://www.maxmind.com/. We do not distribute the database nor the data file, but we
provide Perl scripts that will convert the GeoIP database to the format that ccfilter.so expects.
MaxMind distributes multiple versions of their GeoIP Country database; one is a free evaluation copy that
is 97% accurate. In addition, they sell versions with higher accuracy, and they offer various subscription
services.
The rwgeoip2ccmap(1) program converts the MaxMind GeoIP file to the form that the SiLK tools require.
ENVIRONMENT
SILK_COUNTRY_CODES
This environment variable allows the user to specify the country code mapping file that the SiLK tools
use. The value may be a complete path or a file relative to SILK_PATH. If the variable is not specified,
the code looks for a file named country_codes.pmap as specified in the FILES section below.
SILK_PATH
This environment variable gives the root of the install tree. The SiLK applications check the directories $SILK_PATH/share/silk and $SILK_PATH/share for the country code mapping file, country_codes.pmap.
FILES
The tools will look for the data file that maps IPs to country codes in the following locations.
($SILK_COUNTRY_CODES is the value of the SILK_COUNTRY_CODES environment variable, if it is
set. $SILK_PATH is the value of the SILK_PATH environment variable, if it is set. The use of /usr/local/
assumes the application is installed in the /usr/local/bin/ directory.)
$SILK_COUNTRY_CODES
$SILK_PATH/share/silk/country_codes.pmap
$SILK_PATH/share/country_codes.pmap
/usr/local/share/silk/country_codes.pmap
/usr/local/share/country_codes.pmap
SEE ALSO
rwcut(1), rwfilter(1), rwgroup(1), rwsort(1), rwstats(1), rwuniq(1), rwgeoip2ccmap(1), rwpmaplookup(1), silk(7)
flowrate(3)
flowrate
SiLK plug-in providing payload and rate filters and fields
SYNOPSIS
rwfilter --plugin=flowrate.so [--payload-bytes=INTEGER_RANGE]
[--payload-rate=DECIMAL_RANGE]
[--bytes-per-second=DECIMAL_RANGE]
[--packets-per-second=DECIMAL_RANGE] ...
rwcut --plugin=flowrate.so --fields=FIELDS ...
rwgroup --plugin=flowrate.so --fields=FIELDS ...
rwsort --plugin=flowrate.so --fields=FIELDS ...
rwstats --plugin=flowrate.so --fields=FIELDS --values=FIELDS ...
rwuniq --plugin=flowrate.so --fields=FIELDS --values=FIELDS ...
DESCRIPTION
When loaded into rwfilter(1), the flowrate plug-in provides switches that can partition flows based on
bytes of payload and/or on the rates of data transfer.
For rwcut(1), rwgroup(1), rwsort(1), rwstats(1), and rwuniq(1), the flowrate plug-in provides fields
that will print, sort flows by, and group flows by the bytes of payload, bytes-per-packet, bytes-per-second,
packets-per-second, and bytes of payload per second. The flowrate plug-in also provides aggregate value
fields in rwstats and rwuniq.
The payload byte count is determined by subtracting from the total byte count in the flow the bytes of
overhead used by the packet headers. The payload calculation assumes minimal packet headers; that is,
there are no options in the packets. For TCP, the switch assumes there are no TCP timestamps in the packets.
Thus, the calculated payload will be the maximum possible bytes of payload. If the packet-overhead is larger
than the reported number of bytes, the value is zero.
The various flow-rate quantities are determined by dividing the payload byte count, packet count, or byte
count by the duration of the flow, giving the average rate across the flow. When the flow’s reported duration
is zero, a duration of one second is assumed (that is, the count is used directly).
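The calculations described above can be sketched as follows. The header sizes are assumptions chosen to match "minimal packet headers" for IPv4 (20 bytes of IP header, 20 bytes for TCP, 8 bytes for UDP, and no transport header for other protocols); the plug-in's actual constants may differ. The function names are hypothetical.

```python
# Assumed minimal per-packet header sizes for IPv4 (no IP or TCP options).
IP_HEADER = 20
TRANSPORT_HEADER = {6: 20, 17: 8}  # TCP, UDP; other protocols: IP header only


def payload_bytes(total_bytes, packets, proto):
    """Estimate the maximum possible payload; never below zero."""
    overhead = packets * (IP_HEADER + TRANSPORT_HEADER.get(proto, 0))
    return max(total_bytes - overhead, 0)


def rate(count, duration_seconds):
    """Average rate over the flow; a zero duration is treated as one second."""
    if duration_seconds == 0:
        return float(count)
    return count / duration_seconds


# A TCP flow of 10 packets and 5400 bytes lasting 4 seconds.
payload = payload_bytes(5400, 10, 6)
print(payload)           # 5000
print(rate(payload, 4))  # 1250.0  (payload-rate)
print(rate(10, 4))       # 2.5     (pckts/sec)
print(rate(5400, 4))     # 1350.0  (bytes/sec)
```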
The flowrate plug-in must be explicitly loaded into an application via the --plugin switch. Explicit loading
is required because the plug-in's names clash with existing switches and fields. For example, adding the --packets-per-second
switch to rwfilter means any abbreviation of the existing --packets switch will fail.
OPTIONS
The flowrate plug-in provides the following options to the indicated applications.
rwfilter Switches
When the flowrate plug-in has been loaded, the following set of partitioning switches are added to rwfilter.
To pass the filter, the record must pass the test implied by each switch. The form of the argument to each
switch is described below. The partitioning switches are:
--payload-bytes=INTEGER_RANGE
Check whether the payload byte count is within INTEGER_RANGE.
--payload-rate=DECIMAL_RANGE
Check whether the average number of payload bytes seen per second in the flow is within DECIMAL_RANGE.
--packets-per-second=DECIMAL_RANGE
Check whether the average number of packets per second in the flow is within DECIMAL_RANGE.
--bytes-per-second=DECIMAL_RANGE
Check whether the average number of bytes per second in the flow is within DECIMAL_RANGE.
An INTEGER_RANGE is a range of two non-negative integers, and a DECIMAL_RANGE is a range of two
non-negative decimal values with accuracy up to 0.0001. The ranges are specified as two values separated by
a hyphen, MIN-MAX; for example 1-500 or 5.0-10.031. If a single value is given (e.g., 3.14), the range
consists of that single value. The upper limit of the range may be omitted, such as 1-, in which case the
upper limit is set to the maximum possible value.
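The range syntax can be illustrated with a small parser. This Python sketch is not the parser used by rwfilter. It uses infinity to stand in for "the maximum possible value", treats both endpoints as inclusive, and rejects a range whose upper limit is below its lower limit; all three are assumptions. The names parse_range and in_range are hypothetical.

```python
import math


def parse_range(text, number=float):
    """Parse MIN-MAX, MIN-, or a single value into a (low, high) pair."""
    low_text, sep, high_text = text.partition("-")
    low = number(low_text)
    if not sep:
        high = low               # single value: the range is that value
    elif high_text == "":
        high = math.inf          # open upper limit, as in "1-"
    else:
        high = number(high_text)
    if low < 0 or high < low:
        raise ValueError(f"invalid range: {text!r}")
    return low, high


def in_range(value, bounds):
    """Inclusive membership test (inclusiveness is an assumption)."""
    low, high = bounds
    return low <= value <= high


print(parse_range("1-500", int))   # (1, 500)
print(parse_range("5.0-10.031"))   # (5.0, 10.031)
print(parse_range("3.14"))         # (3.14, 3.14)
print(parse_range("1-", int))      # (1, inf)
```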
rwcut, rwgroup, rwsort, rwstats, and rwuniq Switches
--fields=FIELDS
FIELDS refers to a list of fields to use for the operation. The flowrate plug-in adds the following
fields for display, sorting, and grouping using the rwcut(1), rwgroup(1), rwsort(1), rwstats(1),
and rwuniq(1) tools:
payload-bytes
Print, sort by, or group by the number of bytes of payload.
payload-rate
Print, sort by, or group by the bytes of payload seen per second.
pckts/sec
Print, sort by, or group by the packets seen per second.
bytes/sec
Print, sort by, or group by the bytes seen per second.
bytes/packet
Print, sort by, or group by the average number of bytes contained in each packet.
--values=AGGREGATES
The flowrate plug-in adds the following aggregate value fields to rwstats and rwuniq. AGGREGATES refers to a list of values to compute for each bin. To compute these values, flowrate maintains
separate sums for the numerator and denominator while reading the records, then flowrate computes
the ratio when the output is generated.
payload-bytes
Compute the approximate bytes of payload for records in this bin.
payload-rate
Compute the average bytes of payload seen per second for records in this bin.
pckts/sec
Compute the average packets seen per second for records in this bin.
bytes/sec
Compute the average bytes seen per second for records in this bin.
bytes/packet
Compute the average number of bytes contained in each packet for records in this bin.
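Because the numerator and denominator are summed separately, an aggregate rate is a ratio of sums rather than an average of per-record rates. The Python sketch below shows the difference for bytes/sec. It requires a positive total duration and does not model how the plug-in treats zero-duration records; the function names are hypothetical.

```python
def aggregate_rate(records):
    """records: iterable of (count, duration_seconds) pairs for one bin."""
    numerator = 0
    denominator = 0.0
    for count, duration in records:
        numerator += count        # e.g., bytes
        denominator += duration   # seconds
    if denominator == 0:
        raise ValueError("total duration must be positive in this sketch")
    return numerator / denominator


def mean_of_rates(records):
    """Average of the per-record rates, shown only for comparison."""
    rates = [count / duration for count, duration in records]
    return sum(rates) / len(rates)


# Two flows of 1000 bytes each, lasting 1 second and 9 seconds.
bin_records = [(1000, 1.0), (1000, 9.0)]
print(aggregate_rate(bin_records))  # 200.0
print(mean_of_rates(bin_records) > aggregate_rate(bin_records))  # True
```

The ratio of sums weights each record by its duration, so one short, fast flow does not dominate the value reported for the bin.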
ENVIRONMENT
SILK_PATH
This environment variable gives the root of the install tree. When searching for plug-ins, a SiLK
application may use this environment variable. See the FILES section for details.
SILK_PLUGIN_DEBUG
When set to 1, the SiLK applications print status messages to the standard error as they attempt to
find and open the flowrate.so plug-in. A typical invocation using this variable is:
env SILK_PLUGIN_DEBUG=1 rwcut --plugin=flowrate.so --version
FILES
${SILK_PATH}/lib64/silk/flowrate.so
${SILK_PATH}/lib64/flowrate.so
${SILK_PATH}/lib/silk/flowrate.so
${SILK_PATH}/lib/flowrate.so
/usr/local/lib64/silk/flowrate.so
/usr/local/lib64/flowrate.so
/usr/local/lib/silk/flowrate.so
/usr/local/lib/flowrate.so
Possible locations for the plug-in.
SEE ALSO
rwcut(1), rwfilter(1), rwgroup(1), rwsort(1), rwstats(1), rwuniq(1), silk(7)
int-ext-fields(3)
int-ext-fields
SiLK plug-in providing internal/external ip/port fields
SYNOPSIS
rwcut --plugin=int-ext-fields.so --fields=FIELDS ...
rwgroup --plugin=int-ext-fields.so --fields=FIELDS ...
rwsort --plugin=int-ext-fields.so --fields=FIELDS ...
rwstats --plugin=int-ext-fields.so --fields=FIELDS ...
rwuniq --plugin=int-ext-fields.so --fields=FIELDS ...
DESCRIPTION
The int-ext-fields plug-in adds four potential fields to rwcut(1), rwgroup(1), rwsort(1), rwstats(1),
and rwuniq(1). These fields contain the internal IP (int-ip), the external IP (ext-ip), the internal port
(int-port), and the external port (ext-port). To use these fields, specify their names in the --fields switch.
These fields can be useful when a file contains flow records that were collected for multiple directions; for
example, some flow records are incoming and some are outgoing.
For these fields to be available, the user must specify the list of flowtypes (i.e., class/type pairs) that are
considered incoming and the list that are considered outgoing. The user must specify the flowtypes because
SiLK has no innate sense of the direction of a flow record. Although "in" and "out" are common types,
SiLK does not recognize that these represent flows going in opposite directions.
If a record has a flowtype that is not in the list of incoming or outgoing flowtypes, the application uses a
value of 0 for that field.
The user specifies the flowtypes by giving a comma-separated list of class/type pairs using the --incoming-flowtypes and --outgoing-flowtypes switches on the application’s command line. When a switch is
not provided, the application checks the INCOMING_FLOWTYPES and OUTGOING_FLOWTYPES environment variables. If either the incoming or the outgoing flowtype list is not specified, the fields are not
available.
For the packlogic-twoway(3) site, one would set the following environment variables:
INCOMING_FLOWTYPES=all/in,all/inweb,all/inicmp,all/innull
OUTGOING_FLOWTYPES=all/out,all/outweb,all/outicmp,all/outnull
The parsing of flowtypes requires the silk.conf(5) site configuration file. You may need to set the
SILK_CONFIG_FILE environment variable or specify --site-config-file on the command line prior to loading the plug-in.
OPTIONS
The int-ext-fields plug-in provides the following options to rwcut, rwgroup, rwsort, rwstats, and rwuniq.
--fields=FIELDS
FIELDS refers to a list of fields to use for the operation. The int-ext-fields plug-in adds the following
fields for display, sorting, and grouping using the rwcut(1), rwgroup(1), rwsort(1), rwstats(1),
and rwuniq(1) tools:
int-ip
Print, sort by, or group by the internal IP address. The internal IP is the destination address for
incoming flowtypes and the source address for outgoing flowtypes. When a SiLK Flow record’s
flowtype is not listed in either the incoming or outgoing flowtypes list, the int-ip field is 0.
ext-ip
Print, sort by, or group by the external IP address. The external IP is the source address for
incoming flowtypes and the destination address for outgoing flowtypes. When a SiLK Flow
record’s flowtype is not listed in either the incoming or outgoing flowtypes list, the ext-ip field is
0.
int-port
Print, sort by, or group by the internal port. This value is 0 for ICMP flow records, and when
the SiLK Flow record’s flowtype is not listed in either the incoming or outgoing flowtypes list.
ext-port
Print, sort by, or group by the external port. This value is 0 for ICMP flow records, and when
the SiLK Flow record’s flowtype is not listed in either the incoming or outgoing flowtypes list.
--incoming-flowtypes=CLASS/TYPE[,CLASS/TYPE ...]
Names the flowtypes that should be considered incoming. The list of flowtypes should be specified
as a comma-separated list of class/type pairs. This switch overrides the flowtype list specified in the
INCOMING_FLOWTYPES environment variable. If this switch is not provided and the INCOMING_FLOWTYPES environment variable is not set, the int-ext-fields plug-in will not define any
fields.
--outgoing-flowtypes=CLASS/TYPE[,CLASS/TYPE ...]
Similar to --incoming-flowtypes, except it names the flowtypes that should be considered outgoing,
and it overrides the OUTGOING_FLOWTYPES environment variable.
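The way the four fields are derived from a record and the two flowtype lists can be sketched in Python. This is an illustration of the rules stated above and not the plug-in's code. The record is a plain dictionary whose keys are chosen for this sketch, and only protocol 1 is treated as ICMP.

```python
def int_ext_fields(record, incoming, outgoing):
    """Return int-ip, ext-ip, int-port, and ext-port for one flow record."""
    flowtype = f"{record['class']}/{record['type']}"
    if flowtype in incoming:
        # Incoming: the source is external, the destination is internal.
        ext_ip, ext_port = record["sip"], record["sport"]
        int_ip, int_port = record["dip"], record["dport"]
    elif flowtype in outgoing:
        # Outgoing: the source is internal, the destination is external.
        int_ip, int_port = record["sip"], record["sport"]
        ext_ip, ext_port = record["dip"], record["dport"]
    else:
        # Flowtype in neither list: every field is 0.
        return {"int-ip": 0, "ext-ip": 0, "int-port": 0, "ext-port": 0}
    if record["proto"] == 1:
        # Ports are reported as 0 for ICMP flow records.
        int_port = ext_port = 0
    return {"int-ip": int_ip, "ext-ip": ext_ip,
            "int-port": int_port, "ext-port": ext_port}


incoming = {"all/in", "all/inweb"}
outgoing = {"all/out", "all/outweb"}
record = {"sip": "192.168.228.153", "sport": 25,
          "dip": "10.239.86.13", "dport": 29897,
          "proto": 6, "class": "all", "type": "out"}
print(int_ext_fields(record, incoming, outgoing))
```

For this outgoing record the internal address is the source, 192.168.228.153 port 25, and the external address is the destination, 10.239.86.13 port 29897.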
EXAMPLE
In the following example, the dollar sign ($) represents the shell prompt. The text after the dollar sign
represents the command line. Lines have been wrapped for improved readability, and the back slash (\) is
used to indicate a wrapped line.
Consider the file data.rw that contains data going in different directions:
$ rwcut --fields=sip,sport,dip,dport,proto,class,type data.rw
            sIP|sPort|            dIP|dPort|pro|cla|   type|
   10.239.86.13|29897|192.168.228.153|   25|  6|all|     in|
192.168.228.153|   25|   10.239.86.13|29897|  6|all|    out|
192.168.208.237|29416| 10.233.108.250|   25|  6|all|    out|
 10.233.108.250|   25|192.168.208.237|29416|  6|all|     in|
 192.168.255.94|29301|  10.198.18.193|   80|  6|all| outweb|
  10.198.18.193|   80| 192.168.255.94|29301|  6|all|  inweb|
   10.202.7.122|29438|192.168.248.202|   25|  6|all|     in|
192.168.248.202|   25|   10.202.7.122|29438|  6|all|    out|
 10.255.142.104|26731|192.168.236.220|   25|  6|all|     in|
192.168.236.220|   25| 10.255.142.104|26731|  6|all|    out|
Using the int-ext-fields plug-in allows one to print the internal and external addresses and ports (note:
command line wrapped for improved readability):
$ rwcut --plugin=int-ext-fields.so                              \
      --incoming=all/in,all/inweb --outgoing=all/out,all/outweb \
      --fields=ext-ip,ext-port,int-ip,int-port,proto,class,type \
      data.rw
         ext-ip|ext-p|         int-ip|int-p|pro|cla|   type|
   10.239.86.13|29897|192.168.228.153|   25|  6|all|     in|
   10.239.86.13|29897|192.168.228.153|   25|  6|all|    out|
 10.233.108.250|   25|192.168.208.237|29416|  6|all|    out|
 10.233.108.250|   25|192.168.208.237|29416|  6|all|     in|
  10.198.18.193|   80| 192.168.255.94|29301|  6|all| outweb|
  10.198.18.193|   80| 192.168.255.94|29301|  6|all|  inweb|
   10.202.7.122|29438|192.168.248.202|   25|  6|all|     in|
   10.202.7.122|29438|192.168.248.202|   25|  6|all|    out|
 10.255.142.104|26731|192.168.236.220|   25|  6|all|     in|
 10.255.142.104|26731|192.168.236.220|   25|  6|all|    out|
This can be especially useful when using a tool like rwuniq or rwstats:
$ export INCOMING_FLOWTYPES=all/in,all/inweb
$ export OUTGOING_FLOWTYPES=all/out,all/outweb
$ rwuniq --plugin=int-ext-fields.so                \
      --fields=int-ip,int-port --value=bytes data.rw
         int-ip|int-p|               Bytes|
192.168.208.237|29416|               28517|
192.168.248.202|   25|                4016|
192.168.228.153|   25|                3454|
192.168.236.220|   25|               31872|
 192.168.255.94|29301|               14147|
ENVIRONMENT
INCOMING_FLOWTYPES
Used as the value for the --incoming-flowtypes when that switch is not provided.
OUTGOING_FLOWTYPES
Used as the value for the --outgoing-flowtypes when that switch is not provided.
SILK_CONFIG_FILE
This environment variable is used when the SiLK application attempts to locate the SiLK site
configuration file unless the --site-config-file switch is specified. Additional locations where the
application searches are listed in the FILES section. The site configuration file is required to parse the
flowtypes.
SILK_DATA_ROOTDIR
This environment variable specifies the root directory of the data repository. As described in the FILES
section, an application may use this environment variable when searching for the SiLK site configuration
file.
SILK_PATH
This environment variable gives the root of the install tree. When searching for configuration files and
plug-ins, an application may use this environment variable. See the FILES section for details.
SILK_PLUGIN_DEBUG
When set to 1, the SiLK applications print status messages to the standard error as they attempt to
find and open the int-ext-fields.so plug-in. A typical invocation using this variable is
env SILK_PLUGIN_DEBUG=1 rwcut --plugin=int-ext-fields.so --version
FILES
${SILK_CONFIG_FILE}
${SILK_DATA_ROOTDIR}/silk.conf
/data/silk.conf
${SILK_PATH}/share/silk/silk.conf
${SILK_PATH}/share/silk.conf
/usr/local/share/silk/silk.conf
/usr/local/share/silk.conf
Possible locations for the SiLK site configuration file which are checked when the --site-config-file
switch is not provided.
${SILK_PATH}/lib64/silk/int-ext-fields.so
${SILK_PATH}/lib64/int-ext-fields.so
${SILK_PATH}/lib/silk/int-ext-fields.so
${SILK_PATH}/lib/int-ext-fields.so
/usr/local/lib64/silk/int-ext-fields.so
/usr/local/lib64/int-ext-fields.so
/usr/local/lib/silk/int-ext-fields.so
/usr/local/lib/int-ext-fields.so
Possible locations for the plug-in.
SEE ALSO
rwcut(1), rwgroup(1), rwsort(1), rwstats(1), rwuniq(1), silk.conf(5), silk(7)
ipafilter(3)
ipafilter
SiLK plug-in for flow filtering based on IPA data
SYNOPSIS
rwfilter [--ipa-src-expr IPA_EXPR] [--ipa-dst-expr IPA_EXPR]
[--ipa-any-expr IPA_EXPR] ...
DESCRIPTION
The ipafilter plug-in provides switches to rwfilter(1) that can partition flows using data in an IPA database.
rwfilter will automatically load the ipafilter plug-in when it is available.
OPTIONS
The ipafilter plug-in provides the following options to rwfilter.
--ipa-src-expr=IPA_EXPR
Use IPA_EXPR to filter flows based on the source IP of the flow matching the IPA_EXPR expression.
--ipa-dst-expr=IPA_EXPR
Use IPA_EXPR to filter flows based on the destination IP of the flow matching the IPA_EXPR expression.
--ipa-any-expr=IPA_EXPR
Use IPA_EXPR to filter flows based on either the source or destination IP of the flow matching the
IPA_EXPR expression.
IPA EXPRESSIONS
The syntax for IPA filter expressions is documented in ipaquery(1). Some simple examples are shown in
the EXAMPLES section below.
EXAMPLES
In the following examples, the dollar sign ($) represents the shell prompt. The text after the dollar sign
represents the command line. Lines have been wrapped for improved readability, and the back slash (\) is
used to indicate a wrapped line.
To pull flows from or to any IP address in the "watch" catalog:
$ rwfilter --start-date 2010/01/01:00           \
      --ipa-any-expr "in watch at 2010/01/01"   \
      --pass watchflows.rw
To pull flows from any IP labeled "bad" in the last year:
$ rwfilter --start-date 2010/01/01:00               \
      --ipa-src-expr "label bad after 2009/01/01"   \
      --pass badguys.rw
ENVIRONMENT
SILK_PATH
This environment variable gives the root of the install tree. When searching for configuration files and
plug-ins, rwfilter may use this environment variable. See the FILES section for details.
SILK_PLUGIN_DEBUG
When set to 1, rwfilter prints status messages to the standard error as it attempts to find and open
the ipafilter.so plug-in. A typical invocation using this variable is
env SILK_PLUGIN_DEBUG=1 rwfilter --plugin=ipafilter.so --version
FILES
$SILK_PATH/share/silk/silk-ipa.conf
$SILK_PATH/share/silk-ipa.conf
/usr/local/share/silk/silk-ipa.conf
/usr/local/share/silk-ipa.conf
Possible locations for the IPA configuration file. This file contains the URI for connecting to the IPA
database. If the configuration file does not exist, attempts to use the ipafilter plug-in will exit with
an error. The format of this URI is driver://user:pass-word@hostname/database. For example:
postgresql://ipauser:[email protected]/ipa
${SILK_PATH}/lib64/silk/ipafilter.so
${SILK_PATH}/lib64/ipafilter.so
${SILK_PATH}/lib/silk/ipafilter.so
${SILK_PATH}/lib/ipafilter.so
/usr/local/lib64/silk/ipafilter.so
/usr/local/lib64/ipafilter.so
/usr/local/lib/silk/ipafilter.so
/usr/local/lib/ipafilter.so
Possible locations for the plug-in.
SEE ALSO
rwfilter(1), rwipaimport(1), rwipaexport(1), silk(7), ipaquery(1), ipaimport(1), ipaexport(1)
packlogic-generic.so(3)
packlogic-generic.so
Packing logic for the generic site
SYNOPSIS
rwflowpack --packing-logic=packlogic-generic.so ...
DESCRIPTION
This manual page describes the packlogic-generic.so plug-in that defines the packing logic that rwflowpack(8) may use to categorize flow records. (This document uses the term plug-in, but the builder of SiLK
may choose to compile the packing logic into rwflowpack. See the SiLK Installation Handbook for details.)
The primary job of rwflowpack is to categorize flow records into one or more class and type pairs. The
class and type pair (also called a flowtype) are used by the analyst when selecting flow records from the data
store using rwfilter(1).
The settings that rwflowpack uses to categorize each flow record are determined by two textual configuration
files and a compiled plug-in that is referred to as the packing logic.
The first of the configuration files is silk.conf(5) which specifies the classes, types, and sensors that rwflowpack uses when writing files and that rwfilter uses when selecting flow files.
The second configuration file is the sensor.conf(5) file. This file contains multiple sensor blocks, where
each block contains information which the packing logic uses to categorize flow records collected by the
probes specified for that sensor.
The combination of a silk.conf file and a particular packing logic plug-in defines a site. By having the
configuration and packing logic outside of the core tools, users can more easily configure SiLK for their
particular installation and a single installation of SiLK can support multiple sites.
This manual page describes the packing logic for the generic site. For a description of the packing logic at
another site, see that site’s manual page.
• packlogic-twoway(3)
Networks, Classes, and Types
The packlogic-generic.so plug-in uses three network names to describe the logical address spaces that border
the sensor:
internal
the space that is being monitored
external
the space outside the monitored network
null
the destination network for a flow that does not leave the router, because either the flow was blocked
by the router’s access control list or its destination was the router itself (e.g., a BGP message)
The generic site assumes that all packets are either blocked by the sensor (that is, their destination is the
null network), or that the packets cross the sensor so the source and destination networks always differ.
The packing logic also assumes that the above networks completely describe the space around the sensor.
Since the null network is strictly a destination network, any flow that does not originate from the external
network must originate from the internal network.
This allows the generic site to categorize a flow record primarily by comparing a flow record’s source to
the external network, and the packing logic contains no comparisons to the internal network.
The silk.conf file and packlogic-generic.so plug-in define a single class, all.
The type assigned to a flow record within the all class is one of:
in, inweb
Records whose source is the external network and whose destination is not the null network represent
incoming traffic. The traffic is split into multiple types, and these types allow the analysts to query a
subset of the flow records depending on their needs. Each incoming flow record is assigned to one of the
incoming types using the following rules:
inweb
Contains traffic where the protocol is TCP (6) and either the source port or the destination port
is one of 80, 443, or 8080
in
Contains all other incoming traffic.
out, outweb
Records whose source is not the external network and whose destination is not the null network
represent outgoing traffic. The traffic is split among the types using rules similar to those for incoming
traffic.
innull
Records whose source is the external network and whose destination is the null network represent
blocked incoming traffic.
outnull
Records whose source is not the external network and whose destination is the null network represent
blocked outgoing traffic.
Assigning a flow to source and destination networks
Since the generic site uses the external network to determine a flow record’s type, each sensor block in
the sensor.conf(5) file must specify a definition for the external network.
The sensor.conf file provides two ways to define a network: use the NET-ipblocks statement to specify
the NET network as a list of IP address blocks, or use the NET-interfaces statement to specify the NET
network using a list of SNMP interfaces.
For the source network of a flow record to be considered external, either the source IP (SiLK field sIP)
must appear in the list of external-ipblocks or the incoming SNMP interface (SiLK field in) must appear
in the list of external-interfaces. Note: If the probe block that specifies where the flow was collected
contains an interface-values vlan statement, the SiLK in field contains the VLAN ID.
For the destination network of a flow record to be considered null, either the destination IP (dIP) must
appear in the list of null-ipblocks or the outgoing SNMP interface (out) must appear in the list of null-interfaces.
Consider the following two sensors:
sensor S2
    ipfix-probes S2
    external-ipblocks 172.16.0.0/16
    internal-ipblocks 172.20.0.0/16
end sensor
sensor S3
    ipfix-probes S3
    external-interfaces 17,18,19
    internal-interfaces 21,22,23
end sensor
A flow record collected at probe S2 whose sIP is 172.16.1.1 is considered incoming, regardless of the destination IP.
A flow record collected at probe S3 whose in is 27 is considered outgoing. (Since in does not match
the external-interfaces, the record is considered outgoing even though in does not match the internal-interfaces either.)
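The two cases above can be reproduced with a short Python sketch that tests only the external network, as the generic packing logic does. The SENSORS dictionary is a hand-written stand-in for the two sensor blocks and is not a sensor.conf parser; the function name is hypothetical.

```python
import ipaddress

# Only the external definitions matter when deciding the source network.
SENSORS = {
    "S2": {"external-ipblocks": ["172.16.0.0/16"]},
    "S3": {"external-interfaces": [17, 18, 19]},
}


def source_is_external(sensor, sip=None, in_interface=None):
    """True if the record's source matches the sensor's external network."""
    cfg = SENSORS[sensor]
    if "external-ipblocks" in cfg:
        addr = ipaddress.ip_address(sip)
        return any(addr in ipaddress.ip_network(block)
                   for block in cfg["external-ipblocks"])
    return in_interface in cfg.get("external-interfaces", [])


# sIP 172.16.1.1 at S2 matches external-ipblocks: the record is incoming.
print(source_is_external("S2", sip="172.16.1.1"))  # True
# Interface 27 at S3 is not an external interface: the record is outgoing.
print(source_is_external("S3", in_interface=27))   # False
```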
There are two constructs in the sensor.conf file that help when specifying these lists:
1. The NET-interfaces or NET-ipblocks statement in a sensor block may use remainder to denote
interfaces or IP blocks that do not appear elsewhere in the block.
2. A group block can be used to give a name to a set of IP blocks or SNMP interfaces which a sensor
block can reference.
For details, see the sensor.conf(5) manual page.
Valid sensors
When using the packlogic-generic.so plug-in, the sensor blocks in the sensor.conf file support the following
types of probes:
• ipfix
• netflow-v5
• netflow-v9
In addition, each sensor block must meet the following rules:
• Either external-interfaces or external-ipblocks must be specified. And,
• A sensor cannot mix NET-ipblocks and NET-interfaces, with the exception that null-interfaces
are always allowed. And,
• Only one network on the sensor may use remainder. And,
• If a sensor contains only one NET-ipblocks statement, that statement may not use remainder.
(The NET-interfaces statement does not have this restriction.)
Packing logic code
This section provides the logic used to assign the class and type at the generic site.
A single sensor block will assign the flow record to a single class and type, and processing of the flow for
that sensor block stops as soon as a type is assigned. When multiple sensor blocks reference the same
probe, the flow records collected by that probe are processed by each of those sensor blocks.
A flow record is always assigned to the class all.
A textual description of the code used to assign the type is shown here. As of SiLK 3.8.0, the type may be
determined by the presence of certain IPFIX or NetFlowV9 information elements.
• If sIP matches external-ipblocks or in matches external-interfaces, then
– If dIP matches null-ipblocks or out matches null-interfaces, pack as innull. Else,
– Pack as in or inweb.
• If dIP matches null-ipblocks or out matches null-interfaces, pack as outnull. Else,
• Pack as out or outweb.
• Potentially modify the type: If the probe has a quirks setting that includes firewall-event and if
the incoming record contains the firewallEvent or NF_F_FW_EVENT information element whose value
is 3 (flow denied), change the type where the flow is packed as follows:
– If the flow was denied due to an ingress ACL (NF_F_FW_EXT_EVENT of 1001), pack as innull.
– If the flow was denied due to an egress ACL (NF_F_FW_EXT_EVENT of 1002), pack as outnull.
– If the flow’s current type is innull, in, or inweb, pack as innull.
– If the flow’s current type is outnull, out, or outweb, pack as outnull.
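The first part of the rules above (everything before the firewall-event adjustment) can be sketched in Python. This is an illustrative model only, not the plug-in's source: the Sensor class, the in_blocks() and generic_type() helpers, and the web-port test (assumed to be the same TCP ports 80, 443, and 8080 that the twoway site uses) are all hypothetical.

```python
from dataclasses import dataclass, field
from ipaddress import ip_address, ip_network

WEB_PORTS = {80, 443, 8080}  # assumption: same web ports as the twoway site


@dataclass
class Sensor:
    """Hypothetical stand-in for the lists in one sensor block."""
    external_ipblocks: list = field(default_factory=list)
    external_interfaces: set = field(default_factory=set)
    null_ipblocks: list = field(default_factory=list)
    null_interfaces: set = field(default_factory=set)


def in_blocks(ip, blocks):
    """True when ip falls inside any of the CIDR blocks."""
    return any(ip_address(ip) in ip_network(b) for b in blocks)


def generic_type(s, sip, dip, in_if, out_if, proto, sport, dport):
    """Return the type for one flow record at the generic site."""
    web = proto == 6 and (sport in WEB_PORTS or dport in WEB_PORTS)
    to_null = in_blocks(dip, s.null_ipblocks) or out_if in s.null_interfaces
    if in_blocks(sip, s.external_ipblocks) or in_if in s.external_interfaces:
        if to_null:
            return "innull"
        return "inweb" if web else "in"
    # Anything that does not come from the external network is outgoing.
    if to_null:
        return "outnull"
    return "outweb" if web else "out"
```

With a sensor that lists only external-interfaces 17,18,19, a record whose in is 27 falls through to the outgoing branch, which mirrors the S3 example given earlier for the generic site.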
SEE ALSO
rwfilter(1), rwflowpack(8), sensor.conf(5), silk.conf(5), packlogic-twoway(3), silk(7), SiLK Installation Handbook
packlogic-twoway.so
Packing logic for the twoway site
SYNOPSIS
rwflowpack --packing-logic=packlogic-twoway.so ...
DESCRIPTION
This manual page describes the packlogic-twoway.so plug-in that defines the packing logic that rwflowpack(8) may use to categorize flow records. (This document uses the term plug-in, but the builder of SiLK
may choose to compile the packing logic into rwflowpack. See the SiLK Installation Handbook for details.)
The primary job of rwflowpack is to categorize flow records into one or more class and type pairs. The
class and type pair (also called a flowtype) are used by the analyst when selecting flow records from the data
store using rwfilter(1).
The settings that rwflowpack uses to categorize each flow record are determined by two textual configuration
files and a compiled plug-in that is referred to as the packing logic.
The first of the configuration files is silk.conf(5) which specifies the classes, types, and sensors that rwflowpack uses when writing files and that rwfilter uses when selecting flow files.
The second configuration file is the sensor.conf(5) file. This file contains multiple sensor blocks, where
each block contains information which the packing logic uses to categorize flow records collected by the
probes specified for that sensor.
The combination of a silk.conf file and a particular packing logic plug-in define a site. By having the
configuration and packing logic outside of the core tools, users can more easily configure SiLK for their
particular installation and a single installation of SiLK can support multiple sites.
This manual page describes the packing logic for the twoway site. For a description of the packing logic at
another site, see that site’s manual page.
• packlogic-generic(3)
Networks, Classes, and Types
The silk.conf file and packlogic-twoway.so plug-in categorize a flow record based on how the packets that
comprise the flow record moved between different networks.
The packlogic-twoway.so plug-in specifies three network names to describe the logical address spaces that
border the sensor:
internal
the space that is being monitored
external
the space outside the monitored network
null
the destination network for a flow that does not leave the router, because either the flow was blocked
by the router’s access control list or its destination was the router itself (e.g., a BGP message)
There is an implicit fourth network, unknown, which is anything that does not match the three networks
above.
Given these networks, the following table describes how flows can move between the networks. Traffic
between the networks is successfully routed unless the description explicitly says "blocked".
SOURCE    DESTINATION  DESCRIPTION
external  internal     incoming traffic
internal  external     outgoing traffic
external  null         blocked incoming traffic
internal  null         blocked outgoing traffic
external  external     strictly external traffic
internal  internal     strictly internal traffic
null      any          unclear: null should never be a source
external  unknown      unclear
internal  unknown      unclear
unknown   any          unclear
The silk.conf file and packlogic-twoway.so plug-in define a single class, all.
The type assigned to a flow record within the all class depends on how the record moves between the
networks, and the types follow from the table above:
in, inicmp, inweb
Incoming traffic. The traffic is split into multiple types, and these types allow the analysts to query a
subset of the flow records depending on their needs. Each incoming flow record is assigned to one of
the incoming types using the following rules:
inweb
Contains traffic where the protocol is TCP (6) and either the source port or the destination port
is one of 80, 443, or 8080
inicmp
Contains flow records where either the protocol is ICMP (1) or the flow record is IPv6 and the
protocol is ICMPV6 (58). By default, the inicmp and outicmp types are not used by the
packlogic-twoway.so plug-in.
in
Contains all other incoming traffic.
out, outicmp, outweb
Outgoing traffic. The traffic is split among the types using rules similar to those for incoming traffic.
innull
Blocked incoming traffic
outnull
Blocked outgoing traffic
ext2ext
Strictly external traffic
int2int
Strictly internal traffic
other
Either traffic from the null network or traffic to or from the unknown network
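The mapping from a source and destination network pair to a type can be sketched as a lookup table followed by the web and ICMP split. This is a minimal illustration under stated assumptions: the names BASE_TYPE and twoway_type() are hypothetical, the use_icmp flag models the fact that inicmp and outicmp are not used by default, and the order in which the web test precedes the ICMP test is a choice made for this sketch.

```python
WEB_PORTS = {80, 443, 8080}

# (source network, destination network) -> base type, per the table above.
BASE_TYPE = {
    ("external", "internal"): "in",
    ("internal", "external"): "out",
    ("external", "null"): "innull",
    ("internal", "null"): "outnull",
    ("external", "external"): "ext2ext",
    ("internal", "internal"): "int2int",
}


def twoway_type(src_net, dst_net, proto, sport, dport,
                use_icmp=False, is_ipv6=False):
    """Return the type for a flow whose networks are already known."""
    base = BASE_TYPE.get((src_net, dst_net), "other")
    if base not in ("in", "out"):
        return base  # only incoming and outgoing traffic is split further
    if proto == 6 and (sport in WEB_PORTS or dport in WEB_PORTS):
        return base + "web"
    # inicmp/outicmp are unused by default, so they are opt-in here.
    if use_icmp and (proto == 1 or (is_ipv6 and proto == 58)):
        return base + "icmp"
    return base
```

Any pair that is missing from the table, such as a null source or an unknown destination, falls through to other.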
Assigning a flow to source and destination networks
Each sensor block in the sensor.conf(5) file must specify how to determine the source and destination
networks for each flow record collected by the probes specified for that sensor. There are two ways to do
this.
The first method sets the source and destination of all records to particular networks. This can be used, for
example, when the physical network device at the sensor only sees one direction of the traffic. To do this,
use the source-network and destination-network statements in the sensor block. The following sensor,
S1, considers all traffic as blocked incoming:
sensor S1
ipfix-probes S1
source-network external
destination-network null
end sensor
The second method to determine how a flow record moves between the networks is to define the networks
and use characteristics of the flow record to determine its source and destination networks.
The sensor.conf file provides two ways to define a network: use the NET-ipblocks statement to specify
the NET network as a list of IP address blocks, or use the NET-interfaces statement to specify the NET
network using a list of SNMP interfaces.
For the source network of a flow record to be considered external, either the source IP (SiLK field sIP)
must appear in the list of external-ipblocks or the incoming SNMP interface (SiLK field in) must appear
in the list of external-interfaces. Note: If the probe block that specifies where the flow was collected
contains an interface-values vlan statement, the SiLK in field contains the VLAN ID.
For the destination network of a flow record to be considered null, either the destination IP (dIP) must
appear in the list of null-ipblocks or the outgoing SNMP interface (out) must appear in the list of null-interfaces.
Consider the following two sensors:
sensor S2
ipfix-probes S2
external-ipblocks 172.16.0.0/16
internal-ipblocks 172.20.0.0/16
end sensor
sensor S3
ipfix-probes S3
external-interfaces 17,18,19
internal-interfaces 21,22,23
end sensor
A flow record collected at probe S2 whose sIP is 172.16.1.1 and whose dIP is 172.20.2.2 is considered
incoming.
A flow record collected at probe S3 whose in is 23 and whose out is 18 is considered outgoing. A flow on
S3 whose in is 23 and whose out is 27 is written to other since the out field is not matched.
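The S2 and S3 examples can be checked with a short Python model. It is only a sketch: the dictionary layout and the network_of() helper are hypothetical, and the remainder keyword, group blocks, and the null network are left out.

```python
from ipaddress import ip_address, ip_network

# The two example sensors written as plain dictionaries (hypothetical layout).
S2 = {"ipblocks": {"external": ["172.16.0.0/16"],
                   "internal": ["172.20.0.0/16"]}}
S3 = {"interfaces": {"external": {17, 18, 19},
                     "internal": {21, 22, 23}}}


def network_of(sensor, ip, interface):
    """Name the network that an (IP, SNMP interface) pair falls in."""
    for net, blocks in sensor.get("ipblocks", {}).items():
        if any(ip_address(ip) in ip_network(b) for b in blocks):
            return net
    for net, ifaces in sensor.get("interfaces", {}).items():
        if interface in ifaces:
            return net
    return "unknown"  # the implicit fourth network
```

Calling network_of(S3, "10.0.0.2", 27) yields unknown, which is why the second S3 flow above is written to other.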
There are two constructs in the sensor.conf file that help when specifying these lists:
1. The NET-interfaces or NET-ipblocks statement in a sensor block may use remainder to denote
interfaces or IP blocks that do not appear elsewhere in the block.
2. A group block can be used to give a name to a set of IP blocks or SNMP interfaces which a sensor
block can reference.
For details, see the sensor.conf(5) manual page.
Valid sensors
When using the packlogic-twoway.so plug-in, the sensor blocks in the sensor.conf file support the following
types of probes:
• ipfix
• netflow-v5
• netflow-v9
• sflow
• silk
In addition, each sensor block must meet the following rules:
• If the sensor has the source-network and destination-network explicitly set, the sensor is valid
and none of the following checks are performed. Otherwise,
• At least one of NET-interfaces or NET-ipblocks must be specified, where NET is either internal
or external. And,
• A sensor cannot mix NET-ipblocks and NET-interfaces, with the exception that null-interfaces
are always allowed. And,
• Only one network on the sensor may use remainder. And,
• If a sensor contains only one NET-ipblocks statement, that statement may not use remainder.
(The NET-interfaces statement does not have this restriction.) And,
• When the remainder keyword is not used and only one of the internal or external networks is
defined, the external or internal network, respectively, is defined as having the remainder.
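The validity checks above can be sketched as a function that reports which rules a sensor breaks. This is a rough model, not the checks rwflowpack performs: the dictionary layout (statement name mapped to a list of values, with the string "remainder" standing for the remainder keyword) and the function name sensor_problems() are assumptions made for illustration.

```python
def sensor_problems(sensor):
    """Return a list of rule violations for a sensor described as a dict."""
    if "source-network" in sensor and "destination-network" in sensor:
        return []  # explicitly set networks: no further checks are made
    ipblocks = [k for k in sensor if k.endswith("-ipblocks")]
    interfaces = [k for k in sensor if k.endswith("-interfaces")]
    problems = []
    if not any(k.split("-")[0] in ("internal", "external")
               for k in ipblocks + interfaces):
        problems.append("no internal or external network defined")
    # null-interfaces may always accompany NET-ipblocks statements.
    if ipblocks and any(k != "null-interfaces" for k in interfaces):
        problems.append("mixes NET-ipblocks and NET-interfaces")
    remainders = [k for k in ipblocks + interfaces if "remainder" in sensor[k]]
    if len(remainders) > 1:
        problems.append("more than one network uses remainder")
    if len(ipblocks) == 1 and "remainder" in sensor[ipblocks[0]]:
        problems.append("a lone NET-ipblocks statement uses remainder")
    return problems
```

An empty list means the sensor passes every check modeled here.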
Packing logic code
This section provides the logic used to assign the class and type at the twoway site.
A single sensor block will assign the flow record to a single class and type, and processing of the flow for
that sensor block stops as soon as a type is assigned. When multiple sensor blocks reference the same
probe, the flow records collected by that probe are processed by each of those sensor blocks.
A flow record is always assigned to the class all unless the flow is ignored.
A textual description of the code used to assign the type is shown here. As of SiLK 3.8.0, the type may be
determined by the presence of certain IPFIX or NetFlowV9 information elements.
• Ignore any flow record that matches a discard-when statement or does not match a discard-unless
statement.
• If source-network is external, if sIP matches external-ipblocks, or if in matches external-interfaces, then
– If destination-network is null, if dIP matches null-ipblocks, or if out matches null-interfaces, pack as innull. Else,
– If destination-network is internal, if dIP matches internal-ipblocks, or if out matches
internal-interfaces, pack as in, inicmp, or inweb. Else,
– If destination-network is external, if dIP matches external-ipblocks, or if out matches
external-interfaces, pack as ext2ext. Else,
– Pack as other.
• Else, if source-network is internal, if sIP matches internal-ipblocks, or if in matches internal-interfaces, then
– If destination-network is null, if dIP matches null-ipblocks, or if out matches null-interfaces, pack as outnull. Else,
– If destination-network is external, if dIP matches external-ipblocks, or if out matches
external-interfaces, pack as out, outicmp, or outweb. Else,
– If destination-network is internal, if dIP matches internal-ipblocks, or if out matches
internal-interfaces, pack as int2int. Else,
– Pack as other.
• Else, pack as other.
• Potentially modify the type: If the probe has a quirks setting that includes firewall-event and if
the incoming record contains the firewallEvent or NF_F_FW_EVENT information element whose value
is 3 (flow denied), change the type where the flow is packed as follows:
– If the flow was denied due to an ingress ACL (NF_F_FW_EXT_EVENT of 1001), pack as innull.
– If the flow was denied due to an egress ACL (NF_F_FW_EXT_EVENT of 1002), pack as outnull.
– If the flow’s current type is in, inweb, inicmp, or ext2ext, pack as innull.
– If the flow’s current type is out, outweb, outicmp, or int2int, pack as outnull.
– Else leave the type as is (innull, outnull, or other).
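The final firewall-event adjustment can be sketched on its own. The function below is a hypothetical model that assumes the probe's quirks setting already includes firewall-event; fw_event stands for the value of firewallEvent (or NF_F_FW_EVENT) and fw_ext_event for NF_F_FW_EXT_EVENT, with None meaning the element is absent.

```python
def apply_firewall_event(current_type, fw_event, fw_ext_event=None):
    """Return the type after the firewall-event adjustment is applied."""
    if fw_event != 3:            # only "flow denied" changes the type
        return current_type
    if fw_ext_event == 1001:     # denied by an ingress ACL
        return "innull"
    if fw_ext_event == 1002:     # denied by an egress ACL
        return "outnull"
    if current_type in ("in", "inweb", "inicmp", "ext2ext"):
        return "innull"
    if current_type in ("out", "outweb", "outicmp", "int2int"):
        return "outnull"
    return current_type          # innull, outnull, or other stay as they are
```

The checks run in the same order as the list above, so an extended event of 1001 or 1002 takes precedence over the flow's current type.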
SEE ALSO
rwfilter(1), rwflowpack(8), sensor.conf(5), silk.conf(5), packlogic-generic(3), silk(7), SiLK Installation Handbook
pmapfilter
User-defined labels for IPs and protocol/port pairs
SYNOPSIS
rwfilter --pmap-file=[MAPNAME:]FILENAME
[--pmap-file=[MAPNAME:]FILENAME ...]
[--pmap-src-MAPNAME=LABELS] [--pmap-dst-MAPNAME=LABELS]
[--pmap-any-MAPNAME=LABELS] ...
rwcut --pmap-file=[MAPNAME:]FILENAME
[--pmap-file=[MAPNAME:]FILENAME ...]
--fields=FIELDS [--pmap-column-width=NUM]
rwgroup --pmap-file=[MAPNAME:]FILENAME
[--pmap-file=[MAPNAME:]FILENAME ...]
--id-fields=FIELDS
rwsort --pmap-file=[MAPNAME:]FILENAME
[--pmap-file=[MAPNAME:]FILENAME ...]
--fields=FIELDS
rwstats --pmap-file=[MAPNAME:]FILENAME
[--pmap-file=[MAPNAME:]FILENAME ...]
--fields=FIELDS [--pmap-column-width=NUM]
rwuniq --pmap-file=[MAPNAME:]FILENAME
[--pmap-file=[MAPNAME:]FILENAME ...]
--fields=FIELDS [--pmap-column-width=NUM]
DESCRIPTION
Prefix maps provide a mapping from values on a SiLK Flow record to string labels. The binary prefix
map file is created from textual input with rwpmapbuild. See the rwpmapbuild(1) manual page for the
syntax of the input file. This manual page describes how to use a prefix map file to augment the features of
some commonly used SiLK applications.
A prefix map file maps either an IP address or a protocol/port pair to a label. The mode statement in the
input to rwpmapbuild determines whether the prefix map file is a mapping for IPs or for protocol/port
pairs. To see the mode of an existing prefix map, use rwpmapcat(1) and specify --output-type=type.
When using the prefix map file as described in this manual page, one typically uses the prefix map’s map-name. The map-name statement in the input to rwpmapbuild allows one to assign the map-name when
creating the prefix map. To see the map-name of an existing prefix map, use rwpmapcat --output-type=mapname. To assign a map-name when loading a prefix map file, use the --pmap-file switch and
specify the map-name you want to use, a colon, and the file name. A map-name provided to the --pmap-file
switch overrides the map-name in the file (if one exists).
When using a prefix map in rwfilter(1), the map-name is combined with the prefix --pmap-src-, --pmap-dst-, or --pmap-any- to create the partitioning switches. When using the prefix map to create fields in
rwcut(1), rwgroup(1), rwsort(1), rwstats(1), and rwuniq(1), the map-name must be combined with
the prefix src- or dst- to create the field names.
The applications support using multiple prefix map files in a single invocation. When using multiple prefix
map files, each file must have a unique map-name (or be assigned a unique map-name on the command line).
When a prefix map file does not contain a map-name and no map-name is provided on the command line,
SiLK processes the prefix map in legacy mode. When in legacy mode, only one prefix map file may be used.
See the LEGACY section for details.
Three types of prefix map files are currently implemented:
proto-port
Maps a protocol/port pair to a label.
IPv4-address
Maps an IPv4 address to a label. When used with IPv6 addresses, an IPv6 address in the ::ffff:0:0/96
prefix is converted to IPv4 and mapped to the label. Any other IPv6 address is mapped to the label
UNKNOWN.
IPv6-address
Maps an IPv6 address to a label. When used with an IPv4 address, the IPv4 address is converted to
IPv6, mapping the IPv4 address into the ::ffff:0:0/96 prefix.
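The treatment of IPv6 addresses by an IPv4-address prefix map can be modeled with Python's standard ipaddress module. This sketch does not read a real prefix map file: the PMAP dictionary, its labels, and the lookup_v4() helper are hypothetical, and the most specific matching block is assumed to win.

```python
from ipaddress import IPv6Address, ip_address, ip_network

# Hypothetical IPv4 prefix map using documentation address ranges.
PMAP = {
    "0.0.0.0/0": "external",
    "192.0.2.0/24": "FineArts",
    "198.51.100.0/24": "ChemE",
}


def lookup_v4(pmap, text):
    """Look up an address in an IPv4 prefix map held as a dict."""
    addr = ip_address(text)
    if isinstance(addr, IPv6Address):
        addr = addr.ipv4_mapped      # None unless addr is in ::ffff:0:0/96
        if addr is None:
            return "UNKNOWN"
    matches = [ip_network(b) for b in pmap if addr in ip_network(b)]
    if not matches:
        return "UNKNOWN"             # assumption: no covering block at all
    return pmap[str(max(matches, key=lambda n: n.prefixlen))]
```

Here lookup_v4(PMAP, "::ffff:192.0.2.7") gives the same label as lookup_v4(PMAP, "192.0.2.7"), while an address such as 2001:db8::1 maps to UNKNOWN.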
For more information on constructing prefix map files, see the rwpmapbuild(1) documentation. To view
the contents, type, or map-name of a prefix map file, use rwpmapcat(1). To map textual input to the
labels in a prefix map, use rwpmaplookup(1).
OPTIONS
The --pmap-file switch is used to load the prefix map into the application. Use of the prefix map varies by
application.
To use a prefix map within a supported application, one or more --pmap-file switches are required. Multiple
--pmap-file switches are allowed as long as each prefix map is associated with a unique map-name. The
switch has two forms:
--pmap-file=MAPNAME:FILENAME
FILENAME refers to a prefix map file generated using rwpmapbuild. MAPNAME is a name that
will be used to refer to the fields or options specific to that prefix map.
--pmap-file=FILENAME
When a MAPNAME is not specified explicitly as part of the argument, the prefix map file is checked
to determine if a map-name was set when the prefix map was created (see rwpmapbuild). If so, that
map-name is used. If not, the prefix map is processed in legacy mode for backward compatibility. See
LEGACY below for more information.
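How the two forms resolve to a map-name can be sketched as follows. The helper split_pmap_arg() is hypothetical, and splitting at the first colon is an assumption about how the argument is parsed; name_in_file stands for the map-name stored in the prefix map file, if any.

```python
def split_pmap_arg(arg, name_in_file=None):
    """Split a --pmap-file argument of the form [MAPNAME:]FILENAME.

    Returns (map_name, filename). A map_name of None means that neither
    the command line nor the file supplied one, which is legacy mode.
    """
    name, sep, filename = arg.partition(":")
    if sep:                       # a MAPNAME on the command line wins
        return name, filename
    return name_in_file, arg
```

For example, split_pmap_arg("CMU:carnegiemellon.pmap", "campus") keeps CMU and ignores the name stored in the file.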
rwfilter Switches
When using the prefix map in rwfilter(1), the map-name is combined with the prefix --pmap-src-, --pmap-dst-, or --pmap-any- to create the partitioning switches; that is, the switch name depends in part
on the map-name of the prefix map.
--pmap-src-MAPNAME=LABELS
If the prefix map associated with MAPNAME is an IP prefix map, this matches records with a source
address that maps to a label contained in the list of labels in LABELS.
If the prefix map associated with MAPNAME is a proto-port prefix map, this matches records with a
protocol and source port combination that maps to a label contained in the list of labels in LABELS.
--pmap-dst-MAPNAME=LABELS
Similar to --pmap-src-MAPNAME, but uses the destination IP or the protocol and destination
port.
--pmap-any-MAPNAME=LABELS
If the prefix map associated with MAPNAME is an IP prefix map, this matches records with a source
or destination address that maps to a label contained in the list of labels in LABELS.
If the prefix map associated with MAPNAME is a proto-port prefix map, this matches records with
a protocol and a source or destination port combination that maps to a label contained in the list of
labels in LABELS.
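The pass/fail decision made by the three switches can be sketched once the labels for a record are known. The function pmap_pass() is a hypothetical model: src_label and dst_label are the labels the prefix map assigns to the record's source and destination (an IP, or a protocol/port pair), and LABELS is taken to be a comma-separated list as in the EXAMPLES section.

```python
def pmap_pass(kind, labels, src_label, dst_label):
    """Decide whether a record passes a --pmap-{src,dst,any}-MAPNAME test."""
    wanted = set(labels.split(","))
    if kind == "src":
        return src_label in wanted
    if kind == "dst":
        return dst_label in wanted
    if kind == "any":
        return src_label in wanted or dst_label in wanted
    raise ValueError("kind must be 'src', 'dst', or 'any'")
```

With labels "ChemE,FineArts", the any form passes a record when either end carries one of the two labels, while the src and dst forms each look at one end only.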
rwcut, rwgroup, rwsort, rwstats, and rwuniq Switches
When using the prefix map to create fields in rwcut(1), rwgroup(1), rwsort(1), rwstats(1), and rwuniq(1), the map-name must be combined with the prefix src- or dst- to create the field names. The field
names depend in part on the map-name of the prefix map.
--fields=FIELDS
FIELDS refers to a list of fields to use for the operation. Each prefix map associated with MAPNAME creates two additional fields, src-MAPNAME and dst-MAPNAME, available for display, sorting,
and counting using the rwcut, rwgroup, rwsort, rwstats, and rwuniq tools.
src-MAPNAME
The value for the source from the prefix map file associated with MAPNAME. For an IP-based prefix map file, this corresponds to the source IP. For a proto-port prefix map, it is the
protocol/source-port.
dst-MAPNAME
As src-MAPNAME for the destination IP address or protocol/destination-port. It is possible
to encode ICMP type and code in a proto-port prefix map, but it will only work when used for the
protocol/destination-port.
--pmap-column-width=NUM
Set the maximum number of characters to use when displaying the textual value of any prefix map
field in rwcut, rwstats, and rwuniq to NUM. This switch must precede the --fields switch. This
switch is useful for prefix map files that have very long dictionary values.
LEGACY
When a prefix map file does not contain a map-name and no map-name is specified in the --pmap-file
argument, SiLK processes the prefix map as it did prior to SiLK 2.0, which is called legacy mode. When in
legacy mode, only one prefix map file may be used by the application. Legacy mode is deprecated, but it is
maintained for backwards compatibility.
Legacy Switches
When a prefix map is loaded into rwfilter in legacy mode, the following switches are defined:
--pmap-saddress=LABELS
Match records with a source IP address that maps to a label contained in the list of labels in LABELS.
Only works with IP prefix maps.
--pmap-daddress=LABELS
As --pmap-saddress for the destination IP.
--pmap-any-address=LABELS
Match records with a source or destination IP address that maps to a label contained in the list of
labels in LABELS. Only works with IP prefix maps.
--pmap-sport-proto=LABELS
Match records with a protocol and source port combination that maps to a label contained in the list
of labels in LABELS. Only works with proto-port prefix maps.
--pmap-dport-proto=LABELS
As --pmap-sport-proto for the protocol and destination port.
--pmap-any-port-proto=LABELS
Match records with a protocol and a source or destination port combination that maps to a label
contained in the list of labels in LABELS. Only works with proto-port prefix maps.
Legacy Fields
When a prefix map is loaded into rwcut, rwgroup, rwsort, rwstats, or rwuniq in legacy mode, the
following fields are made available to the --fields switch:
sval
The value from the prefix map file for the source. For an IP-based prefix map file, this corresponds to
the source IP. For a proto-port prefix map, it is the protocol/source-port.
dval
As sval for the destination IP address or protocol/destination-port.
EXAMPLES
In the following examples, the dollar sign ($) represents the shell prompt. The text after the dollar sign
represents the command line. Lines have been wrapped for improved readability, and the back slash (\) is
used to indicate a wrapped line.
The following examples explicitly specify the map name on the command line, ensuring the examples work
with any prefix map file. The examples use two prefix map files:
carnegiemellon.pmap
Maps the internal IP space of Carnegie Mellon to labels specifying the department that has been
assigned that IP space. (An IPv4 prefix map provides a label for every IPv4 address; in this case, any
IP outside of Carnegie Mellon’s IP space is given the label external.)
service.pmap
Maps protocol/port pairs to well-known services associated with those pairs (e.g., based on the files
/etc/protocols and /etc/services). For example, 80/tcp is labeled TCP/HTTP, 25/tcp is TCP/SMTP,
ephemeral ports in protocol 6 are TCP, protocol 1 is ICMP, etc.
To find today’s incoming flow records going to "FineArts":
$ rwfilter --type=in,inweb --pmap-file=CMU:carnegiemellon.pmap \
      --pmap-dst-CMU="FineArts" --pass=fine-arts-in.rw
To find today’s outgoing flow records coming from "ChemE":
$ rwfilter --type=out,outweb --pmap-file=CMU:carnegiemellon.pmap \
      --pmap-src-CMU="ChemE" --pass=cheme-out.rw
To find today’s internal traffic from "FineArts" to "ChemE":
$ rwfilter --type=int2int --pmap-file=CMU:carnegiemellon.pmap \
      --pmap-src-CMU="FineArts" --pmap-dst-CMU="ChemE" \
      --pass=finearts-to-cheme.rw
To find the reverse traffic:
$ rwfilter --type=int2int --pmap-file=CMU:carnegiemellon.pmap \
      --pmap-src-CMU="ChemE" --pmap-dst-CMU="FineArts" \
      --pass=cheme-to-finearts.rw
To find today’s internal traffic that started or ended at "FineArts" or "ChemE" (this will find traffic
between them, as well as traffic they had with any other university department):
$ rwfilter --type=int2int --pmap-file=CMU:carnegiemellon.pmap \
      --pmap-any-CMU="ChemE,FineArts" \
      --pass=cheme-finearts.rw
Using the service.pmap file with rwcut to print the label for the protocol/port pairs:
$ rwcut --pmap-file=service:service.pmap \
      --fields=protocol,dport,dst-service,sport,src-service \
      flow-records.rw
pro|dPort|dst-service|sPort|src-service|
 17|   53|    UDP/DNS|29617|        UDP|
 17|29617|        UDP|   53|    UDP/DNS|
  6|   22|    TCP/SSH|29618|        TCP|
  6|29618|        TCP|   22|    TCP/SSH|
  1|  771|       ICMP|    0|       ICMP|
 17|   67|   UDP/DHCP|   68|   UDP/DHCP|
  6|  443|  TCP/HTTPS|28816|        TCP|
  6|29897|        TCP|   25|   TCP/SMTP|
  6|29222|        TCP|   80|   TCP/HTTP|
 17|29361|        UDP|   53|    UDP/DNS|
Using the service.pmap file with rwuniq:
$ rwuniq --pmap-file=serv:service.pmap --fields=dst-serv \
      --values=bytes flow-records.rw
 dst-serv|        Bytes|
  TCP/SSH|   3443906999|
 TCP/SMTP|    780000305|
      TCP| 114397570896|
TCP/HTTPS|    387741258|
 TCP/HTTP|   1526975653|
  UDP/NTP|      1176632|
      UDP|     14404581|
 UDP/DHCP|      5121392|
  UDP/DNS|      3797474|
     ICMP|     10695328|
Using the service.pmap file with rwstats:
$ rwstats --pmap-file=srvc:service.pmap --fields=dst-srvc \
--values=bytes --count=5 flow-records.rw
INPUT: 501876 Records for 10 Bins and 120571390518 Total Bytes
OUTPUT: Top 5 Bins by Bytes
 dst-srvc|        Bytes|    %Bytes|   cumul_%|
      TCP| 114397570896| 94.879532| 94.879532|
  TCP/SSH|   3443906999|  2.856322| 97.735854|
 TCP/HTTP|   1526975653|  1.266449| 99.002303|
 TCP/SMTP|    780000305|  0.646920| 99.649223|
TCP/HTTPS|    387741258|  0.321586| 99.970809|
Using rwsort with two prefix maps, where the records are first sorted by the originating department and
then by the service they are requesting:
$ rwsort --pmap-file=service:service.pmap \
      --pmap-file=cmu:carnegiemellon.pmap \
      --fields=src-cmu,dst-service flow-records.rw
To see the partitioning switches that a prefix map adds to rwfilter, load the prefix map file prior to specifying
the --help switch.
$ rwfilter --pmap-file=carnegiemellon.pmap --help \
      | sed -n '/^--pmap-/p'
To see the fields that a prefix map file adds to rwcut, rwgroup, rwsort, rwstats, or rwuniq, load the
prefix map file prior to specifying --help, and then view the description of the --fields switch.
$ rwsort --pmap-file=service.pmap --help \
      | sed -n '/^--fields/,/^--/p'
SEE ALSO
rwcut(1), rwfilter(1), rwgroup(1), rwpmapbuild(1), rwpmapcat(1), rwpmaplookup(1), rwsort(1), rwstats(1), rwuniq(1), silk(7)
PySiLK
SiLK in Python
DESCRIPTION
This document describes the features of PySiLK, the SiLK Python extension. It documents the objects
and methods that allow one to read, manipulate, and write SiLK Flow records, IPsets, Bags, and Prefix
Maps (pmaps) from within python(1). PySiLK may be used in a stand-alone Python script or as a plug-in
from within the SiLK tools rwfilter(1), rwcut(1), rwgroup(1), rwsort(1), rwstats(1), and rwuniq(1).
This document describes the objects and methods that PySiLK provides; the details of using those from
within a plug-in are documented in the silkpython(3) manual page.
The SiLK Python extension defines the following objects and modules:
IPAddr object
Represents an IP Address.
IPv4Addr object
Represents an IPv4 Address.
IPv6Addr object
Represents an IPv6 Address.
IPWildcard object
Represents CIDR blocks or SiLK IP wildcard addresses.
IPSet object
Represents a SiLK IPset.
PrefixMap object
Represents a SiLK Prefix Map.
Bag object
Represents a SiLK Bag.
TCPFlags object
Represents TCP flags.
RWRec object
Represents a SiLK Flow record.
SilkFile object
Represents a channel for writing to or reading from SiLK Flow files.
FGlob object
Allows retrieval of filenames in a SiLK data store. See also the silk.site module.
silk.site module
Defines several functions that relate to the SiLK site configuration and allow iteration over the files in
a SiLK data store.
silk.plugin module
Defines functions that may only be used in SiLK Python plug-ins.
The SiLK Python extension provides the following functions:
silk.get_configuration(name=None)
When name is None, return a dictionary whose keys specify aspects of how SiLK was compiled. When
name is provided, return the dictionary value for that key, or None when name is an unknown key.
The dictionary’s keys and their meanings are:
COMPRESSION_METHODS
A list of strings specifying the compression methods that were compiled into this build of SiLK.
The list will contain one or more of NO_COMPRESSION, ZLIB, and/or LZO1X.
INITIAL_TCPFLAGS_ENABLED
True if SiLK was compiled with support for initial TCP flags; False otherwise.
IPV6_ENABLED
True if SiLK was compiled with IPv6 support; False otherwise.
SILK_VERSION
The version of SiLK linked with PySiLK, as a string.
TIMEZONE_SUPPORT
The string UTC if SiLK was compiled to use UTC, or the string local if SiLK was compiled to
use the local timezone.
Since SiLK 3.8.1.
silk.ipv6_enabled()
Return True if SiLK was compiled with IPv6 support, False otherwise.
silk.initial_tcpflags_enabled()
Return True if SiLK was compiled with support for initial TCP flags, False otherwise.
silk.init_country_codes(filename=None)
Initialize PySiLK’s country code database. filename should be the path to a country code prefix map, as
created by rwgeoip2ccmap(1). If filename is not supplied, SiLK will look first for the file specified by
$SILK_COUNTRY_CODES, and then for a file named country_codes.pmap in $SILK_PATH/share/silk,
$SILK_PATH/share, /usr/local/share/silk, and /usr/local/share. (The latter two assume that SiLK
was installed in /usr/local.) Will throw a RuntimeError if loading the country code prefix map fails.
silk.silk_version()
Return the version of SiLK linked with PySiLK, as a string.
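The search order that silk.init_country_codes() follows when no filename is supplied can be written out as a list of candidate paths. This is a sketch of the documented order only, not PySiLK code: the function name country_code_candidates() is hypothetical, and the /usr/local prefix is the assumed installation location mentioned above.

```python
import os


def country_code_candidates(environ=None):
    """List, in search order, where the country code map would be sought."""
    env = os.environ if environ is None else environ
    paths = []
    if env.get("SILK_COUNTRY_CODES"):
        paths.append(env["SILK_COUNTRY_CODES"])
    roots = []
    if env.get("SILK_PATH"):
        roots.append(env["SILK_PATH"])
    roots.append("/usr/local")   # assumed installation prefix
    for root in roots:
        for sub in ("share/silk", "share"):
            paths.append(f"{root}/{sub}/country_codes.pmap")
    return paths
```

Passing a dictionary instead of relying on os.environ makes the order easy to inspect without changing the real environment.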
IPAddr Object
An IPAddr object represents an IPv4 or IPv6 address. These two types of addresses are represented by
two subclasses of IPAddr: IPv4Addr and IPv6Addr.
class silk.IPAddr(address)
The constructor takes a string address, which must be a string representation of either an IPv4 or IPv6
address, or an IPAddr object. IPv6 addresses are only accepted if silk.ipv6_enabled() returns True.
The IPAddr object that the constructor returns will be either an IPv4Addr object or an IPv6Addr
object.
For compatibility with releases prior to SiLK 2.2.0, the IPAddr constructor will also accept an integer
address, in which case it converts that integer to an IPv4Addr object. This behavior is deprecated.
Use the IPv4Addr and IPv6Addr constructors instead.
Examples:
>>> addr1 = IPAddr('192.160.1.1')
>>> addr2 = IPAddr('2001:db8::1428:57ab')
>>> addr3 = IPAddr('::ffff:12.34.56.78')
>>> addr4 = IPAddr(addr1)
>>> addr5 = IPAddr(addr2)
>>> addr6 = IPAddr(0x10000000) # Deprecated as of SiLK 2.2.0
Supported operations and methods:
Inequality Operations
In all the below inequality operations, whenever an IPv4 address is compared to an IPv6 address, the
IPv4 address is converted to an IPv6 address before comparison. This means that IPAddr("0.0.0.0")
== IPAddr("::ffff:0.0.0.0").
addr1 == addr2
Return True if addr1 is equal to addr2 ; False otherwise.
addr1 != addr2
Return False if addr1 is equal to addr2 ; True otherwise.
addr1 < addr2
Return True if addr1 is less than addr2 ; False otherwise.
addr1 <= addr2
Return True if addr1 is less than or equal to addr2 ; False otherwise.
addr1 >= addr2
Return True if addr1 is greater than or equal to addr2 ; False otherwise.
addr1 > addr2
Return True if addr1 is greater than addr2 ; False otherwise.
addr.is_ipv6()
Return True if addr is an IPv6 address, False otherwise.
addr.isipv6()
(DEPRECATED in SiLK 2.2.0) An alias for is_ipv6().
addr.to_ipv6()
If addr is an IPv6Addr, return a copy of addr. Otherwise, return a new IPv6Addr mapping addr
into the ::ffff:0:0/96 prefix.
addr.to_ipv4()
If addr is an IPv4Addr, return a copy of addr. If addr is in the ::ffff:0:0/96 prefix, return a new
IPv4Addr containing the IPv4 address. Otherwise, return None.
int(addr )
Return the integer representation of addr. For an IPv4 address, this is a 32-bit number. For an IPv6
address, this is a 128-bit number.
str(addr )
Return a human-readable representation of addr in its canonical form.
addr.padded()
Return a human-readable representation of addr that is fully padded with zeroes. With IPv4, it
will return a string of the form "xxx.xxx.xxx.xxx". With IPv6, it will return a string of the form
"xxxx:xxxx:xxxx:xxxx:xxxx:xxxx:xxxx:xxxx".
addr.mask(mask )
Return a copy of addr masked by the IPAddr mask.
When both addresses are either IPv4 or IPv6, applying the mask is straightforward.
If addr is IPv6 but mask is IPv4, mask is converted to IPv6 and then the mask is applied. This may
produce an unexpected result.
If addr is IPv4 and mask is IPv6, addr will remain an IPv4 address if masking mask with
::ffff:0000:0000 results in ::ffff:0000:0000 (namely, if bytes 10 and 11 of mask are 0xFFFF).
Otherwise, addr is converted to an IPv6 address and the mask is applied in IPv6 space, which may
also produce an unexpected result.
addr.mask_prefix(prefix)
Return a copy of addr masked by the high prefix bits. All bits below the prefix-th bit will be set to
zero. The maximum value for prefix is 32 for an IPv4Addr, and 128 for an IPv6Addr.
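The prefix masking described for mask_prefix() amounts to a bitwise AND with a run of high one-bits. Below is a small sketch using Python's standard ipaddress module rather than PySiLK; the function name mask_prefix is reused here purely for illustration, and accepting a prefix of 0 is an assumption of this sketch.

```python
import ipaddress


def mask_prefix(addr, prefix):
    """Keep the high `prefix` bits of addr and set the remaining bits to zero."""
    bits = addr.max_prefixlen          # 32 for IPv4, 128 for IPv6
    if not 0 <= prefix <= bits:
        raise ValueError("prefix must be between 0 and %d" % bits)
    keep = ((1 << prefix) - 1) << (bits - prefix)
    return type(addr)(int(addr) & keep)


print(mask_prefix(ipaddress.ip_address("192.160.1.77"), 24))         # 192.160.1.0
print(mask_prefix(ipaddress.ip_address("2001:db8::1428:57ab"), 32))  # 2001:db8::
```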
addr.country_code()
Return the two-character country code associated with addr. If no country code is associated with addr,
return None. The country code association is initialized by the silk.init_country_codes() function. If
init_country_codes() is not called before calling this method, it will act as if init_country_codes()
was called with no argument.
IPv4Addr Object
An IPv4Addr object represents an IPv4 address. IPv4Addr is a subclass of IPAddr, and supports all
operations and methods that IPAddr supports.
class silk.IPv4Addr(address)
The constructor takes an address, which must be a string representation of an IPv4 address, an
IPAddr object, or an integer. A string will be parsed as an IPv4 address. An IPv4Addr object will
be copied. An IPv6Addr object will be converted to an IPv4 address, or raise a ValueError if the
conversion is not possible. A 32-bit integer will be converted to an IPv4 address.
Examples:
>>> addr1 = IPv4Addr('192.160.1.1')
>>> addr2 = IPv4Addr(IPAddr('::ffff:12.34.56.78'))
>>> addr3 = IPv4Addr(addr1)
>>> addr4 = IPv4Addr(0x10000000)
Supported operations and methods:
addr.octets()
Return a tuple of the octets of addr. This will be a tuple of length 4 for IPv4 addresses, and a tuple
of length 16 for IPv6 addresses.
IPv6Addr Object
An IPv6Addr object represents an IPv6 address. IPv6Addr is a subclass of IPAddr, and supports all
operations and methods that IPAddr supports.
class silk.IPv6Addr(address)
The constructor takes an address, which must be a string representation of an IPv6 address,
an IPAddr object, or an integer. A string will be parsed as an IPv6 address. An IPv6Addr object
will be copied. An IPv4Addr object will be converted to an IPv6 address. A 128-bit integer will be
converted to an IPv6 address.
Examples:
>>> addr1 = IPAddr('2001:db8::1428:57ab')
>>> addr2 = IPv6Addr(IPAddr('192.160.1.1'))
>>> addr3 = IPv6Addr(addr1)
>>> addr4 = IPv6Addr(0x100000000000000000000000)
IPWildcard Object
An IPWildcard object represents a range or block of IP addresses. The IPWildcard object handles iteration
over IP addresses with for x in wildcard.
class silk.IPWildcard(wildcard)
The constructor takes a string representation wildcard of the wildcard address. The string wildcard can
be an IP address, an IP address with a CIDR designation, an integer, an integer with a CIDR designation,
or an entry in SiLK wildcard notation. In SiLK wildcard notation, a wildcard is represented as an IP address
in canonical form with each octet (IPv4) or hexadectet (IPv6) represented by one of the following: a value,
a range of values, a comma-separated list of values and ranges, or the character 'x' used to represent
the entire octet or hexadectet. IPv6 wildcard addresses are only accepted if silk.ipv6_enabled() returns
True. The wildcard argument can also be an IPWildcard, in which case a duplicate reference is returned.
Examples:
>>> a = IPWildcard('1.2.3.0/24')
>>> b = IPWildcard('ff80::/16')
>>> c = IPWildcard('1.2.3.4')
>>> d = IPWildcard('::ffff:0102:0304')
>>> e = IPWildcard('16909056')
>>> f = IPWildcard('16909056/24')
>>> g = IPWildcard('1.2.3.x')
>>> h = IPWildcard('1:2:3:4:5:6:7.x')
>>> i = IPWildcard('1.2,3.4,5.6,7')
>>> j = IPWildcard('1.2.3.0-255')
>>> k = IPWildcard('::2-4')
>>> l = IPWildcard('1-2:3-4:5-6:7-8:9-a:b-c:d-e:0-ffff')
>>> m = IPWildcard(a)
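To make the SiLK wildcard notation concrete, the following sketch expands an IPv4 wildcard string into the addresses it covers. It is plain Python written for illustration only: the functions expand_octet and expand_wildcard are hypothetical helpers, they handle only the dotted IPv4 form (no CIDR, integer, or IPv6 input), and they are not how PySiLK implements IPWildcard.

```python
import itertools


def expand_octet(spec):
    """Turn one octet spec ('7', '0-255', '2,3', or 'x') into a sorted list of ints."""
    if spec == "x":
        return list(range(256))
    values = set()
    for part in spec.split(","):
        if "-" in part:
            low, high = (int(piece) for piece in part.split("-"))
            values.update(range(low, high + 1))
        else:
            values.add(int(part))
    if any(value < 0 or value > 255 for value in values):
        raise ValueError("octet value out of range in %r" % spec)
    return sorted(values)


def expand_wildcard(wildcard):
    """Yield every dotted-quad address matched by an IPv4 wildcard string."""
    octets = [expand_octet(spec) for spec in wildcard.split(".")]
    if len(octets) != 4:
        raise ValueError("expected four octets in %r" % wildcard)
    for combination in itertools.product(*octets):
        yield ".".join(str(octet) for octet in combination)


print(list(expand_wildcard("1.2,3.4,5.6,7")))
print(len(list(expand_wildcard("1.2.3.x"))))      # 256
print(len(list(expand_wildcard("1.2.3.0-255"))))  # 256
```

The first call prints the eight addresses from 1.2.4.6 through 1.3.5.7, since three of the four octets each offer two choices.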
Supported operations and methods:
addr in wildcard
Return True if addr is in wildcard, False otherwise.
addr not in wildcard
Return False if addr is in wildcard, True otherwise.
string in wildcard
Return the result of IPAddr(string) in wildcard.
string not in wildcard
Return the result of IPAddr(string) not in wildcard.
wildcard.is_ipv6()
Return True if wildcard contains IPv6 addresses, False otherwise.
str(wildcard )
Return the string that was used to construct wildcard.
IPSet Object
An IPSet object represents a set of IP addresses, as produced by rwset(1) and rwsetbuild(1). The IPSet
object handles iteration over IP addresses with for x in set, and iteration over CIDR blocks using for x
in set.cidr_iter().
In the following documentation, an ip_iterable can be any of:
• an IPAddr object representing an IP address
• the string representation of a valid IP address
• an IPWildcard object
• the string representation of an IPWildcard
• an iterable of any combination of the above
• another IPSet object
class silk.IPSet([ip_iterable])
The constructor creates an empty IPset. If an ip_iterable is supplied as an argument, each member of
ip_iterable will be added to the IPset.
Other constructors, all class methods:
silk.IPSet.load(path)
Create an IPSet by reading a SiLK IPset file. path must be a valid location of an IPset.
Other class methods:
silk.IPSet.supports_ipv6()
Return whether this implementation of IPsets supports IPv6 addresses.
Supported operations and methods:
In the lists of operations and methods below,
• set is an IPSet object
• addr can be an IPAddr object or the string representation of an IP address.
• set2 is an IPSet object. The operator versions of the methods require an IPSet object.
• ip_iterable is an iterable over IP addresses as accepted by the IPSet constructor. Consider ip_iterable
as creating a temporary IPSet to perform the requested method.
The following operations and methods do not modify the IPSet:
set.cardinality()
Return the cardinality of set.
len(set)
Return the cardinality of set. In Python 2.x, this method will raise OverflowError if the number of
IPs in the set cannot be represented by Python’s Plain Integer type--that is, if the value is larger than
sys.maxint. The cardinality() method will not raise this exception.
set.is_ipv6()
Return True if set is a set of IPv6 addresses, and False if it is a set of IPv4 addresses. For the purposes
of this method, IPv4-in-IPv6 addresses (that is, addresses in the ::ffff:0:0/96 prefix) are considered
IPv6 addresses.
addr in set
Return True if addr is a member of set; False otherwise.
addr not in set
Return False if addr is a member of set; True otherwise.
set.copy()
Return a new IPSet with a copy of set.
set.issubset(ip_iterable)
set <= set2
Return True if every IP address in set is also in set2. Return False otherwise.
set.issuperset(ip_iterable)
set >= set2
Return True if every IP address in set2 is also in set. Return False otherwise.
set.union(ip_iterable[, ...])
set | other | ...
Return a new IPset containing the IP addresses in set and all others.
set.intersection(ip_iterable[, ...])
set & other & ...
Return a new IPset containing the IP addresses common to set and others.
set.difference(ip_iterable[, ...])
set - other - ...
Return a new IPset containing the IP addresses in set but not in others.
set.symmetric_difference(ip_iterable)
set ^ other
Return a new IPset containing the IP addresses in either set or in other but not in both.
set.isdisjoint(ip_iterable)
Return True when none of the IP addresses in ip_iterable are present in set. Return False otherwise.
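The split between method forms (which accept any ip_iterable) and operator forms (which require an IPSet on both sides) follows the same convention as Python's built-in set type. The sketch below demonstrates that convention with ordinary sets of integers; it is an analogy only and does not use the IPSet class.

```python
base = {1, 2, 3}
others = [3, 4]                    # a plain iterable, not a set

union_result = base.union(others)                 # method form accepts any iterable
symmetric_result = base.symmetric_difference(others)
disjoint_result = base.isdisjoint([7, 8])

try:
    base | others                  # operator form needs a set on both sides
    operator_rejected = False
except TypeError:
    operator_rejected = True

print(sorted(union_result))        # [1, 2, 3, 4]
print(sorted(symmetric_result))    # [1, 2, 4]
print(disjoint_result)             # True
print(operator_rejected)           # True
```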
set.cidr_iter()
Return an iterator over the CIDR blocks in set. Each iteration returns a 2-tuple, the first element of
which is the first IP address in the block, the second of which is the prefix length of the block. Can be
used as for (addr, prefix) in set.cidr_iter().
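The shape of the values produced by cidr_iter() can be pictured with the standard ipaddress module. In this sketch, cidr_blocks is a hypothetical helper that collapses a list of individual addresses into CIDR blocks and yields (first address, prefix length) pairs; it mimics the form of the output only and makes no claim about how an IPSet stores or orders its blocks.

```python
import ipaddress


def cidr_blocks(addresses):
    """Yield (first_address, prefix_length) for the CIDR blocks covering addresses."""
    networks = [ipaddress.ip_network(address) for address in addresses]
    for block in ipaddress.collapse_addresses(networks):
        yield (block.network_address, block.prefixlen)


members = ["10.0.0.0", "10.0.0.1", "10.0.0.2", "10.0.0.3", "10.0.0.8"]
for first, prefix in cidr_blocks(members):
    print("%s/%d" % (first, prefix))
```

With these five members the loop prints 10.0.0.0/30 followed by 10.0.0.8/32.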
set.save(filename, compression=DEFAULT)
Save the contents of set in the file filename. The compression determines the compression method
used when outputting the file. Valid values are the same as those in silk.silkfile_open().
The following operations and methods will modify the IPSet:
set.add(addr)
Add addr to set and return set. To add multiple IP addresses, use the add_range() or update()
methods.
set.discard(addr)
Remove addr from set if addr is present; do nothing if it is not. Return set. To discard multiple IP
addresses, use the difference_update() method. See also the remove() method.
set.remove(addr)
Similar to discard(), but raise KeyError if addr is not a member of set.
set.pop()
Remove and return an arbitrary address from set. Raise KeyError if set is empty.
set.clear()
Remove all IP addresses from set and return set.
set.convert(version)
Convert set to an IPv4 IPset if version is 4 or to an IPv6 IPset if version is 6. Return set. Raise
ValueError if version is not 4 or 6. If version is 4 and set contains IPv6 addresses outside of the
::ffff:0:0/96 prefix, raise ValueError and leave set unchanged.
set.add_range(start, end)
Add all IP addresses between start and end, inclusive, to set. Raise ValueError if end is less than
start.
set.update(ip_iterable[, ...])
set |= other | ...
Add the IP addresses specified in others to set; the result is the union of set and others.
set.intersection_update(ip_iterable[, ...])
set &= other & ...
Remove from set any IP address that does not appear in others; the result is the intersection of set
and others.
set.difference_update(ip_iterable[, ...])
set -= other | ...
Remove from set any IP address found in others; the result is the difference of set and others.
set.symmetric_difference_update(ip_iterable)
set ^= other
Update set, keeping the IP addresses found in set or in other but not in both.
RWRec Object
An RWRec object represents a SiLK Flow record.
class silk.RWRec([rec],[field=value],...)
This constructor creates an empty RWRec object. If an RWRec rec is supplied, the constructor will
create a copy of it. The variable rec can be a dictionary, such as that supplied by the as_dict() method.
Initial values for record fields can be included. Note that setting or accessing certain attributes on an
RWRec causes the silk.site.init_site() function to be called with no argument if it has not yet been
called successfully.
Example:
>>> recA = RWRec(input=10, output=20)
>>> recB = RWRec(recA, output=30)
>>> (recA.input, recA.output)
(10, 20)
>>> (recB.input, recB.output)
(10, 30)
Instance attributes:
rec.application
The service port of the flow rec as set by the flow meter if the meter supports it, a 16-bit integer. The
yaf(1) flow meter refers to this value as the appLabel. The default application value is 0.
rec.bytes
The count of the number of bytes in the flow rec, a 32-bit integer. The default bytes value is 0.
rec.classname
(READ ONLY) The class name assigned to the flow rec, a string. Calls silk.site.init_site(). The
default classname is ?. The classname cannot be modified by itself. In order to modify the classname,
you also need to modify the typename. See the rec.classtype attribute.
rec.classtype
A tuple of the classname and the typename of the flow rec. Implicitly calls silk.site.init_site() with no
arguments if silk.site.have_site_config() returns False.
rec.classtype_id
The ID for the class and type of the flow rec. The default classtype_id value is 0.
rec.dip
The destination IP of the flow rec, an IPAddr object. The default dip value is IPAddr(’0.0.0.0’). May
be set using a string containing a valid IP address.
rec.dport
The destination port of the flow rec, a 16-bit integer. The default dport value is 0. Since the destination
port field is also used to store the values for the ICMP type and code, setting this value may modify
rec.icmptype and rec.icmpcode.
rec.duration
The duration of the flow rec, a datetime.timedelta object. The default duration value is 0. Changing
the rec.duration attribute will modify the rec.etime attribute such that (rec.etime - rec.stime) == the
new rec.duration. The maximum possible duration is datetime.timedelta(milliseconds=0xffffffff). See
also rec.duration_secs.
rec.duration_secs
The duration of the flow rec in seconds, a float. The default duration_secs value is 0. Changing the
rec.duration_secs attribute will modify the rec.etime attribute in the same way as changing rec.duration.
The maximum possible duration_secs value is 4294967.295.
rec.etime
The end time of the flow rec, a datetime.datetime object. The default etime value is the UNIX epoch
time, datetime.datetime(1970,1,1,0,0). Changing the rec.etime attribute modifies the flow record’s
duration. If the new duration would become negative or would become larger than RWRec supports,
a ValueError will be raised. See also rec.etime_epoch_secs.
rec.etime_epoch_secs
The end time of the flow rec as a number of seconds since the epoch time, a float. Epoch time is
1970-01-01 00:00:00. The default etime_epoch_secs value is 0. Changing the rec.etime_epoch_secs attribute
modifies the flow record’s duration. If the new duration would become negative or would become
larger than RWRec supports, a ValueError will be raised.
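The relationship among the start time, duration, and end time attributes can be modeled with the standard datetime module. The class FlowTimes below is a hypothetical stand-in, not RWRec: it assumes only what the text states, namely that the end time is derived from start time plus duration, that setting the end time changes the duration, that setting the start time keeps the duration constant, and that the duration is limited to 0xffffffff milliseconds.

```python
from datetime import datetime, timedelta

# Largest duration named in the text: 0xffffffff milliseconds.
MAX_DURATION = timedelta(milliseconds=0xFFFFFFFF)


class FlowTimes:
    """Toy model of the stime / duration / etime coupling (not the RWRec class)."""

    def __init__(self):
        self.stime = datetime(1970, 1, 1, 0, 0)   # changing stime keeps duration fixed
        self.duration = timedelta(0)

    @property
    def etime(self):
        return self.stime + self.duration

    @etime.setter
    def etime(self, value):
        new_duration = value - self.stime
        if new_duration < timedelta(0) or new_duration > MAX_DURATION:
            raise ValueError("end time would make the duration out of range")
        self.duration = new_duration


times = FlowTimes()
times.etime = datetime(1970, 1, 1, 0, 0, 10)
print(times.duration.total_seconds())    # 10.0
times.stime = datetime(1970, 1, 2)
print(times.etime)                       # 1970-01-02 00:00:10
print(MAX_DURATION.total_seconds())      # 4294967.295
```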
rec.initial_tcpflags
The TCP flags on the first packet of the flow rec, a TCPFlags object. The default initial_tcpflags
value is None. The rec.initial_tcpflags attribute may be set to a new TCPFlags object, or a string
or number which can be converted to a TCPFlags object by the TCPFlags() constructor. Setting
rec.initial_tcpflags when rec.session_tcpflags is None sets the latter to TCPFlags(''). Setting
rec.initial_tcpflags or rec.session_tcpflags sets rec.tcpflags to the binary OR of their values. Trying to
set rec.initial_tcpflags when rec.protocol is not 6 (TCP) will raise an AttributeError.
rec.icmpcode
The ICMP code of the flow rec, an 8-bit integer. The default icmpcode value is 0. The value is only
meaningful when rec.protocol is ICMP (1) or when rec.is_ipv6() is True and rec.protocol
is ICMPv6 (58). Since a record’s ICMP type and code are stored in the destination port, setting this
value may modify rec.dport.
rec.icmptype
The ICMP type of the flow rec, an 8-bit integer. The default icmptype value is 0. The value is only
meaningful when rec.protocol is ICMP (1) or when rec.is_ipv6() is True and rec.protocol
is ICMPv6 (58). Since a record’s ICMP type and code are stored in the destination port, setting this
value may modify rec.dport.
rec.input
The SNMP interface where the flow rec entered the router or the vlanId if the packing tools are
configured to capture it (see sensor.conf(5)), a 16-bit integer. The default input value is 0.
rec.nhip
The next-hop IP of the flow rec as set by the router, an IPAddr object. The default nhip value is
IPAddr(’0.0.0.0’). May be set using a string containing a valid IP address.
rec.output
The SNMP interface where the flow rec exited the router or the postVlanId if the packing tools are
configured to capture it (see sensor.conf(5)), a 16-bit integer. The default output value is 0.
rec.packets
The packet count for the flow rec, a 32-bit integer. The default packets value is 0.
rec.protocol
The IP protocol of the flow rec, an 8-bit integer. The default protocol value is 0. Setting rec.protocol
to a value other than 6 (TCP) causes rec.initial tcpflags and rec.session tcpflags to be set to None.
rec.sensor
The name of the sensor where the flow rec was collected, a string. Implicitly calls silk.site.init_site()
with no arguments if silk.site.have_site_config() returns False. The default sensor value is ?.
rec.sensor_id
The ID of the sensor where the flow rec was collected, a 16-bit integer. The default sensor_id value is 0.
rec.session_tcpflags
The union of the flags of all but the first packet in the flow rec, a TCPFlags object. The default
session_tcpflags value is None. The rec.session_tcpflags attribute may be set to a new TCPFlags object,
or a string or number which can be converted to a TCPFlags object by the TCPFlags() constructor.
Setting rec.session_tcpflags when rec.initial_tcpflags is None sets the latter to TCPFlags(''). Setting
rec.initial_tcpflags or rec.session_tcpflags sets rec.tcpflags to the binary OR of their values. Trying to
set rec.session_tcpflags when rec.protocol is not 6 (TCP) will raise an AttributeError.
rec.sip
The source IP of the flow rec, an IPAddr object. The default sip value is IPAddr(’0.0.0.0’). May be
set using a string containing a valid IP address.
rec.sport
The source port of the flow rec, an integer. The default sport value is 0.
rec.stime
The start time of the flow rec, a datetime.datetime object. The default stime value is the UNIX
epoch time, datetime.datetime(1970,1,1,0,0). Modifying the rec.stime attribute will modify the flow’s
end time such that rec.duration is constant. The maximum possible stime is 2038-01-19 03:14:07 UTC.
See also rec.stime_epoch_secs.
rec.stime_epoch_secs
The start time of the flow rec as a number of seconds since the epoch time, a float. Epoch time is
1970-01-01 00:00:00. The default stime_epoch_secs value is 0. Changing the rec.stime_epoch_secs attribute will
modify the flow’s end time such that rec.duration is constant. The maximum possible stime_epoch_secs
is 2147483647 (2^31-1).
rec.tcpflags
The union of the TCP flags of all packets in the flow rec, a TCPFlags object. The default tcpflags
value is TCPFlags(' '). The rec.tcpflags attribute may be set to a new TCPFlags object, or a string
or number which can be converted to a TCPFlags object by the TCPFlags() constructor. Setting
rec.tcpflags sets rec.initial_tcpflags and rec.session_tcpflags to None. Setting rec.initial_tcpflags or
rec.session_tcpflags changes rec.tcpflags to the binary OR of their values.
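The coupling among the three flag attributes can be summarized in a few lines of plain Python. In this sketch, integers stand in for TCPFlags objects and the class FlagState is hypothetical; the bit values are the standard TCP header flag bits. It covers only the rules stated above and omits the protocol check that raises AttributeError.

```python
# Standard TCP header flag bits.
FIN, SYN, RST, PSH, ACK, URG = 0x01, 0x02, 0x04, 0x08, 0x10, 0x20


class FlagState:
    """Toy model of initial / session / combined flag coupling (not RWRec)."""

    def __init__(self):
        self.initial = None
        self.session = None
        self.combined = 0

    def set_initial(self, flags):
        self.initial = flags
        if self.session is None:
            self.session = 0            # the other field becomes the empty flag set
        self.combined = self.initial | self.session

    def set_session(self, flags):
        self.session = flags
        if self.initial is None:
            self.initial = 0
        self.combined = self.initial | self.session

    def set_combined(self, flags):
        self.combined = flags           # setting the union clears both parts
        self.initial = None
        self.session = None


state = FlagState()
state.set_initial(SYN)
state.set_session(ACK | FIN)
print(hex(state.combined))              # 0x13
state.set_combined(RST)
print(state.initial, state.session)     # None None
```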
rec.timeout_killed
Whether the flow rec was closed early due to timeout by the collector, a boolean. The default
timeout_killed value is False.
rec.timeout_started
Whether the flow rec is a continuation from a timed-out flow, a boolean. The default timeout_started
value is False.
rec.typename
(READ ONLY) The type name of the flow rec, a string. Implicitly calls silk.site.init_site() with no
arguments if silk.site.have_site_config() returns False. The default typename is 255. The typename
cannot be modified by itself. In order to modify the typename, you also need to modify the classname.
See the rec.classtype attribute.
rec.uniform_packets
Whether the flow rec contained only packets of the same size, a boolean. The default uniform_packets
value is False.
Supported operations and methods:
rec.is_icmp()
Return True if the protocol of rec is 1 (ICMP) or if the protocol of rec is 58 (ICMPv6) and
rec.is_ipv6() is True. Return False otherwise.
rec.is_ipv6()
Return True if rec contains IPv6 addresses, False otherwise.
rec.is_web()
Return True if rec can be represented as a web record, False otherwise. A record can be represented
as a web record if the protocol is TCP (6) and either the source or destination port is one of 80, 443,
or 8080.
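The web-record test stated above reduces to a single boolean expression. The function below is a standalone sketch that takes the protocol and ports as plain integers; it is not the RWRec method, and the names TCP and WEB_PORTS are introduced here for readability.

```python
TCP = 6
WEB_PORTS = frozenset((80, 443, 8080))


def is_web(protocol, sport, dport):
    """Return True when the flow is TCP and either port is 80, 443, or 8080."""
    return protocol == TCP and (sport in WEB_PORTS or dport in WEB_PORTS)


print(is_web(6, 51234, 443))    # True
print(is_web(17, 51234, 443))   # False (UDP)
print(is_web(6, 51234, 22))     # False
```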
rec.as_dict()
Return a dictionary representing the contents of rec. Implicitly calls silk.site.init_site() with no
arguments if silk.site.have_site_config() returns False.
rec.to_ipv4()
Return a new copy of rec with the IP addresses (sip, dip, and nhip) converted to IPv4. If any of these
addresses cannot be converted to IPv4 (that is, if any address is not in the ::ffff:0:0/96 prefix), return
None.
rec.to_ipv6()
Return a new copy of rec with the IP addresses (sip, dip, and nhip) converted to IPv6. Specifically,
the function maps the IPv4 addresses into the ::ffff:0:0/96 prefix.
str(rec)
Return the string representation of rec.as_dict().
rec1 == rec2
Return True if rec1 is structurally equivalent to rec2. Return False otherwise.
rec1 != rec2
Return True if rec1 is not structurally equivalent to rec2. Return False otherwise.
SilkFile Object
A SilkFile object represents a channel for writing to or reading from SiLK Flow files. A SiLK file open for
reading can be iterated over using for rec in file.
Creation functions:
silk.silkfile_open(filename, mode, compression=DEFAULT, notes=[], invocations=[])
This function takes a filename, a mode, and a set of optional keyword parameters. It returns a SilkFile
object. The mode should be one of the following constant values:
silk.READ
Open file for reading
silk.WRITE
Open file for writing
silk.APPEND
Open file for appending
The filename should be the path to the file to open. A few filenames are treated specially. The filename
stdin maps to the standard input stream when the mode is READ. The filenames stdout and stderr
map to the standard output and standard error streams respectively when the mode is WRITE. A
filename consisting of a single hyphen (-) maps to the standard input if the mode is READ, and to
the standard output if the mode is WRITE.
The compression parameter may be one of the following constants. (This list assumes SiLK was
built with the required libraries. To check which compression methods are available at your site, see
silk.get_configuration("COMPRESSION_METHODS")).
silk.DEFAULT
Use the default compression scheme compiled into SiLK.
silk.NO_COMPRESSION
Use no compression.
silk.ZLIB
Use zlib block compression (as used by gzip(1)).
silk.LZO1X
Use lzo1x block compression.
If notes or invocations are set, they should be lists of strings. These add annotation and invocation
headers to the file. These values are visible using the rwfileinfo(1) program.
Examples:
>>> myinputfile = silkfile_open('/path/to/file', READ)
>>> myoutputfile = silkfile_open('/path/to/file', WRITE,
...                              compression=LZO1X,
...                              notes=['My output file',
...                                     'another annotation'])
silk.silkfile_fdopen(fileno, mode, filename=None, compression=DEFAULT, notes=[], invocations=[])
This function takes an integer file descriptor, a mode, and a set of optional keyword parameters. It
returns a SilkFile object. The filename parameter is used to set the value of the name attribute of
the resulting object. All other parameters work as described in the silk.silkfile_open() function.
Deprecated constructor:
class silk.SilkFile(filename, mode, compression=DEFAULT, notes=[], invocations=[])
This constructor creates a SilkFile object. The parameters are identical to those used by the
silkfile_open() function. This constructor is deprecated as of SiLK 3.0.0. For future compatibility, please
use the silkfile_open() function instead of the SilkFile() constructor to create SilkFile objects.
Instance attributes:
file.name
The filename that was used to create file.
file.mode
The mode that was used to create file. Valid values are READ, WRITE, or APPEND.
Instance methods:
file.read()
Return an RWRec representing the next record in the SilkFile file. If there are no records left in the
file, return None.
file.write(rec)
Write the RWRec rec to the SilkFile file. Return None.
file.next()
A SilkFile object is its own iterator. For example, iter(file) returns file. When the SilkFile is used
as an iterator, the next() method is called repeatedly. This method returns the next record, or raises
StopIteration once the end of file is reached.
file.notes()
Return the list of annotation headers for the file as a list of strings.
file.invocations()
Return the list of invocation headers for the file as a list of strings.
file.close()
Close the file and return None.
PrefixMap Object
A PrefixMap object represents an immutable mapping from IP addresses or protocol/port pairs to labels.
PrefixMap objects are created from SiLK prefix map files as created by rwpmapbuild(1).
class silk.PrefixMap(filename)
The constructor creates a prefix map initialized from the filename. The PrefixMap object will be of
one of the two subtypes of PrefixMap: an AddressPrefixMap or a ProtoPortPrefixMap.
Supported operations and methods:
pmap[key ]
Return the string label associated with key in pmap. key must be of the correct type: either an
IPAddr if pmap is an AddressPrefixMap, or a 2-tuple of integers (protocol, port), if pmap is a
ProtoPortPrefixMap. The method raises TypeError when the type of the key is incorrect.
pmap.get(key, default=None)
Return the string label associated with key in pmap. Return the value default if key is not in pmap,
or if key is of the wrong type or value to be a key for pmap.
pmap.values()
Return a tuple of the labels defined by the PrefixMap pmap.
pmap.iterranges()
Return an iterator that will iterate over ranges of contiguous values with the same label. The return
values of the iterator will be the 3-tuple (start, end, label ), where start is the first element of the range,
end is the last element of the range, and label is the label for that range.
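The (start, end, label) tuples returned by iterranges() can be pictured as a run-length grouping of consecutive keys that share a label. The sketch below performs that grouping over an ordinary dictionary with integer keys; the function iterranges defined here is an illustrative stand-in and does not read SiLK prefix map files.

```python
def iterranges(mapping):
    """Yield (start, end, label) for each run of consecutive keys with one label."""
    items = sorted(mapping.items())
    if not items:
        return
    start, label = items[0]
    end = start
    for key, current_label in items[1:]:
        if current_label == label and key == end + 1:
            end = key                      # extend the current run
        else:
            yield (start, end, label)      # close the run and start a new one
            start, end, label = key, key, current_label
    yield (start, end, label)


labels = {0: "internal", 1: "internal", 2: "dmz", 3: "dmz", 4: "internal"}
for entry in iterranges(labels):
    print(entry)
```

For the sample dictionary this prints (0, 1, 'internal'), (2, 3, 'dmz'), and (4, 4, 'internal').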
Bag Object
A Bag object is a representation of a multiset. Each key represents a potential element in the set, and the
key’s value represents the number of times that key is in the set. As such, it is also a reasonable representation
of a mapping from keys to integers.
Please note, however, that despite its set-like properties, Bag objects are not nearly as efficient as IPSet
objects when representing large contiguous ranges of key data.
In PySiLK, the Bag object is designed to look and act similar to Python dictionary objects, and in many
cases Bags and dicts can be used interchangeably. There are differences, however, the primary of which is
that bag[key ] returns a value for all values in the key range of the bag. That value will be an integer zero
for all key values that have not been incremented.
class silk.Bag(mapping=None, key_type=None, key_len=None, counter_type=None, counter_len=None)
The constructor creates a bag. All arguments are optional, and can be used as keyword arguments.
If mapping is included, the bag is initialized from that mapping. Valid mappings are:
• a Bag
• a key/value dictionary
• an iterable of key/value pairs
The key_type and key_len arguments describe the key field of the bag. The key_type should be a string from
the list of valid types below. The key_len should be an integer describing the number of bytes that will
represent values of key_type. The key_type argument is case-insensitive.
If key_type is not specified, it defaults to 'any-ipv6', unless silk.ipv6_enabled() is False, in which case the
default is 'any-ipv4'. The one exception to this is when key_type is not specified, but key_len is specified with
a value of less than 16. In this case, the default type is 'custom'.
Note: Key types that specify IPv6 addresses are not valid if silk.ipv6_enabled() returns False. An error
will be raised if they are used in this case.
If key_len is not specified, it defaults to the default number of bytes for the given key_type (which can be
determined from the list below). If specified, key_len must be one of the following integers: 1, 2, 4, 16.
The counter_type and counter_len arguments describe the counter value of the bag. The counter_type should
be a string from the list of valid types below. The counter_len should be an integer describing the number
of bytes that will represent values of counter_type. The counter_type argument is case-insensitive.
If counter_type is not specified, it defaults to 'custom'.
If counter_len is not specified, it defaults to 8. Currently, 8 is the only valid value of counter_len.
Here is the list of valid key and counter types, along with their default key_len values:
'sIPv4', 4
'dIPv4', 4
'sPort', 2
'dPort', 2
'protocol', 1
'packets', 4
'bytes', 4
'flags', 1
'sTime', 4
'duration', 4
'eTime', 4
'sensor', 2
'input', 2
'output', 2
'nhIPv4', 4
'initialFlags', 1
'sessionFlags', 1
'attributes', 1
'application', 2
'class', 1
'type', 1
'icmpTypeCode', 2
'sIPv6', 16
'dIPv6', 16
'nhIPv6', 16
'records', 4
'sum-packets', 4
'sum-bytes', 4
'sum-duration', 4
'any-ipv4', 4
'any-ipv6', 16
'any-port', 2
'any-snmp', 2
'any-time', 4
'custom', 4
Deprecation Notice: For compatibility with SiLK 2.x, the key_type argument may be a Python class. An
object of the key_type class must be constructable from an integer, and it must possess an __int__() method
which retrieves that integer from the object. Regardless of the maximum integer value supported by the
key_type class, internally the bag will store the keys as type 'custom' with length 4.
Other constructors, all class methods:
silk.Bag.ipaddr(mapping, counter_type=None, counter_len=None)
Creates a Bag using 'any-ipv6' as the key type (or 'any-ipv4' if silk.ipv6_enabled() is False).
counter_type and counter_len are used as in the standard Bag constructor. Equivalent to
Bag(mapping).
silk.Bag.integer(mapping, key_len=None, counter_type=None, counter_len=None)
Creates a Bag using 'custom' as the key type (integer bag). key_len, counter_type, and counter_len are
used as in the standard Bag constructor. Equivalent to Bag(mapping, key_type='custom').
silk.Bag.load(path, key_type=None)
Creates a Bag by reading a SiLK bag file. path must be a valid location of a bag. When present, the
key_type argument is used as in the Bag constructor, ignoring the key type specified in the bag file.
When key_type is not provided and the bag file does not contain type information, the key is set to
'custom' with a length of 4.
silk.Bag.load_ipaddr(path)
Creates an IP address bag from a SiLK bag file. Equivalent to Bag.load(path, key_type = IPv4Addr).
This constructor is deprecated as of SiLK 3.2.0.
silk.Bag.load_integer(path)
Creates an integer bag from a SiLK bag file. Equivalent to Bag.load(path, key_type = int). This
constructor is deprecated as of SiLK 3.2.0.
Constants:
silk.BAG_COUNTER_MAX
This constant contains the maximum possible value for Bag counters.
Other class methods:
silk.Bag.field_types()
Returns a tuple of strings which are valid key_type or counter_type values.
silk.Bag.type_merge(type_a, type_b)
Given two types from Bag.field_types(), returns the type that would be given (by default)
to a bag that is a result of the co-mingling of two bags of the given types. For example:
Bag.type_merge('sport','dport') == 'any-port'.
Supported operations and methods:
In the lists of operations and methods below,
• bag and bag2 are Bag objects
• key and key2 are IPAddrs for bags that contain IP addresses, or integers for other bags
• value and value2 are integers which represent the counter associated with a key in the bag
• ipset is an IPSet object
• ipwildcard is an IPWildcard object
The following operations and methods do not modify the Bag:
bag.get_info()
Return information about the keys and counters of the bag. The return value is a dictionary with the
following keys and values:
'key_type'
The current key type, as a string.
'key_len'
The current key length in bytes.
'counter_type'
The current counter type, as a string.
'counter_len'
The current counter length in bytes.
The keys have the same names as the keyword arguments to the bag constructor. As a result, a bag
with the same key and value information as an existing bag can be generated by using the following
idiom: Bag(**bag.get_info()).
bag.copy()
Return a new Bag which is a copy of bag.
bag[key]
Return the counter value associated with key in bag.
bag[key:key2] or bag[key,key2,...]
Return a new Bag which contains only the elements in the key range [key, key2), or a new Bag
containing only the given elements in the comma-separated list. More generally, the argument(s) in
brackets can be any number of comma-separated keys or key ranges. For example, bag[1,5,15:18,20]
will return a bag which contains the elements 1, 5, 15, 16, 17, and 20 from bag.
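The selection rule can be sketched in plain Python. The MiniBag class below is a hypothetical stand-in built on dict, not the PySiLK Bag; it mimics only the documented behavior that an absent key reads as zero and that a key range excludes its upper bound.

```python
class MiniBag(dict):
    """Hypothetical dict-based stand-in for a Bag (not part of PySiLK)."""

    def __getitem__(self, item):
        # A plain key returns its counter; absent keys read as zero.
        if not isinstance(item, (tuple, slice)):
            return dict.get(self, item, 0)
        # Otherwise treat the argument as one or more keys or key ranges.
        parts = item if isinstance(item, tuple) else (item,)
        selected = MiniBag()
        for key, value in self.items():
            for part in parts:
                if isinstance(part, slice):
                    hit = part.start <= key < part.stop  # half-open [key, key2)
                else:
                    hit = key == part
                if hit:
                    selected[key] = value
        return selected


counts = MiniBag({1: 4, 5: 2, 15: 7, 16: 1, 17: 3, 18: 9, 20: 6})
subset = counts[1, 5, 15:18, 20]
print(sorted(subset))  # [1, 5, 15, 16, 17, 20]
```

Key 18 is left out because the range 15:18 stops before its upper bound.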
bag[ipset]
Return a new Bag which contains only elements in bag that are also contained in ipset. This is only
valid for IP address bags. The ipset can be included as part of a comma-separated list of slices, as
above.
bag[ipwildcard]
Return a new Bag which contains only elements that are also contained in ipwildcard. This is only
valid for IP address bags. The ipwildcard can be included as part of a comma-separated list of slices,
as above.
key in bag
Return True if bag[key] is non-zero, False otherwise.
bag.get(key, default=None)
Return bag[key] if key is in bag, otherwise return default.
bag.items()
Return a list of (key, value) pairs for all keys in bag with non-zero values. This list is not guaranteed
to be sorted in any order.
bag.iteritems()
Return an iterator over (key, value) pairs for all keys in bag with non-zero values. This iterator is
not guaranteed to iterate over items in any order.
bag.sorted_iter()
Return an iterator over (key, value) pairs for all keys in bag with non-zero values. This iterator is
guaranteed to iterate over items in key-sorted order.
bag.keys()
Return a list of keys for all keys in bag with non-zero values. This list is guaranteed to be in key-sorted
order.
bag.iterkeys()
Return an iterator over keys for all keys in bag with non-zero values. This iterator is not guaranteed
to iterate over keys in any order.
bag.values()
Return a list of values for all keys in bag with non-zero values. The list is guaranteed to be in key-sorted
order.
bag.itervalues()
Return an iterator over values for all keys in bag with non-zero values. This iterator is not guaranteed
to iterate over values in any order, but the order is consistent with that returned by iterkeys().
bag.group_iterator(bag2)
Return an iterator over keys and values of a pair of Bags. For each key which is in either bag or
bag2, this iterator will return a (key, value, value2) triple, where value is bag.get(key), and value2
is bag2.get(key). This iterator is guaranteed to iterate over triples in key order.
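A minimal sketch of the pairing described for the group iterator, using plain dicts as stand-ins for the two bags. The function name is reused here for illustration only; this is not the PySiLK implementation.

```python
def group_iterator(bag, bag2):
    """Yield (key, value, value2) triples in key order.

    A key missing from one mapping yields None on that side, which is
    what dict.get() returns by default.
    """
    for key in sorted(set(bag) | set(bag2)):
        yield key, bag.get(key), bag2.get(key)


first = {10: 3, 20: 1}
second = {20: 5, 30: 2}
triples = list(group_iterator(first, second))
print(triples)  # [(10, 3, None), (20, 1, 5), (30, None, 2)]
```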
bag + bag2
Add two bags together. Return a new Bag for which newbag[key] = bag[key] + bag2[key] for
all keys in bag and bag2. Will raise an OverflowError if the resulting value for a key is greater than
BAG_COUNTER_MAX. If the two bags are of different types, the resulting bag will be of a type
determined by Bag.type_merge().
bag - bag2
Subtract two bags. Return a new Bag for which newbag[key] = bag[key] - bag2[key] for all keys
in bag and bag2, as long as the resulting value for that key would be non-negative. If the resulting
value for a key would be negative, the value of that key will be zero. If the two bags are of different
types, the resulting bag will be of a type determined by Bag.type_merge().
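The arithmetic for addition and subtraction can be sketched with plain dicts. The names add_bags and subtract_bags are hypothetical, and COUNTER_MAX is an assumed placeholder for silk.BAG_COUNTER_MAX, whose actual value is not stated here.

```python
COUNTER_MAX = 2**64 - 1  # assumed placeholder, not the documented value


def add_bags(bag, bag2, counter_max=COUNTER_MAX):
    """Return per-key sums; raise OverflowError past counter_max."""
    result = dict(bag)
    for key, value in bag2.items():
        total = result.get(key, 0) + value
        if total > counter_max:
            raise OverflowError("counter for key %r exceeds the maximum" % (key,))
        result[key] = total
    return result


def subtract_bags(bag, bag2):
    """Return per-key differences; a negative result becomes zero (key omitted)."""
    result = {}
    for key, value in bag.items():
        difference = value - bag2.get(key, 0)
        if difference > 0:
            result[key] = difference
    return result


print(add_bags({1: 2, 2: 3}, {2: 4, 3: 1}))             # {1: 2, 2: 7, 3: 1}
print(subtract_bags({1: 5, 2: 3}, {1: 2, 2: 10, 3: 1}))  # {1: 3}
```

Keys whose counter would fall to zero or below are simply absent from the result, which matches a zero counter in a bag.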
bag.min(bag2)
Return a new Bag for which newbag[key] = min(bag[key], bag2[key]) for all keys in bag and bag2.
bag.max(bag2)
Return a new Bag for which newbag[key] = max(bag[key], bag2[key]) for all keys in bag and
bag2.
bag.div(bag2)
Divide two bags. Return a new Bag for which newbag[key] = bag[key] / bag2[key], rounded to
the nearest integer, for all keys in bag and bag2, as long as bag2[key] is non-zero. newbag[key] = 0
when bag2[key] is zero. If the two bags are of different types, the resulting bag will be of a type
determined by Bag.type_merge().
bag * integer
integer * bag
Multiply a bag by a scalar. Return a new Bag for which newbag[key] = bag[key] * integer for all
keys in bag.
bag.intersect(set_like)
Return a new Bag which contains bag[key] for each key where key in set_like is true. set_like is any
argument that supports Python’s in operator, including Bags, IPSets, IPWildcards, and Python sets,
lists, tuples, et cetera.
bag.complement_intersect(set_like)
Return a new Bag which contains bag[key] for each key where key in set_like is not true.
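As an illustration of the two filters, the sketch below applies the same membership test to a plain dict. The functions are stand-ins for the Bag methods of the same names, not the PySiLK code; any object that supports Python's in operator can serve as set_like.

```python
def intersect(bag, set_like):
    """Keep the entries whose key is in set_like."""
    return {key: value for key, value in bag.items() if key in set_like}


def complement_intersect(bag, set_like):
    """Keep the entries whose key is not in set_like."""
    return {key: value for key, value in bag.items() if key not in set_like}


ports = {22: 4, 80: 9, 443: 7}
kept = intersect(ports, [80, 443, 8080])
dropped = complement_intersect(ports, range(0, 100))
print(kept)     # {80: 9, 443: 7}
print(dropped)  # {443: 7}
```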
bag.ipset()
Return an IPSet consisting of the set of IP address key values from bag with non-zero values. This
only works if bag is an IP address bag.
bag.inversion()
Return a new integer Bag for which all values from bag are inserted as key elements. Hence, if two
keys in bag have a value of 5, newbag[5] will be equal to two.
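The inversion can be pictured as counting how often each counter value occurs. The sketch below uses collections.Counter on a plain dict; the sample keys are made up, and the function is a stand-in rather than the PySiLK method.

```python
from collections import Counter


def inversion(bag):
    """Map each non-zero counter value to the number of keys holding it."""
    return Counter(value for value in bag.values() if value)


inverted = inversion({"a": 5, "b": 5, "c": 2, "d": 0})
print(inverted[5])  # 2
print(inverted[2])  # 1
```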
bag == bag2
Return True if the contents of bag are equivalent to the contents of bag2, False otherwise.
bag != bag2
Return False if the contents of bag are equivalent to the contents of bag2, True otherwise.
bag.save(filename, compression=DEFAULT)
Save the contents of bag in the file filename. The compression determines the compression method
used when outputting the file. Valid values are the same as those in silk.silkfile_open().
The following operations and methods will modify the Bag:
bag.clear()
Empty bag, such that bag[key] is zero for all keys.
bag[key] = value
Set the number of key in bag to value.
del bag[key]
Remove key from bag, such that bag[key] is zero.
bag.update(mapping)
For each key in mapping, set the value for that key in bag to the mapping’s value. Valid mappings
are those accepted by the Bag() constructor.
bag.add(key [, key2 [, ...]])
Add one of each key to bag. This is the same as incrementing the value for each key by one.
bag.add(iterable)
Add one of each key in iterable to bag. This is the same as incrementing the value for each key by one.
bag.remove(key [, key2 [, ...]])
Remove one of each key from bag. This is the same as decrementing the value for each key by one.
bag.remove(iterable)
Remove one of each key in iterable from bag. This is the same as decrementing the value for each key
by one.
bag.incr(key, value = 1)
Increment the number of key in bag by value. value defaults to one.
bag.decr(key, value = 1)
Decrement the number of key in bag by value. value defaults to one.
bag += bag2
Equivalent to bag = bag + bag2, unless an OverflowError is raised, in which case bag is
no longer necessarily valid. When an error is not raised, this operation takes less memory than
bag = bag + bag2. This operation can change the type of bag, as determined by Bag.type_merge().
bag -= bag2
Equivalent to bag = bag - bag2. This operation takes less memory than bag = bag - bag2. This
operation can change the type of bag, as determined by Bag.type_merge().
bag *= integer
Equivalent to bag = bag * integer, unless an OverflowError is raised, in which case bag is no longer
necessarily valid. When an error is not raised, this operation takes less memory than bag = bag * integer.
bag.constrain_values(min=None, max=None)
Remove key from bag if that key’s value is less than min or greater than max. At least one of min or
max must be specified.
bag.constrain_keys(min=None, max=None)
Remove key from bag if that key is less than min, or greater than max. At least one of min or max
must be specified.
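A sketch of the two constrain operations on a plain dict, modified in place. The parameters lo and hi stand for min and max (renamed here to avoid shadowing Python built-ins), and raising ValueError when neither bound is given is an assumption; the exception PySiLK raises is not stated above.

```python
def _outside(item, lo, hi):
    """True when item is below lo or above hi (a bound of None is ignored)."""
    return (lo is not None and item < lo) or (hi is not None and item > hi)


def constrain_values(bag, lo=None, hi=None):
    """Delete entries whose value lies outside [lo, hi]."""
    if lo is None and hi is None:
        raise ValueError("at least one bound must be specified")  # assumed error type
    for key in [k for k, v in bag.items() if _outside(v, lo, hi)]:
        del bag[key]


def constrain_keys(bag, lo=None, hi=None):
    """Delete entries whose key lies outside [lo, hi]."""
    if lo is None and hi is None:
        raise ValueError("at least one bound must be specified")  # assumed error type
    for key in [k for k in bag if _outside(k, lo, hi)]:
        del bag[key]


counts = {1: 10, 2: 500, 3: 50, 4: 1}
constrain_values(counts, lo=5, hi=100)
print(counts)  # {1: 10, 3: 50}
constrain_keys(counts, lo=2)
print(counts)  # {3: 50}
```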
TCPFlags Object
A TCPFlags object represents the eight bits of flags from a TCP session.
class silk.TCPFlags(value)
The constructor takes either a TCPFlags value, a string, or an integer. If a TCPFlags value, it
returns a copy of that value. If an integer, the integer should represent the 8-bit representation of the
flags. If a string, the string should consist of a concatenation of zero or more of the characters F, S,
R, P, A, U, E, and C (upper or lower case) representing the FIN, SYN, RST, PSH, ACK, URG, ECE,
and CWR flags. Spaces in the string are ignored.
Examples:
>>> a = TCPFlags('SA')
>>> b = TCPFlags(5)
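To make the two examples concrete, the sketch below converts a flag string or integer to its 8-bit value. parse_flags and FLAG_BITS are hypothetical names, not PySiLK API; the bit positions are those of the TCP header, and rejecting unknown letters with ValueError is an assumption.

```python
# Bit positions of the flags in the TCP header, keyed by the letters above.
FLAG_BITS = {"F": 0x01, "S": 0x02, "R": 0x04, "P": 0x08,
             "A": 0x10, "U": 0x20, "E": 0x40, "C": 0x80}


def parse_flags(value):
    """Return the 8-bit integer for a flag string or an integer."""
    if isinstance(value, int):
        if not 0 <= value <= 0xFF:
            raise ValueError("flags must fit in eight bits")
        return value
    bits = 0
    for letter in value.upper():
        if letter == " ":
            continue  # spaces are ignored
        if letter not in FLAG_BITS:
            raise ValueError("unknown flag character %r" % letter)  # assumed error type
        bits |= FLAG_BITS[letter]
    return bits


print(parse_flags("SA"))  # 18, i.e. SYN (0x02) | ACK (0x10)
print(parse_flags(5))     # 5, i.e. FIN (0x01) | RST (0x04)
```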
Instance attributes (read-only):
flags.fin
True if the FIN flag is set on flags, False otherwise
flags.syn
True if the SYN flag is set on flags, False otherwise
flags.rst
True if the RST flag is set on flags, False otherwise
flags.psh
True if the PSH flag is set on flags, False otherwise
flags.ack
True if the ACK flag is set on flags, False otherwise
flags.urg
True if the URG flag is set on flags, False otherwise
flags.ece
True if the ECE flag is set on flags, False otherwise
flags.cwr
True if the CWR flag is set on flags, False otherwise
Supported operations and methods:
~flags
Return the bitwise inversion (not) of flags
flags1 & flags2
Return the bitwise intersection (and) of the flags from flags1 and flags2
flags1 | flags2
Return the bitwise union (or) of the flags from flags1 and flags2.
flags1 ^ flags2
Return the bitwise exclusive disjunction (xor) of the flags from flags1 and flags2.
int(flags)
Return the integer value of the flags set in flags.
str(flags)
Return a string representation of the flags set in flags.
flags.padded()
Return a string representation of the flags set in flags. This representation will be padded with spaces
such that flags will line up if printed above each other.
flags
When used in a setting that expects a boolean, return True if any flag value is set in flags. Return
False otherwise.
flags.matches(flagmask )
Given flagmask, a string of the form high_flags/mask_flags, return True if the flags of flags match
high_flags after being masked with mask_flags; False otherwise. Given a flagmask without the slash
(/), return True if all bits in flagmask are set in flags. I.e., a flagmask without a slash is interpreted
as "flagmask/flagmask".
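The matching rule reduces to one bitwise comparison. The sketch below works on plain integers; to_bits and matches are illustrative stand-ins, not the PySiLK implementation, and no validation of the flagmask string is attempted.

```python
FLAG_BITS = {"F": 0x01, "S": 0x02, "R": 0x04, "P": 0x08,
             "A": 0x10, "U": 0x20, "E": 0x40, "C": 0x80}


def to_bits(text):
    """Convert a string of flag letters to its 8-bit integer value."""
    bits = 0
    for letter in text.upper().replace(" ", ""):
        bits |= FLAG_BITS[letter]
    return bits


def matches(flags, flagmask):
    """Apply a 'high_flags/mask_flags' test to an integer flags value."""
    high, slash, mask = flagmask.partition("/")
    high_bits = to_bits(high)
    # Without a slash, the mask is the same as the high flags.
    mask_bits = to_bits(mask) if slash else high_bits
    return (flags & mask_bits) == high_bits


syn_only = to_bits("S")
syn_ack = to_bits("SA")
print(matches(syn_only, "S/SA"))  # True: SYN set, ACK clear
print(matches(syn_ack, "S/SA"))   # False: ACK is also set
print(matches(syn_ack, "S"))      # True: SYN is among the set flags
```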
Constants:
The following constants are defined:
silk.TCP_FIN
A TCPFlags value with only the FIN flag set
silk.TCP_SYN
A TCPFlags value with only the SYN flag set
silk.TCP_RST
A TCPFlags value with only the RST flag set
silk.TCP_PSH
A TCPFlags value with only the PSH flag set
silk.TCP_ACK
A TCPFlags value with only the ACK flag set
silk.TCP_URG
A TCPFlags value with only the URG flag set
silk.TCP_ECE
A TCPFlags value with only the ECE flag set
silk.TCP_CWR
A TCPFlags value with only the CWR flag set
FGlob Object
An FGlob object is an iterable object which iterates over filenames from a SiLK data store. It does this
internally by calling the rwfglob(1) program. The FGlob object assumes that the rwfglob program is in
the PATH, and will raise an exception when used if not.
Note: It is generally better to use the silk.site.repository_iter() function from the silk.site Module instead of
the FGlob object, as that function does not require the external rwfglob program. However, the FGlob
constructor allows you to use a different site configuration file every time, whereas the silk.site.init_site()
function only supports a single site configuration file.
class silk.FGlob(classname=None, type=None, sensors=None, start_date=None,
end_date=None, data_rootdir=None, site_config_file=None)
Although all arguments have defaults, at least one of classname, type, sensors, start_date must be
specified. The arguments are:
classname
if given, should be a string representing the class name. If not given, defaults based on the site
configuration file, silk.conf(5).
type
if given, can be either a string representing a type name or comma-separated list of type names,
or can be a list of strings representing type names. If not given, defaults based on the site
configuration file, silk.conf.
sensors
if given, should be either a string representing a comma-separated list of sensor names or IDs,
an integer representing a sensor ID, or a list of strings or integers representing sensor names or
IDs. If not given, defaults to all sensors.
start_date
if given, should be either a string in the format YYYY/MM/DD[:HH], a date object, a datetime
object (which will be used to the precision of one hour), or a time object (which is used for the
given hour on the current date). If not given, defaults to start of current day.
end_date
if given, should be either a string in the format YYYY/MM/DD[:HH], a date object, a datetime
object (which will be used to the precision of one hour), or a time object (which is used for the
given hour on the current date). If not given, defaults to start_date. The end_date cannot be
specified without a start_date.
data_rootdir
if given, should be a string representing the directory in which to find the packed SiLK data files.
If not given, defaults to the value in the SILK_DATA_ROOTDIR environment variable or the
compiled-in default (/data).
site_config_file
if given, should be a string representing the path of the site configuration file, silk.conf.
If not given, defaults to the value in the SILK_CONFIG_FILE environment variable or
$SILK_DATA_ROOTDIR/silk.conf.
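The string form accepted for the date arguments can be illustrated with a small parser. parse_silk_date is a hypothetical helper, not part of PySiLK; it handles only the YYYY/MM/DD[:HH] form named above and reports whether an hour was given.

```python
import re
from datetime import datetime

_DATE_RE = re.compile(r"^(\d{4})/(\d{1,2})/(\d{1,2})(?::(\d{1,2}))?$")


def parse_silk_date(text):
    """Return (datetime, precision), where precision is 'day' or 'hour'."""
    found = _DATE_RE.match(text)
    if found is None:
        raise ValueError("expected YYYY/MM/DD[:HH], got %r" % text)
    year, month, day = (int(part) for part in found.group(1, 2, 3))
    hour = found.group(4)
    if hour is None:
        return datetime(year, month, day), "day"
    return datetime(year, month, day, int(hour)), "hour"


print(parse_silk_date("2005/09/22"))
print(parse_silk_date("2005/09/22:14"))
```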
An FGlob object can be used as a standard iterator. For example:
for filename in FGlob(classname="all", start_date="2005/09/22"):
    for rec in silkfile_open(filename, READ):
        ...
silk.site Module
The silk.site module contains functions that load the SiLK site file, and query information from that file.
silk.site.init_site(siteconf=None, rootdir=None)
Initializes the SiLK system’s site configuration. The siteconf parameter, if given, should be the path
and name of a SiLK site configuration file (see silk.conf(5)). If siteconf is omitted, the value specified
in the environment variable SILK_CONFIG_FILE will be used as the name of the configuration file. If
SILK_CONFIG_FILE is not set, the module looks for a file named silk.conf in the following directories:
the directory specified by the rootdir argument; the directory specified in the SILK_DATA_ROOTDIR
environment variable; the data root directory that is compiled into SiLK (/data); the directories
$SILK_PATH/share/silk/ and $SILK_PATH/share/.
The rootdir parameter, if given, should be the path to a SiLK data repository that has a configuration
matching the SiLK site configuration. If rootdir is omitted, the value specified in the
SILK_DATA_ROOTDIR environment variable will be used, or if that variable is not set, the data
root directory that is compiled into SiLK (/data). The rootdir may be specified without a siteconf
argument by using rootdir as a keyword argument. I.e., init_site(rootdir="/data").
This function should not generally be called explicitly unless one wishes to use a non-default site
configuration file.
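The search order described above can be sketched as a function that only lists candidate paths and never touches the file system. site_config_candidates and COMPILED_ROOT are hypothetical names, and the sketch follows the prose of this section only; it does not claim to reproduce every location PySiLK checks.

```python
import os
import posixpath

COMPILED_ROOT = "/data"  # compiled-in default named in this manual page


def site_config_candidates(siteconf=None, rootdir=None, environ=None):
    """Return the paths that would be tried, in order."""
    env = os.environ if environ is None else environ
    if siteconf:
        return [siteconf]
    if env.get("SILK_CONFIG_FILE"):
        return [env["SILK_CONFIG_FILE"]]
    directories = []
    if rootdir:
        directories.append(rootdir)
    if env.get("SILK_DATA_ROOTDIR"):
        directories.append(env["SILK_DATA_ROOTDIR"])
    directories.append(COMPILED_ROOT)
    if env.get("SILK_PATH"):
        directories.append(posixpath.join(env["SILK_PATH"], "share", "silk"))
        directories.append(posixpath.join(env["SILK_PATH"], "share"))
    return [posixpath.join(d, "silk.conf") for d in directories]


# Made-up environment values, for illustration only.
example_env = {"SILK_DATA_ROOTDIR": "/srv/flows", "SILK_PATH": "/opt/silk"}
for path in site_config_candidates(environ=example_env):
    print(path)
```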
The init_site() function can only be called successfully once. The return value of init_site() will be
True if the site configuration was successful, or False if a site configuration file was not found. If a
siteconf parameter was specified but not found, or if a site configuration file was found but did not
parse properly, an exception will be raised instead. Once init_site() has been successfully invoked,
silk.site.have_site_config() will return True, and subsequent invocations of init_site() will raise a
RuntimeError exception.
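Because a second successful call raises RuntimeError, code that may run after the site has already been configured can check have_site_config() first. The sketch below takes the site module as a parameter so it can be exercised without SiLK installed; ensure_site and FakeSite are hypothetical, and FakeSite mimics only the once-only behavior described above.

```python
def ensure_site(site, siteconf=None, rootdir=None):
    """Initialize the site configuration at most once.

    `site` is expected to behave like the silk.site module.
    Returns True when a site configuration is loaded.
    """
    if site.have_site_config():
        return True
    return bool(site.init_site(siteconf, rootdir))


class FakeSite:
    """Stand-in for silk.site used only to demonstrate the guard."""

    def __init__(self):
        self._loaded = False

    def have_site_config(self):
        return self._loaded

    def init_site(self, siteconf=None, rootdir=None):
        if self._loaded:
            raise RuntimeError("site configuration already initialized")
        self._loaded = True
        return True


site = FakeSite()
print(ensure_site(site))  # True: first call initializes
print(ensure_site(site))  # True: second call skips init_site()
```

With PySiLK available, the call would presumably be ensure_site(silk.site, rootdir="/data").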
Some silk.site methods and RWRec members require information from the silk.conf file, and
when these methods are called or members accessed, the silk.site.init_site() function is implicitly
invoked with no arguments if it has not yet been called successfully. The functions, methods,
and attributes that exhibit this behavior include: silk.site.sensors(), silk.site.classtypes(),
silk.site.classes(), silk.site.types(), silk.site.default_types(), silk.site.default_class(),
silk.site.class_sensors(), silk.site.sensor_id(), silk.site.sensor_from_id(), silk.site.classtype_id(),
silk.site.classtype_from_id(), silk.site.set_data_rootdir(), silk.site.repository_iter(),
silk.site.repository_silkfile_iter(), silk.site.repository_full_iter(), rwrec.as_dict(),
rwrec.classname, rwrec.typename, rwrec.classtype, and rwrec.sensor.
silk.site.have_site_config()
Return True if silk.site.init_site() has been called and was able to successfully find and load a SiLK
configuration file, False otherwise.
silk.site.set_data_rootdir(rootdir)
Change the current SiLK data root directory once the silk.conf file has been loaded. This function
can be used to change the directory used by the silk.site iterator functions. To change the SiLK
data root directory before loading the silk.conf file, call silk.site.init_site() with a rootdir argument.
set_data_rootdir() implicitly calls silk.site.init_site() with no arguments before changing the root
directory if silk.site.have_site_config() returns False.
silk.site.get_site_config()
Return the current path to the SiLK site configuration file. Before silk.site.init_site() is called
successfully, this will return the place that init_site() called with no arguments will first look for a
configuration file. After init_site() has been successfully called, this will return the path to the file
that init_site() loaded.
silk.site.get_data_rootdir()
Return the current SiLK data root directory.
silk.site.sensors()
Return a tuple of valid sensor names. Implicitly calls silk.site.init_site() with no arguments if
silk.site.have_site_config() returns False. Returns an empty tuple if no site file is available.
silk.site.classes()
Return a tuple of valid class names. Implicitly calls silk.site.init_site() with no arguments if
silk.site.have_site_config() returns False. Returns an empty tuple if no site file is available.
silk.site.types(class)
Return a tuple of valid type names for class class. Implicitly calls silk.site.init_site() with no arguments
if silk.site.have_site_config() returns False. Throws KeyError if no site file is available or if class is
not a valid class.
silk.site.classtypes()
Return a tuple of valid (class name, type name) tuples. Implicitly calls silk.site.init_site() with no
arguments if silk.site.have_site_config() returns False. Returns an empty tuple if no site file is available.
silk.site.default_class()
Return the default class name. Implicitly calls silk.site.init_site() with no arguments if
silk.site.have_site_config() returns False. Returns None if no site file is available.
silk.site.default_types(class)
Return a tuple of default types associated with class class. Implicitly calls silk.site.init_site() with no
arguments if silk.site.have_site_config() returns False. Throws KeyError if no site file is available or
if class is not a valid class.
silk.site.class_sensors(class)
Return a tuple of sensors that are in class class. Implicitly calls silk.site.init_site() with no arguments
if silk.site.have_site_config() returns False. Throws KeyError if no site file is available or if class is
not a valid class.
silk.site.sensor_classes(sensor)
Return a tuple of classes that are associated with sensor. Implicitly calls silk.site.init_site() with no
arguments if silk.site.have_site_config() returns False. Throws KeyError if no site file is available or
if sensor is not a valid sensor.
silk.site.sensor_description(sensor)
Return the sensor description as a string, or None if there is no description. Implicitly calls
silk.site.init_site() with no arguments if silk.site.have_site_config() returns False. Throws
KeyError if no site file is available or if sensor is not a valid sensor.
silk.site.sensor_id(sensor)
Return the numeric sensor ID associated with the string sensor. Implicitly calls silk.site.init_site() with
no arguments if silk.site.have_site_config() returns False. Throws KeyError if no site file is available
or if sensor is not a valid sensor.
silk.site.sensor_from_id(id)
Return the sensor name associated with the numeric sensor ID id. Implicitly calls silk.site.init_site()
with no arguments if silk.site.have_site_config() returns False. Throws KeyError if no site file is
available or if id is not a valid sensor identifier.
silk.site.classtype_id((class, type))
Return the numeric ID associated with the tuple (class, type). Implicitly calls silk.site.init_site() with
no arguments if silk.site.have_site_config() returns False. Throws KeyError if no site file is available,
if class is not a valid class, or if type is not a valid type in class.
silk.site.classtype_from_id(id)
Return the (class, type) name pair associated with the numeric ID id. Implicitly calls silk.site.init_site()
with no arguments if silk.site.have_site_config() returns False. Throws KeyError if no site file is
available or if id is not a valid identifier.
silk.site.repository_iter(start=None, end=None, classname=None, types=None,
classtypes=None, sensors=None)
Return an iterator over file names in a SiLK repository. The repository is assumed to be in
the data root directory that is returned by silk.site.get_data_rootdir() and to conform to the
format of the current site configuration. This function implicitly calls silk.site.init_site() with no
arguments if silk.site.have_site_config() returns False. See also silk.site.repository_full_iter() and
silk.site.repository_silkfile_iter().
The following types are accepted for start and end :
• a datetime.datetime object, which is considered to be specified to hour precision
• a datetime.date object, which is considered to be specified to day precision
• a string in the SiLK date format YYYY/MM/DD[:HH], where the timezone depends on how SiLK
was compiled; check the value of silk.get_configuration("TIMEZONE_SUPPORT").
The rules for interpreting start and end are:
• When both start and end are specified to hour precision, files from all hours within that time
range are returned.
• When start is specified to day precision, the hour specified in end (if any) is ignored, and files for
all dates between midnight at start and the end of the day represented by end are returned.
• When end is not specified and start is specified to day precision, files for that complete day are
returned.
• When end is not specified and start is specified to hour precision, files for that single hour are
returned.
• When neither start nor end is specified, files for the current day are returned.
• It is an error to specify end without start, or to give an end that precedes start.
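The rules above can be read as a small decision procedure. The sketch below computes the first and last hour covered for date and datetime arguments; hour_window and its today parameter are hypothetical, strings are not handled, and the combination of an hour-precision start with a day-precision end is rejected only because the rules above do not cover it.

```python
from datetime import date, datetime, time, timedelta

HOUR = timedelta(hours=1)


def _floor(value):
    """Return (datetime truncated to the hour, True if hour precision)."""
    if isinstance(value, datetime):  # test first: datetime subclasses date
        return value.replace(minute=0, second=0, microsecond=0), True
    return datetime.combine(value, time()), False


def hour_window(start=None, end=None, today=None):
    """Return (first_hour, last_hour), both inclusive."""
    if start is None:
        if end is not None:
            raise ValueError("end may not be given without start")
        start = today if today is not None else date.today()
    first, start_has_hour = _floor(start)
    if end is None:
        # A day covers 24 hours; an hour covers only itself.
        last = first if start_has_hour else first + 23 * HOUR
    else:
        last, end_has_hour = _floor(end)
        if not start_has_hour:
            last = last.replace(hour=23)  # any hour in end is ignored
        elif not end_has_hour:
            raise ValueError("case not covered by the rules above")
    if last < first:
        raise ValueError("end precedes start")
    return first, last


print(hour_window(date(2014, 12, 18)))
print(hour_window(date(2014, 12, 17), datetime(2014, 12, 18, 5)))
```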
To specify classes and types, either use the classname and types parameters or use the classtypes
parameter. It is an error to use classname or types when classtypes is specified.
The classname parameter should be a named class that appears in silk.site.classes(). If neither
classname nor classtypes is specified, classname will default to that returned by silk.site.default_class().
The types parameter should be either a named type that appears in silk.site.types(classname) or
a sequence of said named types. If neither types nor classtypes is specified, types will default to
silk.site.default_types(classname).
The classtypes parameter should be a sequence of (classname, type) pairs. These pairs must be in the
sequence returned by silk.site.classtypes().
The sensors parameter should be either a sensor name or a sequence of sensor names from the
sequence returned by silk.site.sensors(). If sensors is left unspecified, it will default to the list of sensors
supported by the given class(es).
silk.site.repository_silkfile_iter(start=None, end=None, classname=None, types=None,
classtypes=None, sensors=None)
Works similarly to silk.site.repository_iter() except the file names that repository_iter() would return
are opened as SilkFile objects and returned.
silk.site.repository_full_iter(start=None, end=None, classname=None, types=None,
classtypes=None, sensors=None)
Works similarly to silk.site.repository_iter(). Unlike repository_iter(), this iterator’s output will
include the names of files that do not exist in the repository. The iterator returns (filename, bool)
pairs where the bool value represents whether the given filename exists. For more information, see the
description of the --print-missing-files switch in rwfglob(1).
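One use of the (filename, bool) pairs is to list the hours for which no file exists. The helper below accepts any iterable of such pairs, so it can be tried without a repository; missing_files is a hypothetical name and the sample file names are made up.

```python
def missing_files(pairs):
    """Return the file names whose existence flag is False."""
    return [filename for filename, exists in pairs if not exists]


# Made-up pairs in the shape the full iterator is described as returning.
sample = [("hour-00", True), ("hour-01", False), ("hour-02", True), ("hour-03", False)]
print(missing_files(sample))  # ['hour-01', 'hour-03']
```

With PySiLK available, the argument would presumably be the result of silk.site.repository_full_iter().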
silk.plugin Module
silk.plugin is a module to support using PySiLK code as a plug-in to the rwfilter(1), rwcut(1), rwgroup(1), rwsort(1), rwstats(1), and rwuniq(1) applications. The module defines the following methods, which are described in the silkpython(3) manual page:
silk.plugin.register_switch(switch_name, handler=handler, [arg=needs_arg], [help=help_string])
Define the command line switch --switch_name that can be used by the PySiLK plug-in.
silk.plugin.register_filter(filter, [finalize=finalize], [initialize=initialize])
Register the callback function filter that can be used by rwfilter to specify whether the flow record
passes or fails.
silk.plugin.register_field(field_name, [add_rec_to_bin=add_rec_to_bin,]
[bin_compare=bin_compare,]
[bin_bytes=bin_bytes,]
[bin_merge=bin_merge,]
[bin_to_text=bin_to_text,]
[column_width=column_width,]
[description=description,]
[initial_value=initial_value,]
[initialize=initialize,]
[rec_to_bin=rec_to_bin,]
[rec_to_text=rec_to_text])
Define the new key field or aggregate value field named field_name. Key fields can be used in rwcut,
rwgroup, rwsort, rwstats, and rwuniq. Aggregate value fields can be used in rwstats and rwuniq.
Creating a field requires specifying one or more callback functions; the functions required depend on
the application(s) where the field will be used. To simplify field creation for common field types, the
remaining functions can be used instead.
silk.plugin.register_int_field(field_name, int_function, min, max, [width])
Create the key field field_name whose value is an unsigned integer.
silk.plugin.register_ipv4_field(field_name, ipv4_function, [width])
Create the key field field_name whose value is an IPv4 address.
silk.plugin.register_ip_field(field_name, ipv4_function, [width])
Create the key field field_name whose value is an IPv4 or IPv6 address.
silk.plugin.register_enum_field(field_name, enum_function, width, [ordering])
Create the key field field_name whose value is a Python object (often a string).
silk.plugin.register_int_sum_aggregator(agg_value_name, int_function, [max_sum], [width])
Create the aggregate value field agg_value_name that maintains a running sum as an unsigned integer.
silk.plugin.register_int_max_aggregator(agg_value_name, int_function, [max_max], [width])
Create the aggregate value field agg_value_name that maintains the maximum unsigned integer value.
silk.plugin.register_int_min_aggregator(agg_value_name, int_function, [max_min], [width])
Create the aggregate value field agg_value_name that maintains the minimum unsigned integer value.
EXAMPLE
The following is an example using the PySiLK bindings. The code is meant to show some standard PySiLK
techniques, but is not otherwise meant to be useful. Explanations for the code can be found in-line in the
comments.
#!/usr/bin/env python

# Use print functions (Compatible with Python 3.0; Requires 2.6+)
from __future__ import print_function

# Import the PySiLK bindings
from silk import *

# Import sys for the command line arguments.
import sys

# Main function
def main():
    if len(sys.argv) != 3:
        print("Usage: %s infile outset" % sys.argv[0])
        sys.exit(1)

    # Open a SiLK file for reading
    infile = silkfile_open(sys.argv[1], READ)

    # Create an empty IPset
    destset = IPSet()

    # Loop over the records in the file
    for rec in infile:
        # Do comparisons based on rwrec field value
        if (rec.protocol == 6 and rec.sport in [80, 8080] and
                rec.packets > 3 and rec.bytes > 120):
            # Add the dest IP of the record to the IPset
            destset.add(rec.dip)

    # Save the IPset for future use
    try:
        destset.save(sys.argv[2])
    except Exception:
        sys.exit("Unable to write to %s" % sys.argv[2])

    # Count the items in the set
    count = 0
    for addr in destset:
        count = count + 1
    print("%d addresses" % count)

    # Another way to do the same
    print("%d addresses" % len(destset))

    # Print the IP blocks in the set
    for base_prefix in destset.cidr_iter():
        print("%s/%d" % base_prefix)

# Call the main() function when this program is started
if __name__ == '__main__':
    main()
ENVIRONMENT
The following environment variables affect the tools in the SiLK tool suite.
SILK_CONFIG_FILE
This environment variable contains the location of the site configuration file, silk.conf. This variable
will be used by silk.site.init_site() if no argument is passed to that method.
SILK_DATA_ROOTDIR
This variable gives the root of the directory tree where the data store of SiLK Flow files is maintained,
overriding the location that is compiled into the tools (/data). This variable will be used by the FGlob
constructor unless an explicit data_rootdir value is specified. In addition, silk.site.init_site() may
search for the site configuration file, silk.conf, in this directory.
SILK_COUNTRY_CODES
This environment variable gives the location of the country code mapping file that the
silk.init_country_codes() function will use when no name is given to that function. The value of
this environment variable may be a complete path or a file relative to the SILK_PATH. See the FILES
section for standard locations of this file.
SILK_CLOBBER
The SiLK tools normally refuse to overwrite existing files. Setting SILK_CLOBBER to a non-empty
value removes this restriction.
SILK_PATH
This environment variable gives the root of the install tree. When searching for configuration files,
PySiLK may use this environment variable. See the FILES section for details.
PYTHONPATH
This is the search path that Python uses to find modules and extensions. The SiLK Python extension
described in this document may be installed outside Python’s installation tree; for example, in SiLK’s
installation tree. It may be necessary to set or modify the PYTHONPATH environment variable so
Python can find the SiLK extension.
PYTHONVERBOSE
If the SiLK Python extension fails to load, setting this environment variable to a non-empty string
may help you debug the issue.
SILK_PYTHON_TRACEBACK
When set, Python plug-ins (see silkpython(3)) will output trace back information regarding Python
errors to the standard error.
PATH
This is the standard search path for executable programs. The FGlob constructor will invoke the
rwfglob(1) program; the directory containing rwfglob should be included in the PATH.
TZ
When a SiLK installation is built to use the local timezone (to determine if this is the case, check
the value of silk.get_configuration("TIMEZONE_SUPPORT")), the value of the TZ environment
variable determines the timezone in which silk.site.repository_iter() parses timestamp strings. If the
TZ environment variable is not set, the default timezone is used. Setting TZ to 0 or the empty string
causes timestamps to be parsed as UTC. The value of the TZ environment variable is ignored when
the SiLK installation uses UTC. For system information on the TZ variable, see tzset(3).
FILES
${SILK_CONFIG_FILE}
ROOT_DIRECTORY/silk.conf
${SILK_PATH}/share/silk/silk.conf
${SILK_PATH}/share/silk.conf
/usr/local/share/silk/silk.conf
/usr/local/share/silk.conf
Possible locations for the SiLK site configuration file which are checked when no argument is passed
to silk.site.init_site().
${SILK_COUNTRY_CODES}
${SILK_PATH}/share/silk/country_codes.pmap
${SILK_PATH}/share/country_codes.pmap
/usr/local/share/silk/country_codes.pmap
/usr/local/share/country codes.pmap
Possible locations for the country code mapping file used by silk.init country codes() when no name
is given to the function.
${SILK DATA ROOTDIR}/
/data/
Locations for the root directory of the data repository. The silk.site.init site() may search for the site
configuration file, silk.conf, in this directory.
SEE ALSO
silkpython(3), rwfglob(1), rwfileinfo(1), rwfilter(1), rwcut(1), rwpmapbuild(1), rwset(1), rwsetbuild(1), rwgroup(1), rwsort(1), rwstats(1), rwuniq(1), rwgeoip2ccmap(1), silk.conf(5), sensor.conf(5), silk(7), python(1), gzip(1), yaf(1), http://docs.python.org/
silk-plugin
Creating a SiLK run-time plug-in using C
SYNOPSIS
sk_cc=`silk_config --compiler`
sk_cflags=`silk_config --cflags`
$sk_cc $sk_cflags -shared -o FILENAME.so FILENAME.c
rwfilter --plugin=FILENAME.so [--plugin=FILENAME.so ...] ...
rwcut --plugin=FILENAME.so [--plugin=FILENAME.so ...]
--fields=FIELDS ...
rwgroup --plugin=FILENAME.so [--plugin=FILENAME.so ...]
--id-fields=FIELDS ...
rwsort --plugin=FILENAME.so [--plugin=FILENAME.so ...]
--fields=FIELDS ...
rwstats --plugin=FILENAME.so [--plugin=FILENAME.so ...]
--fields=FIELDS --values=VALUES ...
rwuniq --plugin=FILENAME.so [--plugin=FILENAME.so ...]
--fields=FIELDS --values=VALUES ...
DESCRIPTION
Several of the SiLK analysis tools allow the user to augment the tools’ functionality through the use of
plug-ins that get loaded at run-time. These tools are:
rwfilter(1)
Supports adding new switches to determine whether each SiLK Flow record should be written in the
--pass or the --fail output stream.
rwcut(1)
Supports adding new output fields that, when selected using the --fields switch, appear as a column
in the output.
rwsort(1)
Supports adding new key fields that, when selected using the --fields switch, are used to determine
the order in which records are sorted.
rwgroup(1)
Supports adding new key fields that, when selected using the --id-fields switch, are used to determine
how records are grouped.
rwuniq(1)
Supports adding new key fields that, when selected using the --fields switch, are used to bin (i.e., group)
the records. In addition, rwuniq supports adding new aggregate value fields that, when selected using
the --values switch, will be computed for each bin. The key and value fields will appear in the output.
rwstats(1)
Supports adding new key fields that, when selected using the --fields switch, are used to bin (i.e.,
group) the records. In addition, rwstats supports adding new aggregate value fields that, when
selected using the --values switch, will be computed for each bin and can be used to determine the
top-N (or bottom-N) bins. The key and value fields will appear in the output for bins that meet the
top-N threshold.
rwptoflow(1)
Supports adding functionality to ignore packets in the pcap(3) input stream or to modify the SiLK
Flow records as the records are generated.
In addition, all of the above tools support adding new command line switches that can be used to initialize
the plug-in itself (for example, to load an auxiliary file that the plug-in requires).
The plug-ins for all tools except rwptoflow can be written in either C or using PySiLK (the SiLK Python
extension, see pysilk(3)). Although the execution time for PySiLK plug-ins is slower than for C plug-ins,
we encourage you to use PySiLK for your plug-ins since the time-to-result can be faster for PySiLK: The
faster development time in Python typically more than compensates for the slower execution time. Once you
find that your PySiLK plug-in is seeing a great deal of use, or that PySiLK is just too slow for the amount
of data you are processing, then re-write the plug-in using C. Even when you intend to write a plug-in using
C, it can be helpful to prototype your plug-in using PySiLK.
The remainder of this document explains how to create a plug-in for the SiLK analysis tools (except rwptoflow) using the C programming language. For information on creating a plug-in using PySiLK, see
silkpython(3).
A template file for plug-ins is included in the SiLK source tree, in the silk-VERSION/src/template/c-plugin.c
file.
The setup function
When you provide --plugin=my-plugin.so on the command line to an application, the application loads
the my-plugin.so file and calls a setup function in that file to determine the new switches and/or fields
that my-plugin.so provides.
This setup function is called with three arguments: the first two describe the version of the plug-in API, and
the third is a pointer that is currently unused.
skplugin_err_t SKPLUGIN_SETUP_FN(
    uint16_t    major_version,
    uint16_t    minor_version,
    void       *plug_in_data)
{
    ...
}
There are several tasks this setup function may do: (1) check the API version, (2) register new command
line switches (if any), (3) register new filters (if any), and (4) register new fields (if any). Let’s describe these
in more detail.
(1) Check the API version
The setup function should ensure that the plug-in and the application agree on the API to use. This provides
protection in case the SiLK API to plug-ins changes in the future. To make this determination, call the
skpinSimpleCheckVersion() function. A typical invocation is shown here, where major_version
and minor_version were passed into the SKPLUGIN_SETUP_FN, and PLUGIN_API_VERSION_MAJOR and
PLUGIN_API_VERSION_MINOR are macros defined in the template file to the current version of the API.
#define PLUGIN_API_VERSION_MAJOR 1
#define PLUGIN_API_VERSION_MINOR 0
/* Check the plug-in API version */
rv = skpinSimpleCheckVersion(major_version, minor_version,
                             PLUGIN_API_VERSION_MAJOR,
                             PLUGIN_API_VERSION_MINOR,
                             skAppPrintErr);
if (rv != SKPLUGIN_OK) {
    return rv;
}
(2) Register command line switches
If the plug-in wants to define new command line switches, those switches must be registered in the setup
function. A typical use of a command line switch is to allow the user to configure the plug-in; for example,
the switch may allow the user to specify the location of an auxiliary input file that the plug-in requires, or
to set a parameter used by the plug-in.
A second use for a command line switch is more subtle. When creating a plug-in for rwfilter, you may want
your plug-in to provide several similar features, and only enable each feature when the user requests it via a
command line switch. For this case, you want to delay registering the filter until the command line switch
is seen, in which case the filter registration function should be invoked in the switch’s callback function.
Information on registering a command line switch is available below (Registering command line switches).
(3) Register filters
You only need to register filters when the plug-in will be used by rwfilter(1). You may choose to register
the filters in the setup function; if you do, the filter will always be used when the plug-in is loaded by
rwfilter. If the plug-in provides several filtering functions that the user may choose from via command
line switches, you should call the filter registration function in the callback function for the command line
switch.
See Registering filter functions for details on registering a function to use with rwfilter.
(4) Register fields
If you want your plug-in to create a new printable field for rwcut(1), a new sorting field for rwsort(1), a
new grouping field for rwgroup(1), rwstats(1), or rwuniq(1), or a new aggregate value field for rwstats
or rwuniq, you should register those fields in the setup function. (While you can register the fields in a
switch’s callback function, there is usually little reason to do so.)
There are two interfaces to registering a new field:
1. The advanced interface provides complete control over how the field is defined, and allows (or forces)
you to specify exactly how to map from a SiLK Flow record to a binary representation to a textual
representation. To use the advanced interface you will need to define several functions and fill in a C
structure with pointers to those functions. This interface is described in the Advanced field registration
function section below.
2. The simple interface can be used to define fields that map to an integer value, an IP address, or text
that is indexed by an integer value. To use this interface, you need to define only one or two functions.
The simple interface should handle many common cases, and it is described in Simple field registration
functions.
Registering command line switches
When you register a switch, the two important pieces of information you must provide are a name for the
switch and a callback function. When the application encounters the command line switch registered by
your plug-in, the application will invoke the callback function with the parameter that the user provided (if
any) to the command line switch.
To register a command line switch, call the skpinRegOption2() function:
skplugin_err_t skpinRegOption2(
    const char            *option_name,
    skplugin_arg_mode_t    mode,
    const char            *option_help_string,
    skplugin_help_fn_t     option_help_fn,
    skplugin_option_fn_t   opt_process_fn,
    void                  *opt_callback_data,
    int                    num_fn_mask,
    ...); /* list of skplugin_fn_mask_t */
The parameters are
option_name
Specifies the command line switch to create. Do not include the leading -- characters in the name.
mode
Determines whether the switch takes an argument. It should be one of
NO_ARG
when the command line option acts as an on/off switch
OPTIONAL_ARG
when the command line option has a default value, or
REQUIRED_ARG
when the user of the plug-in must provide an argument to the command line option.
option_help_string
This parameter specifies the usage string to print when the user requests --help from the application.
This parameter may be NULL. Alternatively, you may instruct the application to generate a help string
by invoking a callback function your plug-in provides, as described next.
option_help_fn
This parameter specifies a pointer to a function that the application will call to print a help message
for the command line switch when the user requests --help from the application. This parameter may
be NULL; if it is not NULL, the option_help_string value is ignored. The signature of the function
to provide is
void option_help_fn(
    FILE                 *file_handle,
    const struct option  *option,
    void                 *opt_callback_data);
The file_handle argument is where the function should print its help message.
opt_callback_data is the value provided to skpinRegOption2() when the option was registered.
The struct option parameter has two members of interest: name contains the name used to register the option, and has_arg contains the mode that was used when the option was specified.
opt_process_fn
Specifies the callback function, whose signature is
skplugin_err_t opt_process_fn(
    const char  *opt_arg,
    void        *opt_callback_data);
The application will call opt_process_fn(opt_arg, opt_callback_data) when --option_name is seen
as a command line argument. opt_arg will be the parameter the user passed to the switch, or it will
be NULL if no parameter was given.
opt_callback_data
Will be passed back unchanged to the plug-in as a parameter in the opt_process_fn() and option_help_fn() callback functions.
num_fn_mask
Specifies the number of skplugin_fn_mask_t values specified as the final argument(s) to skpinRegOption2().
...
Specifies a list of skplugin_fn_mask_t values. The length of this list must be specified in the
num_fn_mask parameter. A plug-in file (e.g., my-plugin.so) can be loaded into any SiLK tool that
supports plug-ins, but you may want a command line switch to appear only in certain applications.
For example, the flowrate(3) plug-in can be used in both rwfilter and rwcut. When used by rwfilter, flowrate provides a --bytes-per-second switch; when used by rwcut, that switch is not
available, and instead the bytes/sec field becomes available. This list determines in which applications the switch gets defined, and the list should contain the SKPLUGIN_FN_* or SKPLUGIN_APP_* macros
defined in skplugin.h. To make the switch available in all applications, specify SKPLUGIN_FN_ANY. When
skpinRegOption2() is called in an application that does not match a value in this list, the function returns SKPLUGIN_ERR_DID_NOT_REGISTER, indicating that this option is not applicable to
the application.
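As a rough illustration of an opt_process_fn callback for a switch registered with REQUIRED_ARG, the sketch below parses a decimal threshold from the switch's argument. The skplugin_err_t enum is a local stand-in so the fragment compiles without the SiLK headers (a real plug-in would take it from skplugin.h, and SKPLUGIN_ERR is assumed here as the generic failure code); the names threshold and threshold_opt_cb are hypothetical.

```c
#include <ctype.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Stand-in for the type declared in skplugin.h; assumed here only so
 * the sketch compiles on its own. */
typedef enum { SKPLUGIN_OK = 0, SKPLUGIN_ERR = 1 } skplugin_err_t;

/* Hypothetical plug-in state that the switch sets. */
uint64_t threshold = 0;

/* Matches the opt_process_fn signature.  opt_arg is the text the user
 * gave to the switch, or NULL when no argument was given. */
skplugin_err_t
threshold_opt_cb(const char *opt_arg, void *opt_callback_data)
{
    char *end;
    unsigned long long val;

    (void)opt_callback_data;    /* unused in this sketch */

    /* Require a leading digit so signs and whitespace are rejected */
    if (opt_arg == NULL || !isdigit((unsigned char)opt_arg[0])) {
        return SKPLUGIN_ERR;
    }
    errno = 0;
    val = strtoull(opt_arg, &end, 10);
    if (errno != 0 || *end != '\0') {
        return SKPLUGIN_ERR;    /* overflow or trailing characters */
    }
    threshold = (uint64_t)val;
    return SKPLUGIN_OK;
}
```

A function like this would be passed as the opt_process_fn argument to skpinRegOption2(); returning an error on bad input leaves the previous value of threshold untouched.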
Registering filter functions
When you register a filter function, you are specifying a function that rwfilter will call for every SiLK Flow
record that rwfilter reads from its input files. If the function returns SKPLUGIN_FILTER_PASS, rwfilter
writes the record into the stream(s) specified by --pass. The record goes to the --fail streams if the function
returns SKPLUGIN_FILTER_FAIL.
(The previous paragraph is true only when the plug-in is the only filtering predicate. When multiple tests
are specified on the rwfilter command line, rwfilter will put the record into the fail destination as soon as
any test fails. If there are multiple tests, your plug-in function will only see records that have not yet failed
a test. If a plug-in filter function follows your function, it may fail a record that your filter function passed.)
To register a filter function, call the following function:
skplugin_err_t skpinRegFilter(
    skplugin_filter_t           **filter_handle,
    const skplugin_callbacks_t   *regdata,
    void                         *cbdata);
filter_handle
When this parameter is not NULL, skpinRegFilter() will set the location it references to the newly
created filter. Currently, no other function accepts the skplugin_filter_t as an argument.
cbdata
This parameter will be passed back unchanged to the plug-in as a parameter in the various callback
functions. It may be NULL.
regdata
This structure has a member for every possible callback function the SiLK plug-in API supports. When
used by skpinRegFilter(), the following members are supported.
filter
rwfilter invokes this function for each SiLK flow record. If the function returns SKPLUGIN_FILTER_PASS, the record is accepted; if it returns SKPLUGIN_FILTER_FAIL, the record
is rejected. The type of the function is a skplugin_filter_fn_t, and its signature is:
skplugin_err_t filter(
    const rwRec  *rec,
    void         *cbdata,
    void        **extra);
where rec is the SiLK Flow record, cbdata is the cbdata specified in skpinRegFilter(), and
extra will likely be unused.
init
rwfilter invokes this function for all registered filter predicates. It is called after argument
processing and before reading records. The function’s type is skplugin_callback_fn_t and the
function pointer may be NULL. The callback’s signature is
skplugin_err_t init(
void *cbdata);
cleanup
When this function pointer is non-NULL, rwfilter calls this function after all records have been
processed. This function has the same type and signature as the init function.
The function’s return value will be SKPLUGIN_OK unless the filter member of the regdata structure is
NULL.
If your plug-in registers a filter function and the plug-in is used in an application other than rwfilter, the
call to skpinRegFilter() is a no-op.
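The following is a minimal sketch of a filter callback, not taken from the SiLK sources. It passes records whose average packet size meets a minimum supplied through cbdata. The skplugin_err_t enum and the rwRec structure are local stand-ins so the fragment compiles by itself; in a real plug-in both come from the SiLK headers, and record members would be read through the accessors SiLK provides (such as rwRecGetBytes()) rather than a hypothetical struct layout.

```c
#include <stddef.h>
#include <stdint.h>

/* Stand-ins for declarations in the SiLK headers, assumed here only
 * so the sketch compiles on its own. */
typedef enum {
    SKPLUGIN_OK = 0,
    SKPLUGIN_FILTER_PASS,
    SKPLUGIN_FILTER_FAIL
} skplugin_err_t;

typedef struct {                /* hypothetical record layout */
    uint32_t bytes;
    uint32_t pkts;
} rwRec;

/* Matches the filter signature.  cbdata points at a uint32_t holding
 * the minimum average bytes-per-packet a record needs in order to pass. */
skplugin_err_t
avg_size_filter(const rwRec *rec, void *cbdata, void **extra)
{
    const uint32_t *min_avg = (const uint32_t *)cbdata;

    (void)extra;                /* unused */
    if (rec->pkts == 0) {
        return SKPLUGIN_FILTER_FAIL;
    }
    return (rec->bytes / rec->pkts >= *min_avg)
        ? SKPLUGIN_FILTER_PASS
        : SKPLUGIN_FILTER_FAIL;
}
```

To use a function like this, a plug-in would store its address in the filter member of a zeroed skplugin_callbacks_t and pass that structure, together with a pointer to the minimum value as cbdata, to skpinRegFilter().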
Simple field registration functions
Using a plug-in, you can augment the keys available in the --fields switch on rwcut(1), rwgroup(1),
rwsort(1), rwstats(1), and rwuniq(1), and provide new aggregate value fields for the --values switch on
rwstats and rwuniq.
The standard field registration function, skpinRegField(), is powerful; for example, you can control exactly how the value you compute will be printed. However, that power comes with complexity. Many times,
all your plug-in needs to do is to compute a value, and having to write a function to print a number is work
with little reward. The functions in this section handle the registration of common field types.
All of these functions require a name for the new field. The name is used as one of the arguments to the
--fields or --values switch, and the name will also be used as the title when the field is printed (as in
rwcut). Field names are case insensitive, and all field names must be unique within an application. You will
get a run-time error if you attempt to create a field whose name already exists. (In rwuniq and rwstats,
you may have a --fields key and a --values aggregate value with the same name.)
The callback functions dealing with integers use uint64_t for convenience, but internally the value will be
stored in a smaller integer field if possible. Setting the max parameter to the largest value you actually
use may allow SiLK to use a smaller integer field.
The functions in this section return SKPLUGIN_OK unless the callback function is NULL.
Integer key field
The following function is used to register a key field whose value is an unsigned 64-bit integer.
skplugin_err_t skpinRegIntField(
    const char               *name,
    uint64_t                  min,
    uint64_t                  max,
    skplugin_int_field_fn_t   rec_to_int,
    size_t                    width);
name
The name of the new key field.
min
A number representing the minimum integer value for the field.
max
A number representing the maximum integer value for the field. If max is 0, a value of UINT64_MAX
is used instead.
rec_to_int
A callback function that accepts a SiLK Flow record as its sole argument, and returns an unsigned
integer (in host byte order) which represents the value of the name field for the given record. The
signature is
uint64_t rec_to_int(
const rwRec *rec);
width
The column width to use when displaying the field. If width is 0, it will be computed to be the number
of digits necessary to display the integer max.
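A rec_to_int callback can be very small. The sketch below, offered only as an illustration, reports a flow's duration in whole seconds. The rwRec structure is a hypothetical stand-in holding an elapsed time in milliseconds; a real plug-in would include the SiLK headers and read the record through SiLK's accessors. The function name and the field name in the note that follows are also assumptions.

```c
#include <stdint.h>

/* Hypothetical stand-in for the SiLK Flow record, assumed here only
 * so the sketch compiles on its own. */
typedef struct {
    uint32_t elapsed_ms;        /* flow duration in milliseconds */
} rwRec;

/* Matches the rec_to_int signature: duration truncated to seconds. */
uint64_t
rec_to_duration_sec(const rwRec *rec)
{
    return (uint64_t)rec->elapsed_ms / 1000;
}
```

Because a 32-bit millisecond count cannot exceed 4294967 whole seconds, a registration along the lines of skpinRegIntField("dur-sec", 0, 4294967, rec_to_duration_sec, 0) would give SiLK a tight max and let it compute the column width itself.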
IPv4 key field
The following function registers a new key field whose value is an IPv4 address.
skplugin_err_t skpinRegIPv4Field(
    const char                *name,
    skplugin_ipv4_field_fn_t   rec_to_ipv4,
    size_t                     width);
name
The name of the new key field.
rec_to_ipv4
A callback function that accepts a SiLK Flow record as its sole argument, and returns a 32-bit integer
(in host byte order) which represents the IPv4 address for the name field for the given record. The
signature is
uint32_t rec_to_ipv4(
const rwRec *rec);
width
The column width to use when displaying the field. If width is 0, it will be set to 15.
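As an example of what a rec_to_ipv4 callback might compute, the sketch below returns the /24 network of a record's source address. The rwRec structure with a host-byte-order sip member is a hypothetical stand-in used so the fragment compiles without the SiLK headers, and the function name is an assumption.

```c
#include <stdint.h>

/* Hypothetical stand-in for the SiLK Flow record, assumed here only
 * so the sketch compiles on its own. */
typedef struct {
    uint32_t sip;               /* source IPv4 address, host byte order */
} rwRec;

/* Matches the rec_to_ipv4 signature: clear the low 8 bits so that all
 * sources in the same /24 share one key. */
uint32_t
rec_to_sip_net24(const rwRec *rec)
{
    return rec->sip & UINT32_C(0xFFFFFF00);
}
```

With a key like this, 192.168.1.42 and 192.168.1.200 would fall into the same bin, 192.168.1.0.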
IP key field
The following function is used to register a key field whose value is any IP address (an skipaddr_t).
skplugin_err_t skpinRegIPAddressField(
    const char              *name,
    skplugin_ip_field_fn_t   rec_to_ipaddr,
    size_t                   width);
name
The name of the new key field.
rec_to_ipaddr
A callback function that accepts a SiLK Flow record and an skipaddr_t as arguments. The function
should fill in the IP address as required for the name field. The signature is
void rec_to_ipaddr(
skipaddr_t *dest,
const rwRec *rec);
width
The column width to use when displaying the field. If width is 0, it will be set to 39 when SiLK has
support for IPv6 addresses, or 15 otherwise.
Text key field (from an integer)
The following function is used to register a key field whose value is an unsigned 64-bit integer (similar to
skpinRegIntField()), but where the printed representation of the field is determined by a second callback
function. This allows the plug-in to create arbitrary text for the field.
skplugin_err_t skpinRegTextField(
    const char                *name,
    uint64_t                   min,
    uint64_t                   max,
    skplugin_int_field_fn_t    value_fn,
    skplugin_text_field_fn_t   text_fn,
    size_t                     width);
name
The name of the new key field.
min
A number representing the minimum integer value for the field.
max
A number representing the maximum integer value for the field. If max is 0, a value of UINT64_MAX
is used instead.
value_fn
A callback function that accepts a SiLK Flow record as its sole argument, and returns an unsigned
integer (in host byte order) which represents the value of the name field for the given record. The
signature is
uint64_t rec_to_int(
const rwRec *rec);
text_fn
A callback function that provides the textual representation of the value returned by value_fn. The
function’s signature is
void text_fn(
    char      *dest,
    size_t     dest_len,
    uint64_t   val);
The callback should fill the character array dest with the printable representation of val. The number
of characters in dest is given by dest_len. Note that dest_len may be different than the parameter
width passed to skpinRegTextField(), and text_fn must NUL-terminate the string.
width
The column width to use when displaying the field.
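A text_fn callback has to respect dest_len and always NUL-terminate. One way to do that, shown here as a sketch rather than as SiLK's own code, is to let snprintf() handle the truncation. The size-class labels and the function name are hypothetical; the sketch assumes the matching value_fn returns 0, 1, or 2.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Matches the text_fn signature: turn a size class (assumed to be
 * 0, 1, or 2) into a label, falling back to "unknown". */
void
size_class_text(char *dest, size_t dest_len, uint64_t val)
{
    static const char *labels[] = { "small", "medium", "large" };
    const char *label = (val < 3) ? labels[val] : "unknown";

    if (dest_len == 0) {
        return;                 /* no room even for the NUL */
    }
    /* snprintf() writes at most dest_len - 1 characters plus a NUL */
    snprintf(dest, dest_len, "%s", label);
}
```

If dest is shorter than the label, the text is truncated but still terminated, which satisfies the requirement stated above.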
Text key field (from a list)
The following function is used to register a field whose value is one of a list of strings. The plug-in provides
the list of strings and a callback that takes a SiLK Flow record and returns an index into the list of strings.
skplugin_err_t skpinRegStringListField(
    const char               *name,
    const char              **list,
    size_t                    entries,
    const char               *default_value,
    skplugin_int_field_fn_t   rec_to_index,
    size_t                    width);
name
The name of the new key field.
list
The list of strings. The list should either be NULL-terminated, or entries should have a
non-zero value.
entries
The number of entries in list. If entries is 0, SiLK determines the number of entries by traversing
list until it finds an element whose value is NULL.
default_value
The value to use when rec_to_index returns an invalid value.
rec_to_index
A callback function that accepts a SiLK Flow record as its sole argument, and returns an unsigned
integer (in host byte order) which represents an index into list. If the return value is beyond the end
of list, default_value will be used instead. The signature of this callback function is
uint64_t rec_to_int(
const rwRec *rec);
width
The column width to use when displaying the field. If width is 0, it defaults to the width of the
longest string in list and default_value.
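To make the list and rec_to_index arguments concrete, here is a small sketch that maps a record's IP protocol number to a short name. The rwRec structure with a proto member is a hypothetical stand-in so the fragment compiles without the SiLK headers; the list contents and function name are likewise assumptions.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the SiLK Flow record, assumed here only
 * so the sketch compiles on its own. */
typedef struct {
    uint8_t proto;              /* IP protocol number */
} rwRec;

/* NULL-terminated, so entries may be given as 0 at registration. */
const char *proto_names[] = { "icmp", "tcp", "udp", NULL };

/* Matches the rec_to_index signature: index into proto_names, or an
 * index past the last entry to select default_value. */
uint64_t
rec_to_proto_index(const rwRec *rec)
{
    switch (rec->proto) {
      case 1:
        return 0;
      case 6:
        return 1;
      case 17:
        return 2;
      default:
        return 3;               /* beyond the three real entries */
    }
}
```

A registration might then look like skpinRegStringListField("proto-name", proto_names, 0, "other", rec_to_proto_index, 0), in which case any protocol other than ICMP, TCP, or UDP would print as "other".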
Integer sum aggregate value field
The following function registers an aggregate value field that maintains a running unsigned integer sum.
That is, the values returned by the callback are summed for every SiLK Flow record that matches a bin’s
key. The sum is printed when the bin is printed.
skplugin_err_t skpinRegIntSumAggregator(
    const char               *name,
    uint64_t                  max,
    skplugin_int_field_fn_t   rec_to_int,
    size_t                    width);
name
The name of the new aggregate value field.
max
A number representing the maximum integer value for the field. If max is 0, a value of UINT64_MAX
is used instead.
rec_to_int
A callback function that accepts a SiLK Flow record as its sole argument, and returns an unsigned
integer (in host byte order) which represents the value of the name value field for the given record. The
signature is
uint64_t rec_to_int(
const rwRec *rec);
width
The column width to use when displaying the value. If width is 0, it will be computed to be the
number of digits necessary to display the integer max.
Integer minimum or maximum aggregate value field
The following function registers an aggregate value field that maintains the minimum integer value seen
among all values returned by the callback function.
skplugin_err_t skpinRegIntMinAggregator(
    const char               *name,
    uint64_t                  max,
    skplugin_int_field_fn_t   rec_to_int,
    size_t                    width);
This function is similar, except it maintains the maximum value.
skplugin_err_t skpinRegIntMaxAggregator(
    const char               *name,
    uint64_t                  max,
    skplugin_int_field_fn_t   rec_to_int,
    size_t                    width);
name
The name of the new aggregate value field.
max
A number representing the maximum integer value for the field. If max is 0, a value of UINT64_MAX
is used instead.
rec_to_int
A callback function that accepts a SiLK Flow record as its sole argument, and returns an unsigned
integer (in host byte order) which represents the value of the name value field for the given record. The
signature is
uint64_t rec_to_int(
const rwRec *rec);
width
The column width to use when displaying the value. If width is 0, it will be computed to be the
number of digits necessary to display the integer max.
Unsigned integer aggregate value field
The following function registers an aggregate value field that can be represented by a 64-bit integer. The
plug-in must register two callback functions. The first takes a SiLK Flow record and returns an integer
value; the second takes two integer values (as returned by the first callback function) and combines them to
form a new aggregate value.
skplugin_err_t skpinRegIntAggregator(
    const char               *name,
    uint64_t                  max,
    skplugin_int_field_fn_t   rec_to_int,
    skplugin_agg_fn_t         agg,
    uint64_t                  initial,
    size_t                    width);
name
The name of the new aggregate value field.
max
A number representing the maximum integer value for the field. If max is 0, a value of UINT64_MAX
is used instead.
rec_to_int
A callback function that accepts a SiLK Flow record as its sole argument, and returns an unsigned
integer (in host byte order) which represents the value of the name value field for the given record. The
signature is
uint64_t rec_to_int(
const rwRec *rec);
agg
A callback function that combines (aggregates) two values. For example, if you wanted to create a new
aggregate value that contained a bit-wise OR of the TCP flags seen on every packet, your agg function
would OR the values. The signature is
uint64_t agg(
uint64_t current,
uint64_t operand);
initial
Specifies the initial value for the aggregate value. The first time the agg function is called on a bin,
operand will be the value returned by rec_to_int, and current will be the value given in initial.
The value in initial must be less than or equal to the value in max.
width
The column width to use when displaying the value. If width is 0, it will be computed to be the
number of digits necessary to display the integer max.
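The bit-wise OR example mentioned for agg can be sketched as the pair of callbacks below. The rwRec structure with a flags member is a hypothetical stand-in so the fragment compiles without the SiLK headers, and the function names are assumptions.

```c
#include <stdint.h>

/* Hypothetical stand-in for the SiLK Flow record, assumed here only
 * so the sketch compiles on its own. */
typedef struct {
    uint8_t flags;              /* TCP flags seen on the flow */
} rwRec;

/* Matches the rec_to_int signature: the value contributed by one record. */
uint64_t
rec_to_flags(const rwRec *rec)
{
    return rec->flags;
}

/* Matches the agg signature: fold the new operand into the running
 * value for the bin. */
uint64_t
or_flags(uint64_t current, uint64_t operand)
{
    return current | operand;
}
```

For an OR aggregate the natural initial value is 0, since OR-ing with 0 leaves the first operand unchanged; a registration might look like skpinRegIntAggregator("flags-or", 255, rec_to_flags, or_flags, 0, 0).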
Advanced field registration function
When the simple field registration functions do not provide what you need, you can use the skpinRegField()
function that gives you complete control over the field.
skpinRegField() registers a new derived field for record processing. The plug-in must supply the name of
the new field. The name is used as one of the arguments to the --fields switch (for key fields) or --values
switch (for aggregate value fields). Field names are case insensitive, and all field names must be unique
within an application. You will get a run-time error if you attempt to create a field whose name already
exists. (In rwuniq and rwstats, you may have a --fields key and a --values aggregate value with the same
name.)
The skpinRegField() function requires you initialize and pass in a structure. In this structure you will
specify the callback functions that the application will call, as well as additional information required by
some applications. Although the structure is complex, not all applications use all members.
If the plug-in is loaded by an application that does not support fields (such as rwfilter), the function is a
no-op.
The advanced field registration function is
skplugin_err_t skpinRegField(
    skplugin_field_t            **return_field,
    const char                   *name,
    const char                   *description,
    const skplugin_callbacks_t   *regdata,
    void                         *cbdata);
return_field
When this value is not NULL, skpinRegField() will set the location it references to the newly created
field.
name
This sets the primary name of the field, and by default will be the title used when printing the field.
description
The description provides a textual description of the field. Currently this is unused.
regdata
The regdata structure provides the application with the callback functions and additional information
it needs to use the plug-in. The members that must be set vary by application. It is described in more
detail below.
cbdata
This parameter will be passed back unchanged to the plug-in as a parameter in the various callback
functions. It may be NULL.
The structure used by the skpinRegField() (and skpinRegFilter()) functions to specify callback functions
is shown here:
typedef struct skplugin_callbacks_st {
    skplugin_callback_fn_t      init;
    skplugin_callback_fn_t      cleanup;
    size_t                      column_width;
    size_t                      bin_bytes;
    skplugin_text_fn_t          rec_to_text;
    skplugin_bin_fn_t           rec_to_bin;
    skplugin_bin_fn_t           add_rec_to_bin;
    skplugin_bin_to_text_fn_t   bin_to_text;
    skplugin_bin_merge_fn_t     bin_merge;
    skplugin_bin_cmp_fn_t       bin_compare;
    skplugin_filter_fn_t        filter;
    skplugin_transform_fn_t     transform;
    const uint8_t              *initial;
    const char                **extra;
} skplugin_callbacks_t;
All of the callback functions referenced in this structure take cbdata as a parameter, which is the value that
was specified in the call to skpinRegField(). The extra parameter to the callback functions is used in
complex plug-ins and can be ignored.
The members of the structure are:
init
This specifies a callback function which the application will call when it has determined this field
will be used. (In the case of skpinRegFilter(), the function is called for all registered filters.) The
application calls the function before processing data. It may be NULL; the signature of the callback
function is
skplugin_err_t init(
void *cbdata);
cleanup
When this callback function is not NULL, the application will call it after all records have been
processed. It has the same signature as the init function.
column_width
The number of characters (not including trailing NUL) required to hold a string representation of the
longest value of the field. This value can be 0 if not used (e.g., rwsort does not print fields), or if it
will be set later using skpinSetFieldWidths().
bin_bytes
The number of bytes (octets) required to hold a binary representation of a value of the field. This
value can be 0 if not used (e.g., rwcut does not use binary values), or if it will be set later using
skpinSetFieldWidths().
rec_to_text
The application uses this callback function to fetch the textual value for the field given a SiLK Flow
record. The signature of this function is
skplugin_err_t rec_to_text(
    const rwRec  *rec,
    char         *dest,
    size_t        width,
    void         *cbdata,
    void        **extra);
The callback function should fill the character array dest with the textual value, and the value should
be NUL-terminated. width specifies the overall size of dest, and it may not have the same value as
specified by the column_width member. For proper formatting, the callback function should write no
more than column_width characters into dest. Note that if an application requires a rec_to_bin function and rec_to_bin is NULL, the application will use rec_to_text if it is provided. The application
will use column_width as the width for binary values (zeroing out the destination area before it is
written to).
rec_to_bin
This callback function is used by the application to fetch the binary value for this field given the SiLK
Flow record. The signature of this function is:
skplugin_err_t rec_to_bin(
    const rwRec  *rec,
    uint8_t      *dest,
    void         *cbdata,
    void        **extra);
The callback function should write exactly bin_bytes of data into dest (where bin_bytes was specified
in the call to skpinRegField() or skpinSetFieldWidths()). See also the rec_to_text member.
add_rec_to_bin
This callback function is used by rwuniq and rwstats when computing aggregate value fields. The
application expects this function to get the binary value for this field from the SiLK Flow record
and merge it (e.g., add it) to the current value. That is, the function should update the value in
current_and_new_value with the value that comes from the current rec. The signature is:
skplugin_err_t add_rec_to_bin(
    const rwRec  *rec,
    uint8_t      *current_and_new_value,
    void         *cbdata,
    void        **extra);
The callback function should write exactly bin_bytes of data into current_and_new_value.
bin_to_text
This callback function is used to get a textual representation of a binary value that was set by a prior
call to the rec_to_bin or add_rec_to_bin functions. The function signature is
skplugin_err_t bin_to_text(
    const uint8_t  *bin,
    char           *dest,
    size_t          width,
    void           *cbdata);
The binary input value is in bin, and it is exactly bin_bytes in length. The textual output must
be written to dest. The overall size of dest is given by width, which may be different than the
column_width value that was previously specified. For proper formatting, the callback function should
write no more than column_width characters into dest.
bin_merge
When rwstats and rwuniq are unable to store all values in memory, the applications write their
current state to temporary files on disk. Once all input data has been processed, the temporary files
are combined to produce the output. When a key appears in multiple temporary files, the aggregate
values must be merged (for example, the byte counts stored for the same key would be added). This callback
function is used to merge aggregate value fields defined by the plug-in. The function signature is
below. The src1_and_dest parameter will contain a binary aggregate value from one of the files, and
the src2 parameter a value from the other. These should be combined and the (binary) result written
to src1_and_dest. The byte length of both parameters is bin_bytes.
skplugin_err_t bin_merge(
    uint8_t        *src1_and_dest,
    const uint8_t  *src2,
    void           *cbdata);
bin_compare
This callback function is used by rwstats when determining the top-N (or bottom-N) bins based on
the binary aggregate values. The function accepts two binary values, value_a and value_b, each of
length bin_bytes. The function must set cmp_result to an integer less than 0, equal to 0, or greater
than 0 to indicate whether value_a is less than, equal to, or greater than value_b, respectively. If this
function is NULL, memcmp() will be used on the binary values instead.
skplugin_err_t bin_compare(
    int            *cmp_result,
    const uint8_t  *value_a,
    const uint8_t  *value_b,
    void           *cbdata);
filter
This callback function is only required when the plug-in will be used by rwfilter, as described above.
When defining a field, filter is ignored.
transform
This callback function is only required when the plug-in will be used by rwptoflow. This callback
allows the plug-in to modify the SiLK Flow record, rec, before it is written to the output. The callback
function should modify rec in place; the signature is
skplugin_err_t transform(
    rwRec          *rec,
    void           *cbdata,
    void          **extra);
initial
When the initial member is not NULL, it should point to a value containing at least bin_bytes
bytes. These bytes will be used to initialize the binary aggregate value. As an example use case, when
the plug-in is computing a minimum, it may choose to initialize the field to contain the maximum
value. When initial is NULL, binary aggregate values are initialized using bzero().
extra
This member is usually NULL. When not NULL, it points to a NULL-terminated constant array of
strings representing "extra arguments". These are not often used, and they will not be discussed in
this manual page.
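The way the aggregate callbacks cooperate can be modeled outside of C. The following is a minimal Python sketch, not the plug-in API itself: the function names mirror the structure members described above, but the signatures are simplified (values are returned instead of being written into a destination buffer), and the 4-byte big-endian layout and the sample values are assumptions made for illustration. It computes a per-key minimum, so the initial value is the largest representable number.

```python
import struct

BIN_BYTES = 4                            # plays the role of bin_bytes
FMT = "!I"                               # unsigned 32-bit, big endian (assumed layout)
INITIAL = struct.pack(FMT, 0xFFFFFFFF)   # plays the role of `initial` for a minimum


def add_rec_to_bin(rec_value, current):
    """Merge one record's value into the bin by keeping the smaller value."""
    (cur,) = struct.unpack(FMT, current)
    return struct.pack(FMT, min(cur, rec_value))


def bin_merge(src1, src2):
    """Combine two partial results for the same key (one per temporary file)."""
    (a,) = struct.unpack(FMT, src1)
    (b,) = struct.unpack(FMT, src2)
    return struct.pack(FMT, min(a, b))


def bin_compare(value_a, value_b):
    """Return a negative, zero, or positive integer, like cmp_result."""
    (a,) = struct.unpack(FMT, value_a)
    (b,) = struct.unpack(FMT, value_b)
    return (a > b) - (a < b)


def bin_to_text(value, width=10):
    """Textual form of the binary value, limited to `width` characters."""
    return str(struct.unpack(FMT, value)[0])[:width]


# Two partial aggregates for the same key, as if read from two temporary files.
part1 = INITIAL
for v in (40, 12, 99):
    part1 = add_rec_to_bin(v, part1)
part2 = INITIAL
for v in (77, 30):
    part2 = add_rec_to_bin(v, part2)

merged = bin_merge(part1, part2)
print(bin_to_text(merged))   # prints 12
```

Starting from all-zero bytes (the bzero() default) would make every minimum come out as 0, which is why a minimum-style aggregate needs a non-NULL initial value.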
Once a field is registered, you may make changes to it by calling the additional functions described below.
In each of these functions, the field parameter is the handle returned when the field was registered.
By default, the name will also be used as the field’s title. To specify a different title, the plug-in may call
skplugin_err_t skpinSetFieldTitle(
    skplugin_field_t    field,
    const char         *title);
To create an alternate name for the field (that is, a name that can be used in the --fields or --values
switches) call
skplugin_err_t skpinAddFieldAlias(
    skplugin_field_t    field,
    const char         *alias);
To set or modify the textual and binary widths for a field, use the following function. This function should
be called in the field’s init callback function.
skplugin_err_t skpinSetFieldWidths(
    skplugin_field_t    field,
    size_t              field_width_text,
    size_t              field_width_bin);
The following table shows when a member of the skplugin_callbacks_t structure is required or optional.
(Where the table shows column_width and bin_bytes as required, the values can be set in the structure or
via the skpinSetFieldWidths() function.)
                rwfilter  rwcut  rwgroup  rwsort  rwstats  rwuniq  rwptoflow
init            r         f      f        f       f,a      f,a     r
cleanup         r         f      f        f       f,a      f,a     r
column_width    .         F      .        .       F,A      F,A     .
bin_bytes       .         .      F        F       F,A      F,A     .
rec_to_text     .         F      .        .       .        .       .
rec_to_bin      .         .      F        F       F        F       .
add_rec_to_bin  .         .      .        .       A        A       .
bin_to_text     .         .      .        .       F,A      F,A     .
bin_merge       .         .      .        .       A        A       .
bin_compare     .         .      .        .       A        .       .
initial         .         .      .        .       a        a       .
filter          R         .      .        .       .        .       .
transform       .         .      .        .       .        .       R
extra           r         f      f        f       f,a      f,a     r
The legend is
F   required for a key field
A   required for an aggregate value field
R   required for a non-field application (e.g., rwfilter)
f   optional for a key field
a   optional for an aggregate value field
r   optional for a non-field application
.   ignored
Miscellaneous functions
The following registers a cleanup function for the plug-in. This function will be called by the application
after any field- or filter-specific cleanup functions are called. Specifically, this is the last callback that the
application will invoke on a plug-in.
skplugin_err_t skpinRegCleanup(
skplugin_cleanup_fn_t cleanup);
The signature of the cleanup function is:
void cleanup(void);
The plug-in author should invoke the following function to tell rwfilter that this plug-in is not thread safe.
Calling this function causes rwfilter not to use multiple threads; as such, this function should only be called
when the plug-in has registered an active filter function.
void skpinSetThreadNonSafe(void);
Compiling the plug-in
Once you have finished writing the C code for the plug-in, save it in a file. The following uses the name
my-plugin.c for the name of this file.
In the following, the leading dollar sign ($) followed by a space represents the shell prompt. The text after
the dollar sign represents the command line. Lines have been wrapped for improved readability, and the
back slash (\) is used to indicate a wrapped line.
When compiling a plug-in, you should use the same compiler and compiler-options as when SiLK was
compiled. The silk_config(1) utility can be used to obtain that information. To store the compiler used to
compile SiLK into the variable sk_cc, specify the following at a shell prompt (note that those are backquotes,
and this assumes a Bourne-compatible shell):
$ sk_cc=`silk_config --compiler`
To get the compiler flags used to compile SiLK:
$ sk_cflags=`silk_config --cflags`
Using those two variables, you can now compile the plug-in. The following will work on Linux and Mac OS
X:
$ $sk_cc $sk_cflags -shared -o my-plugin.so my-plugin.c
For Mac OS X:
$ $sk_cc $sk_cflags -bundle -flat_namespace -undefined suppress \
    -o my-plugin.so my-plugin.c
If there are compilation errors, fix them and compile again.
Notes: The preceding assumed you were building the plug-in after having installed SiLK. The paths given
by silk_config do not work if SiLK has not been installed. To compile the plug-in, you must have access to
the SiLK header files. (If you are using an RPM installation of SiLK, ensure that the silk-devel RPM is
installed.)
Once you have created the my-plugin.so file, you can load it into an application by using the --plugin switch
on the application as shown in the SYNOPSIS. When loading a plug-in from the current directory, it is best
to prefix the filename with ./:
$ rwcut --plugin=./my-plugin.so ...
If there are problems loading the plug-in into the application, you can trace the actions the application is
doing by setting the SILK_PLUGIN_DEBUG environment variable:
$ SILK_PLUGIN_DEBUG=1 rwcut --plugin=./my-plugin.so ...
EXAMPLES
rwfilter
Suppose you want to find traffic destined to a particular host, 10.0.0.23, that is either ICMP or coming from
1434/udp. If you attempt to use:
$ rwfilter --daddr=10.0.0.23 --proto=1,17 --sport=1434 \
    --pass=outfile.rw flowrec.rw
the --sport option will not match any of the ICMP traffic, and your result will not contain ICMP records.
To avoid having to use two invocations of rwfilter, you can create the following plug-in to do the entire
check in a single pass:
#include <silk/silk.h>
#include <silk/rwrec.h>
#include <silk/skipaddr.h>
#include <silk/skplugin.h>
#include <silk/utils.h>

/* These variables specify the version of the SiLK plug-in API. */
#define PLUGIN_API_VERSION_MAJOR 1
#define PLUGIN_API_VERSION_MINOR 0

/* ip to search for */
static skipaddr_t ipaddr;

/*
 *  status = filter(rwrec, reg_data, extra);
 *
 *    The function should examine the SiLK flow record and return
 *    SKPLUGIN_FILTER_PASS to write the rwRec to the
 *    pass-destination(s) or SKPLUGIN_FILTER_FAIL to write it to the
 *    fail-destination(s).
 */
static skplugin_err_t filter(
    const rwRec    *rwrec,
    void           *reg_data,
    void          **extra)
{
    skipaddr_t dip;

    rwRecMemGetDIP(rwrec, &dip);
    if (0 == skipaddrCompare(&dip, &ipaddr)
        && (rwRecGetProto(rwrec) == 1
            || (rwRecGetProto(rwrec) == 17
                && rwRecGetSPort(rwrec) == 1434)))
    {
        return SKPLUGIN_FILTER_PASS;
    }
    return SKPLUGIN_FILTER_FAIL;
}

/* The set-up function that the application will call. */
skplugin_err_t SKPLUGIN_SETUP_FN(
    uint16_t        major_version,
    uint16_t        minor_version,
    void           *plug_in_data)
{
    uint32_t ipv4;
    skplugin_err_t rv;
    skplugin_callbacks_t regdata;

    /* Check the plug-in API version */
    rv = skpinSimpleCheckVersion(major_version, minor_version,
                                 PLUGIN_API_VERSION_MAJOR,
                                 PLUGIN_API_VERSION_MINOR,
                                 skAppPrintErr);
    if (rv != SKPLUGIN_OK) {
        return rv;
    }

    /* set global ipaddr */
    ipv4 = ((10 << 24) | 23);
    skipaddrSetV4(&ipaddr, &ipv4);

    /* register the filter */
    memset(&regdata, 0, sizeof(regdata));
    regdata.filter = filter;
    return skpinRegFilter(NULL, &regdata, NULL);
}
Once this file is created and compiled, you can use it from rwfilter as shown here:
$ rwfilter --plugin=./my-plugin.so --pass=outfile.rw flowrec.rw
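The address arithmetic and the pass/fail test used by the plug-in can be checked independently of SiLK. The following Python sketch is only a model of the C logic above: the records are hypothetical stand-ins built with SimpleNamespace (the attribute names dip, protocol, and sport are chosen to resemble a flow record), and filter_passes is a name invented for this illustration.

```python
import ipaddress
from types import SimpleNamespace

# (10 << 24) | 23 places 10 in the first octet and 23 in the last one.
TARGET = ipaddress.IPv4Address((10 << 24) | 23)
print(TARGET)   # prints 10.0.0.23


def filter_passes(rec):
    """Same test as the C filter(): destination matches, and the flow is
    either ICMP (protocol 1) or UDP (protocol 17) from source port 1434."""
    return rec.dip == TARGET and (
        rec.protocol == 1
        or (rec.protocol == 17 and rec.sport == 1434))


# Hypothetical stand-ins for flow records.
icmp = SimpleNamespace(dip=TARGET, protocol=1, sport=0)
udp_1434 = SimpleNamespace(dip=TARGET, protocol=17, sport=1434)
udp_other = SimpleNamespace(dip=TARGET, protocol=17, sport=53)
elsewhere = SimpleNamespace(dip=ipaddress.IPv4Address("10.0.0.24"),
                            protocol=1, sport=0)

for rec in (icmp, udp_1434, udp_other, elsewhere):
    print("pass" if filter_passes(rec) else "fail")
```

The ICMP record passes even though its source port is not 1434, which is exactly the case that the single rwfilter invocation with --sport=1434 could not express.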
Additional examples
For additional examples, see the source files in silk-VERSION/src/plugins.
ENVIRONMENT
SILK_PATH
This environment variable gives the root of the install tree. When searching for plug-ins, a SiLK
application may use this environment variable. See the FILES section for details.
SILK_PLUGIN_DEBUG
When set to 1, the SiLK applications print status messages to the standard error as they attempt
to find and open each plug-in. In addition, when an attempt to register a field fails, the application
prints a message specifying the additional function(s) that must be defined to register the field in the
application. Be aware that the output can be rather verbose.
FILES
${SILK_PATH}/lib64/silk/
${SILK_PATH}/lib64/
${SILK_PATH}/lib/silk/
${SILK_PATH}/lib/
/usr/local/lib64/silk/
/usr/local/lib64/
/usr/local/lib/silk/
/usr/local/lib/
Directories that a SiLK application checks when attempting to load a plug-in.
SEE ALSO
rwfilter(1), rwcut(1), rwgroup(1), rwsort(1), rwstats(1), rwuniq(1), silk_config(1), rwptoflow(1),
pysilk(3), silkpython(3), flowrate(3), silk(7), pcap(3)
silkpython
SiLK Python plug-in (silkpython.so)
SYNOPSIS
rwfilter --python-file=FILENAME [--python-file=FILENAME ...] ...
rwfilter --python-expr=PYTHON_EXPRESSION ...
rwcut --python-file=FILENAME [--python-file=FILENAME ...]
--fields=FIELDS ...
rwgroup --python-file=FILENAME [--python-file=FILENAME ...]
--id-fields=FIELDS ...
rwsort --python-file=FILENAME [--python-file=FILENAME ...]
--fields=FIELDS ...
rwstats --python-file=FILENAME [--python-file=FILENAME ...]
--fields=FIELDS --values=VALUES ...
rwuniq --python-file=FILENAME [--python-file=FILENAME ...]
--fields=FIELDS --values=VALUES ...
DESCRIPTION
The SiLK Python plug-in provides a way to use PySiLK (the SiLK extension for python(1) described in
pysilk(3)) to extend the capability of several SiLK tools.
• In rwfilter(1), new partitioning rules can be defined in PySiLK to determine whether a SiLK Flow
record is written to the --pass-destination or --fail-destination.
• In rwcut(1), new fields can be defined in PySiLK and displayed for each record.
• New fields can also be defined in rwgroup(1) and rwsort(1). These fields are used as part of the key
when grouping or sorting the records.
• For rwstats(1) and rwuniq(1), two types of fields can be defined: Key fields are used to categorize
the SiLK Flow records into bins, and aggregate value fields compute a value across all the SiLK Flow
records that are categorized into a bin. (An example of a built-in aggregate value field is the number
of packets that were seen for all flow records that match a particular key.)
To extend the SiLK tools using PySiLK, the user writes a Python file that calls Python functions defined in
the silk.plugin Python module and described in this manual page. When the user specifies the --python-file
switch to a SiLK application, the application loads the Python file and makes the new functionality
available.
The following sections will describe
• how to create a command line switch with PySiLK that allows one to modify the run-time behavior
of their PySiLK code
• how to use PySiLK with rwfilter
• a simple API for creating fields in rwcut, rwgroup, rwsort, rwstats, and rwuniq
• the advanced API for creating fields in those applications
Typically you will not need to explicitly import the silk.plugin module, since the --python-file switch does
this for you. In a module used by a Python plug-in, the module can gain access to the functions defined in
this manual page by importing them from silk.plugin:
from silk.plugin import *
Hint: If you want to check whether the Python code in FILENAME is defining the switches and fields you
expect, you can load the Python file and examine the output of --help, for example:
rwcut --python-file=FILENAME --help
User-defined command line switches
Command line switches can be added and handled from within a SiLK Python plug-in. In order to add a
new switch, use the following function:
register_switch(switch_name, handler=handler_func, [arg=needs_arg], [help=help_string])
switch_name
Provides the name of the switch you are registering, a string. Do not include the leading -- in the
name. If a switch already exists with the name switch_name, the application will exit with an error
message.
handler_func
handler_func([string]). Names a function that will be called by the application while it is processing
its command line if and only if the command line includes the switch --switch_name. (If the switch
is not given, the handler_func function will not be called.) When the arg parameter is specified
and its value is False, the handler_func function will be called with no arguments. Otherwise, the
handler_func function will be called with a single argument: a string representing the value the user
passed to the --switch_name switch. The return value from this function is ignored. Note that the
register_switch() function requires a handler argument which must be passed by keyword.
needs_arg
Specifies a boolean value that determines whether the user must specify an argument to
--switch_name, and determines whether the handler_func function should expect an argument. When
arg is not specified or needs_arg is True, the user must specify an argument to --switch_name and
the handler_func function will be called with a single argument. When needs_arg is False, it is an error
to specify an argument to --switch_name and handler_func will be called with no arguments.
help_string
Provides the usage text to print describing this switch when the user runs the application with the
--help switch. This argument is optional; when it is not provided, a simple "No help for this switch"
message is printed.
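As an illustration of the two handler styles, the following sketch registers one switch that takes an argument and one that does not. The switch names, the settings dictionary, and the handler names are hypothetical. The try/except stand-in is an assumption added only so the file can also be run outside a SiLK application; when the file is loaded with --python-file, the real register_switch() is used.

```python
try:
    from silk.plugin import register_switch
except ImportError:
    # Stand-in for experimenting outside SiLK (an assumption of this sketch,
    # not part of the silk.plugin module): it only records the registration.
    _switches = {}

    def register_switch(switch_name, handler=None, arg=True, help=None):
        _switches[switch_name] = (handler, arg, help)

# Run-time settings that the handlers below modify (hypothetical names).
settings = {"min_bytes": 0, "verbose": False}


def set_min_bytes(value):
    # Called with a single argument: the string the user gave to the switch.
    settings["min_bytes"] = int(value)


def set_verbose():
    # Registered with arg=False, so it is called with no arguments.
    settings["verbose"] = True


register_switch("example-min-bytes", handler=set_min_bytes,
                help="Only consider flows with at least this many bytes")
register_switch("example-verbose", handler=set_verbose, arg=False,
                help="Print extra statistics when finished")
```

Because the handler always receives a string, any numeric conversion (here int()) is the plug-in's responsibility.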
rwfilter usage
When used in conjunction with rwfilter(1), the SiLK Python plug-in allows users to define arbitrary
partitioning criteria using the SiLK extension to the Python programming language. To use this capability,
the user creates a Python file and specifies its name with the --python-file switch in rwfilter. The file
should call the register_filter() function for each filter that it wants to create:
register_filter(filter_func, [finalize=finalize_func], [initialize=initialize_func])
filter_func
Boolean = filter_func(silk.RWRec). Names a function that must accept a single argument, a
silk.RWRec object (see pysilk(3)). When the rwfilter program is run, it finds the records that
match the selection options, and hands each record to the built-in partitioning switches. A record that
passes all of the built-in switches is handed to the first Python filter_func() function as an RWRec
object. The return value of the function determines what happens to the record. The record fails the
filter_func() function (and the record is immediately written to the --fail-destination, if specified)
when the function returns one of the following: False, None, numeric zero of any type, an empty
string, or an empty container (including strings, tuples, lists, dictionaries, sets, and frozensets). If the
function returns any other value, the record passes the first filter_func() function, and the record
is handed to the next Python filter_func() function. If all filter_func() functions pass the record,
the record is written to the --pass-destination, if specified. (Note that when the --plugin switch is
present, the code it specifies will be called after the PySiLK code.)
initialize_func
initialize_func(). Names a function that takes no arguments. When this function is specified, it will
be called after rwfilter has completed its argument processing, and just before rwfilter opens the
first input file. The return value of this function is ignored.
finalize_func
finalize_func(). Names a function that takes no arguments. When this function is specified, it will be
called after all flow records have been processed. One use of these functions is to print any statistics
that the filter_func() function was computing. The return value from this function is ignored.
If register_filter() is called multiple times, the filter_func(), initialize_func(), and finalize_func()
functions will be invoked in the order in which the register_filter() calls were seen.
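The pass/fail rule for chained filter functions can be modeled in a few lines. The sketch below does not call rwfilter; run_filters is a hypothetical helper that imitates the documented behavior, and the records are SimpleNamespace stand-ins carrying the protocol, dport, and bytes attributes that a silk.RWRec object provides.

```python
from types import SimpleNamespace


def is_web(rec):
    """Passes TCP flows to ports 80 or 443."""
    return rec.protocol == 6 and rec.dport in (80, 443)


def big_flow(rec):
    """Returns an integer: zero fails the record, any other value passes."""
    return rec.bytes // 1000


def run_filters(filters, rec):
    """Model of the documented rule: the first false-like result fails the
    record immediately; the record passes only if every function passes it."""
    for func in filters:
        if not func(rec):
            return "fail"
    return "pass"


# The list order stands for the order of the register_filter() calls.
filters = [is_web, big_flow]

small = SimpleNamespace(protocol=6, dport=443, bytes=400)
large = SimpleNamespace(protocol=6, dport=443, bytes=52000)
dns = SimpleNamespace(protocol=17, dport=53, bytes=9000)

results = [run_filters(filters, rec) for rec in (small, large, dns)]
print(results)   # prints ['fail', 'pass', 'fail']
```

In an actual plug-in file the two functions would be registered with register_filter(is_web) followed by register_filter(big_flow).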
NOTE: For backwards compatibility, when the file named by --python-file does not call register_filter(),
rwfilter will search the Python file for functions named rwfilter() and finalize(). If it finds the rwfilter()
function, rwfilter will act as if the file contained:
register_filter(rwfilter, finalize=finalize)
The --python-file switch requires the user to create a file containing Python code. To allow the user to write
a small filtering check in Python, rwfilter supports the --python-expr switch. The value of the switch
should be a Python expression whose result determines whether a given record passes or fails, using the same
criterion as the filter_func() function described above. In the expression, the variable rec is bound to the
current silk.RWRec object. There is no support for the initialize_func() and finalize_func() functions.
The user may consider --python-expr=PYTHON_EXPRESSION as being implemented by
from silk import *
def temp_filter(rec):
    return (PYTHON_EXPRESSION)

register_filter(temp_filter)
The --python-file and --python-expr switches allow for much flexibility but at the cost of speed: converting
a SiLK Flow record into an RWRec is expensive relative to most operations in rwfilter. The user should
use rwfilter’s built-in partitioning switches to whittle down the input as much as possible, and only use the
Python code to do what is difficult or impossible to do otherwise.
Simple field registration functions
The silk.plugin module defines a function that can be used to define fields for use in rwcut, rwgroup,
rwsort, rwstats, and rwuniq. That function is powerful, but it is also complex. To make it easy to define
fields for the common cases, the silk.plugin module provides the functions described in this section that create a
key field or an aggregate value field. The advanced function is described later in this manual page (Advanced
field registration function).
Once you have created a key field or aggregate value field, you must include the field’s name in the argument
to the --fields or --values switch to tell the application to use the field.
Integer key field
The following function is used to create a key field whose value is an unsigned integer.
register_int_field(field_name, int_function, min, max, [width])
field_name
The name of the new field, a string. If you attempt to add a key field that already exists, you will get
an error message.
int_function
int = int_function(silk.RWRec). A function that accepts a silk.RWRec object as its sole argument,
and returns an unsigned integer which represents the value of this field for the given record.
min
A number representing the minimum integer value for the field. If int_function returns a value less
than min, an error is raised.
max
A number representing the maximum integer value for the field. If int_function returns a value greater
than max, an error is raised.
width
The column width to use when displaying the field. This parameter is optional; the default is the
number of digits necessary to display the integer max.
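The relationship between min, max, and the default width can be sketched without SiLK. In the following, default_width and check_range are hypothetical helpers that model the documented behavior, bytes_per_packet is an example int_function, and the record is a SimpleNamespace stand-in with the bytes and packets attributes of a silk.RWRec. Raising ValueError is an assumption; the text above only says that an error is raised.

```python
from types import SimpleNamespace


def default_width(max_value):
    """Number of digits needed to display max_value (the documented default)."""
    return len(str(max_value))


def bytes_per_packet(rec):
    """Example int_function: average bytes per packet, rounded down."""
    return rec.bytes // rec.packets


def check_range(value, min_value, max_value):
    """Model of the range check: values outside [min, max] are errors."""
    if value < min_value or value > max_value:
        raise ValueError("value %d is outside [%d, %d]"
                         % (value, min_value, max_value))
    return value


rec = SimpleNamespace(bytes=6000, packets=4)   # stand-in for silk.RWRec
value = check_range(bytes_per_packet(rec), 0, 65535)
print(value, default_width(65535))   # prints 1500 5

# In a plug-in file, the field itself would be created with a call such as:
#   register_int_field("bytes_per_packet", bytes_per_packet, 0, 65535)
```

Choosing max as tightly as the data allows keeps the default column narrow.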
IPv4 address key field
This function is used to create a key field whose value is an IPv4 address. (See also register_ip_field()).
register_ipv4_field(field_name, ipv4_function, [width])
field_name
The name of the new field, a string. If you attempt to add a key field that already exists, you will get
an error message.
ipv4_function
silk.IPv4Addr = ipv4_function(silk.RWRec). A function that accepts a silk.RWRec object as its
sole argument, and returns a silk.IPv4Addr object. This IPv4Addr object will be the IPv4 address
that represents the value of this field for the given record.
width
The column width to use when displaying the field. This parameter is optional, and it defaults to 15.
IP address key field
The next function is used to create a key field whose value is an IPv4 or IPv6 address.
register_ip_field(field_name, ip_function, [width])
field_name
The name of the new field, a string. If you attempt to add a key field that already exists, you will get
an error message.
ip_function
silk.IPAddr = ip_function(silk.RWRec). A function that accepts a silk.RWRec object as its sole
argument, and returns a silk.IPAddr object which represents the value of this field for the given
record.
width
The column width to use when displaying the field. This parameter is optional. The default width is
39.
This key field requires more memory internally than fields registered by the register_ipv4_field() function.
If SiLK is compiled without IPv6 support, register_ip_field() works exactly like register_ipv4_field(),
including the default width of 15.
Enumerated object key field
The following function is used to create a key field whose value is any Python object. The maximum number
of different objects that can be represented is 4,294,967,296, or 2^32.
register_enum_field(field_name, enum_function, width, [ordering])
field_name
The name of the new field, a string. If you attempt to add a key field that already exists, you will get
an error message.
enum_function
object = enum_function(silk.RWRec). A function that accepts a silk.RWRec object as its sole
argument, and returns a Python object which represents the value of this field for the given record.
For typical usage, the Python objects returned by the enum_function will be strings representing some
categorical value.
width
The column width to use when displaying this field. The parameter is required.
ordering
A list of objects used to determine ordering for rwsort and rwuniq. This parameter is optional. If
specified, it lists the objects in the order in which they should be sorted. If the enum_function returns
an object that is not in ordering, the object will be sorted after all the objects in ordering.
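The effect of the ordering list can be modeled with an ordinary Python sort key. This is a sketch of the documented rule only; ordering_key is a hypothetical helper, and how SiLK orders objects that are not in the list relative to one another is not stated above (here they simply keep their input order).

```python
def ordering_key(ordering):
    """Build a sort key: listed objects sort by their position in `ordering`,
    and any other object sorts after all of the listed ones."""
    rank = {obj: index for index, obj in enumerate(ordering)}

    def key(obj):
        return rank.get(obj, len(ordering))

    return key


values = ["high", "unknown", "low", "medium", "low"]
ordered = sorted(values, key=ordering_key(["low", "medium", "high"]))
print(ordered)   # prints ['low', 'low', 'medium', 'high', 'unknown']
```

Without an ordering list, categorical strings such as these would otherwise have no meaningful severity order.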
Integer sum aggregate value field
This function is used to create an aggregate value field that maintains a running unsigned integer sum.
register_int_sum_aggregator(agg_value_name, int_function, [max_sum], [width])
agg_value_name
The name of the new aggregate value field, a string. The agg_value_name must be unique among all
aggregate values, but an aggregate value field and key field can have the same name.
int_function
int = int_function(silk.RWRec). A function that accepts a silk.RWRec object as its sole argument,
and returns an unsigned integer which represents the value that should be added to the running sum
for the current bin.
max_sum
The maximum possible sum. This parameter is optional; if not specified, the default is 2^64-1
(18,446,744,073,709,551,615).
width
The column width to use when displaying the aggregate value. This parameter is optional. The default
is the number of digits necessary to display max_sum.
Integer maximum aggregate value field
The following function is used to create an aggregate value field that maintains the maximum unsigned
integer value.
register_int_max_aggregator(agg_value_name, int_function, [max_max], [width])
agg_value_name
The name of the new aggregate value field, a string. The agg_value_name must be unique among all
aggregate values, but an aggregate value field and key field can have the same name.
int_function
int = int_function(silk.RWRec). A function that accepts a silk.RWRec object as its sole argument,
and returns an integer which represents the value that should be considered for the current highest
value for the current bin.
max_max
The maximum possible value for the maximum. This parameter is optional; if not specified, the default
is 2^64-1 (18,446,744,073,709,551,615).
width
The column width to use when displaying the aggregate value. This parameter is optional. The default
is the number of digits necessary to display max_max.
Integer minimum aggregate value field
This function is used to create an aggregate value field that maintains the minimum unsigned integer value.
register_int_min_aggregator(agg_value_name, int_function, [max_min], [width])
agg_value_name
The name of the new aggregate value field, a string. The agg_value_name must be unique among all
aggregate values, but an aggregate value field and key field can have the same name.
int_function
int = int_function(silk.RWRec). A function that accepts a silk.RWRec object as its sole argument,
and returns an integer which represents the value that should be considered for the current lowest value
for the current bin.
max_min
The maximum possible value for the minimum. When this optional parameter is not specified, the
default is 2^64-1 (18,446,744,073,709,551,615).
width
The column width to use when displaying the aggregate value. This parameter is optional. The default
is the number of digits necessary to display max_min.
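What the three aggregators compute can be pictured with a small pure-Python model. The aggregate function below is hypothetical and does not use SiLK; records are SimpleNamespace stand-ins with the protocol and bytes attributes of a silk.RWRec. Starting the minimum at 2^64-1 and the maximum at 0 are assumptions of this sketch, suggested by (but not stated in) the parameter descriptions above.

```python
from collections import defaultdict
from types import SimpleNamespace

U64_MAX = 2**64 - 1                  # documented default for the max_* parameters
DEFAULT_WIDTH = len(str(U64_MAX))    # digits needed to display the default maximum


def aggregate(records, key_function, int_function):
    """Per-bin running sum, maximum, and minimum of int_function(rec)."""
    bins = defaultdict(lambda: {"sum": 0, "max": 0, "min": U64_MAX})
    for rec in records:
        value = int_function(rec)
        current = bins[key_function(rec)]
        current["sum"] += value
        current["max"] = max(current["max"], value)
        current["min"] = min(current["min"], value)
    return dict(bins)


records = [
    SimpleNamespace(protocol=6, bytes=1200),
    SimpleNamespace(protocol=6, bytes=300),
    SimpleNamespace(protocol=17, bytes=80),
]
result = aggregate(records, lambda r: r.protocol, lambda r: r.bytes)
print(result[6])       # prints {'sum': 1500, 'max': 1200, 'min': 300}
print(DEFAULT_WIDTH)   # prints 20
```

The default width of 20 characters follows from the default maximum; passing a smaller max_sum, max_max, or max_min yields a narrower column.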
Advanced field registration function
The previous section provided functions to register a key field or an aggregate value field when dealing with
common objects. When you need to use a complex object, or you want more control over how the object is
handled in PySiLK, you can use the register_field() function described in this section.
Many of the arguments to the register_field() function are callback functions that you must create and
that the application will invoke. (The simple registration functions above have already taken care of defining
these callback functions.)
Often the callback functions for handling fields will either take (as a parameter) or return a representation
of a numeric value that can be processed from C. The most efficient way to handle these representations is
as a string containing binary characters, including the null byte. We will use the term "byte sequence" for
these representations; other possible terms include "array of bytes", "byte strings", or "binary values". For
hints on creating byte sequences from Python, see the Byte sequences section below.
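One common way to build a byte sequence in Python is the standard struct module. The sketch below is an illustration rather than the manual's own recipe: the helper names are hypothetical, and the 4-byte unsigned layout is an assumption. It also shows why big-endian (network) byte order matters when the application compares the raw bytes.

```python
import struct


def to_bytes_u32(value):
    """Pack an unsigned 32-bit integer as 4 bytes in network byte order."""
    return struct.pack("!I", value)


def from_bytes_u32(data):
    """Inverse of to_bytes_u32()."""
    return struct.unpack("!I", data)[0]


packed = to_bytes_u32(1)
print(packed)   # prints b'\x00\x00\x00\x01'

values = [70000, 3, 256, 1]
# Sorting by the raw bytes matches numeric order only for big-endian packing.
big = sorted(values, key=lambda v: struct.pack("!I", v))
little = sorted(values, key=lambda v: struct.pack("<I", v))
print(big)      # prints [1, 3, 256, 70000]
print(little)   # prints [256, 1, 3, 70000]
```

Note that the packed value contains null bytes, which is why these values are treated as byte sequences rather than as text.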
To define a new field or aggregate value, the user calls:
register_field(field_name, [add_rec_to_bin=add_rec_to_bin_func,] [bin_compare=bin_compare_func,]
[bin_bytes=bin_bytes_value,] [bin_merge=bin_merge_func,] [bin_to_text=bin_to_text_func,]
[column_width=column_width_value,] [description=description_string,] [initial_value=initial_value,]
[initialize=initialize_func,] [rec_to_bin=rec_to_bin_func,] [rec_to_text=rec_to_text_func])
Although the keyword arguments to register_field() are all optional from Python’s perspective, certain
keyword arguments must be present before an application will define the key or aggregate value. The
following table summarizes the keyword arguments used by each application. An F means the argument is
required for a key field, an A means the argument is required for an aggregate value field, f and a mean the
application will use the argument for a key field or an aggregate value if the argument is present, and a dot
means the application completely ignores the argument.
                rwcut  rwgroup  rwsort  rwstats  rwuniq
add_rec_to_bin  .      .        .       A        A
bin_compare     .      .        .       A        .
bin_bytes       .      F        F       F,A      F,A
bin_merge       .      .        .       A        A
bin_to_text     .      .        .       F,A      F,A
column_width    F      .        .       F,A      F,A
description     f      f        f       f,a      f,a
initial_value   .      .        .       a        a
initialize      f      f        f       f,a      f,a
rec_to_bin      .      F        F       F        F
rec_to_text     F      .        .       .        .
The following sections describe how to use register field() in each application.
rwcut usage
The purpose of rwcut(1) is to print attributes of (or attributes derived from) every SiLK record it reads as
input. A plug-in used by rwcut must produce a printable (textual) attribute from a SiLK record. To define
a new attribute, the register field() method should be called as shown:
register field(field name,
column width=column width value,
[description=description string,] [initialize=initialize func])
rec to text=rec to text func,
field name
Names the field being defined, a string. If you attempt to add a field that already exists, you will get
an an error message. To display the field, include field name in the argument to the --fields switch.
column width value
Specifies the length of the longest printable representation. rwcut will use it as the width for the
field name column when columnar output is selected.
rec to text func
string = rec to text func(silk.RWRec). Names a callback function that takes a silk.RWRec object
as its sole argument and produces a printable representation of the field being defined. The length
of the returned text should not be greater than column width value. If the value returned from this
function is not a string, the returned value is converted to a string by the Python str() function.
description string
Provides a string giving a brief description of the field, suitable for printing in --help-fields output.
This argument is optional.
initialize func
initialize func(). Names a callback function that will be invoked after the application has completed
its argument processing, and just before it opens the first input file. This function is only called when
--fields includes field name. The function takes no arguments and its return value is ignored. This
argument is optional.
If the rec to text argument is not present, the register field() function will do nothing when called from
rwcut. If the column width argument is missing, rwcut will complain that the textual width of the
plug-in field is 0.
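As a minimal sketch of the call described above, the following plug-in file defines a field for rwcut. The field name bytes-per-packet, the helper function, and the width of 10 are assumptions chosen for illustration. The try/except guard is also an assumption: it lets the file run outside rwcut for testing, since register_field() is only supplied when a SiLK tool loads the file.

```python
def bytes_per_packet(rec):
    """Return the average packet size of the flow as printable text."""
    if rec.packets == 0:
        return "0"
    return "%d" % (rec.bytes // rec.packets)

try:
    # column_width is the length of the longest text the callback returns
    register_field("bytes-per-packet",
                   column_width=10,
                   rec_to_text=bytes_per_packet,
                   description="Average number of bytes per packet")
except NameError:
    # register_field() is only defined when a SiLK tool loads this file
    pass
```

If the file were saved under a name such as bpp.py, the field would be requested by adding bytes-per-packet to the --fields switch of rwcut.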
rwgroup and rwsort usage
The rwsort(1) tool sorts SiLK records by their attributes or attributes derived from them. rwgroup(1)
reads sorted SiLK records and writes a common value into the next hop IP field of all records that have
common attributes. The output from both of these tools is a stream of SiLK records (the output typically
includes every record that was read as input). A plug-in used by these tools must return a value that the
application can use internally to compare records. To define a new field that may be included in the --id-fields
switch to rwgroup or the --fields switch to rwsort, the register field() method should be invoked
as follows:
register_field(field_name,
               bin_bytes=bin_bytes_value,
               rec_to_bin=rec_to_bin_func,
               [description=description_string,] [initialize=initialize_func])
field name
Names the field being defined, a string. If you attempt to add a field that already exists, you will get
an error message. To have rwgroup or rwsort use this field, include field name in the argument
to --id-fields or --fields.
bin bytes value
Specifies a positive integer giving the length, in bytes, of the byte sequence that the rec to bin func()
function produces; the byte sequence must be exactly this length.
rec to bin func
byte-sequence = rec to bin func(silk.RWRec). Names a callback function that takes a silk.RWRec
object and returns a byte sequence that represents the field being defined. The returned value should
be exactly bin bytes value bytes long. For proper grouping or sorting, the byte sequence should be
returned in network byte order (i.e., big endian).
description string
Provides a string giving a brief description of the field, suitable for printing in --help-fields output.
This argument is optional.
initialize func
initialize func(). Names a callback function that will be invoked after the application has completed
its argument processing, and just before it opens the first input file. This function is only called when
field name is included in the list of fields. The function takes no arguments and its return value is
ignored. This argument is optional.
If the rec to bin argument is not present, the register field() function will do nothing when called from
rwgroup or rwsort. If the bin bytes argument is missing, rwgroup or rwsort will complain that the
binary width of the plug-in field is 0.
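The following is a minimal sketch of a sort or grouping key built with rec_to_bin. The field name and helper are hypothetical, and the try/except guard is an assumption that lets the file run outside rwsort or rwgroup. The value is packed in network byte order so that byte-wise comparison matches numeric order.

```python
import struct

# One unsigned 32-bit integer in network byte order
u32 = struct.Struct("!I")

def bytes_per_packet_bin(rec):
    """Return the average packet size as a fixed-length byte sequence."""
    if rec.packets == 0:
        return u32.pack(0)
    return u32.pack(rec.bytes // rec.packets)

try:
    # bin_bytes must equal the length of every sequence the callback returns
    register_field("bytes-per-packet",
                   bin_bytes=u32.size,
                   rec_to_bin=bytes_per_packet_bin)
except NameError:
    # register_field() is only defined when a SiLK tool loads this file
    pass
```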
rwstats and rwuniq usage
rwstats(1) and rwuniq(1) group SiLK records into bins based on key fields. Once a record is matched to
a bin, the record is used to update the aggregate values (e.g., the sum of bytes) that are being computed,
and the record is discarded. Once all records have been processed, the key fields and the aggregate values
are printed.
Key Field
A plug-in used by rwstats or rwuniq for creating a new key field must return a value that the application
can use internally to compare records, and there must be a function that converts that value to a printable
representation. The following invocation of register field() will produce a key field that can be used in the
--fields switch of rwstats or rwuniq:
register_field(field_name,
               bin_bytes=bin_bytes_value,
               bin_to_text=bin_to_text_func,
               column_width=column_width_value,
               rec_to_bin=rec_to_bin_func,
               [description=description_string,]
               [initialize=initialize_func])
The arguments are:
field name
Contains the name of the field being defined, a string. If you attempt to add a field that already
exists, you will get an error message. The field will only be active when field name is specified as
an argument to --fields.
bin bytes value
Contains a positive integer giving the length, in bytes, of the byte sequence that the rec to bin func()
function produces and that the bin to text func() function accepts. The byte sequences must be
exactly this length.
bin to text func
string = bin to text func(byte-sequence). Names a callback function that takes a byte sequence,
of length bin bytes value, as produced by the rec to bin func() function and returns a printable
representation of the byte sequence. The length of the text should be no longer than the value specified
by column width. If the value returned from this function is not a string, the returned value is
converted to a string by the Python str() function.
column width value
Contains a positive integer specifying the length of the longest textual field that the
bin to text func() callback function returns. This length will be used as the column width when columnar
output is requested.
rec to bin func
byte-sequence = rec to bin func(silk.RWRec). Names a callback function that takes a silk.RWRec
object and returns a byte sequence that represents the field being defined. The returned value should
be exactly bin bytes value bytes long. For proper sorting, the byte sequence should be returned in
network byte order (i.e., big endian).
description string
Provides a string giving a brief description of the field, suitable for printing in --help-fields output.
This argument is optional.
initialize func
initialize func(). Names a callback function that is called after the command line arguments have
been processed, and before opening the first file. This function is only called when --fields includes
field name. The function takes no arguments and its return value is ignored. This argument is optional.
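A key field for rwstats or rwuniq needs both directions of the conversion. The sketch below, under the assumption of a hypothetical field named proto-dport, packs the protocol and destination port into three bytes and prints them as "protocol/port". The try/except guard is an assumption that allows the file to be run outside a SiLK tool.

```python
import struct

# One unsigned byte (protocol) and one unsigned 16-bit integer (port),
# both in network byte order; "!" also disables padding, so size is 3
proto_port = struct.Struct("!BH")

def proto_dport_to_bin(rec):
    """Encode the key for a record as a fixed-length byte sequence."""
    return proto_port.pack(rec.protocol, rec.dport)

def proto_dport_to_text(seq):
    """Convert the byte sequence for a bin back into printable text."""
    (proto, port) = proto_port.unpack(seq)
    return "%d/%d" % (proto, port)

try:
    register_field("proto-dport",
                   bin_bytes=proto_port.size,
                   bin_to_text=proto_dport_to_text,
                   column_width=len("255/65535"),
                   rec_to_bin=proto_dport_to_bin)
except NameError:
    # register_field() is only defined when a SiLK tool loads this file
    pass
```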
Aggregate Value
A plug-in used by rwstats or rwuniq for creating a new aggregate value must be able to use a SiLK record
to update an aggregate value, take two aggregate values and merge them to a new value, and convert that
aggregate value to a printable representation. To use an aggregate value for ordering the bins in rwstats, the
plug-in must also define a function to compare two aggregate values. The aggregate values are represented
as byte sequences.
To define a new aggregate value in rwstats, the user calls:
register_field(agg_value_name,
               add_rec_to_bin=add_rec_to_bin_func,
               bin_bytes=bin_bytes_value,
               bin_to_text=bin_to_text_func,
               column_width=column_width_value,
               bin_merge=bin_merge_func,
               [bin_compare=bin_compare_func,]
               [description=description_string,]
               [initial_value=initial_value,]
               [initialize=initialize_func])
The call to define a new aggregate value in rwuniq is nearly identical:
register_field(agg_value_name,
               add_rec_to_bin=add_rec_to_bin_func,
               bin_bytes=bin_bytes_value,
               bin_to_text=bin_to_text_func,
               column_width=column_width_value,
               bin_merge=bin_merge_func,
               [description=description_string,]
               [initial_value=initial_value,]
               [initialize=initialize_func])
The arguments are:
agg value name
Contains the name of the aggregate value field being defined, a string. The name of the value must be
unique among all aggregate values, but an aggregate value field and key field can have the same name.
The value will only be active when agg value name is specified as an argument to --values.
add rec to bin func
byte-sequence = add rec to bin func(silk.RWRec, byte-sequence). Names a callback function whose
two arguments are a silk.RWRec object and an aggregate value. The function updates the aggregate value with data from the record and returns a new aggregate value. Both aggregate values are
represented as byte sequences of exactly bin bytes value bytes.
bin bytes value
Contains a positive integer representing the length, in bytes, of the binary aggregate value used by the
various callback functions. Every byte sequence for this field must be exactly this length, and it also
governs the length of the byte sequence specified by initial value.
bin merge func
byte-sequence = bin merge func(byte-sequence, byte-sequence). Names a callback function which
returns the result of merging two binary aggregate values into a new binary aggregate value. This
merge function will often be addition; however, if the aggregate value is a bitmap, the result of the merge
function could be the union of the bitmaps. The function should take two byte sequence arguments
and return a byte sequence, where all byte sequences are exactly bin bytes value bytes in length. If
merging the aggregate values is not possible, the function should throw an exception. This function
is used when the data structure used by rwstats or rwuniq runs out of memory. When that happens,
the application writes its current state to a temporary file, empties its buffers, and continues reading
records. Once all records have been processed, the application needs to merge the temporary files
to produce the final output. The bin merge func() function is used when merging these binary
aggregate values.
bin to text func
string = bin to text func(byte-sequence). Names a callback function that takes a byte sequence
representing an aggregate value as an argument and returns a printable representation of that aggregate
value. The byte sequence input to bin to text func() will be exactly bin bytes value bytes long. The
length of the text should be no longer than the value specified by column width. If the value
returned from this function is not a string, the returned value is converted to a string by the Python
str() function.
column width value
Contains a positive integer specifying the length of the longest textual field that the
bin to text func() callback function returns. This length will be used as the column width when columnar
output is requested.
bin compare func
int = bin compare func(byte-sequence, byte-sequence). Names a callback function that is called
with two aggregate values, each represented as a byte sequence of exactly bin bytes value bytes. The
function returns (1) an integer less than 0 if the first argument is less than the second, (2) an integer
greater than 0 if the first is greater than the second, or (3) 0 if the two values are equal. This function
is used by rwstats to sort the bins into top-N order.
description string
Provides a string giving a brief description of the aggregate value, suitable for printing in --help-fields
output. This argument is optional.
initial value
Specifies a byte sequence representing the initial state of the binary aggregate value. This byte sequence
must be of length bin bytes value bytes. If this argument is not specified, the aggregate value is set to
a byte sequence containing bin bytes value null bytes.
initialize func
initialize func(). Names a callback function that is called after the command line arguments have
been processed, and before opening the first file. This function is only called when --values includes
agg value name. The function takes no arguments and its return value is ignored. This argument is
optional.
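The callbacks described above can be sketched for a simple aggregate value. The example below assumes a hypothetical value named max-packets that tracks the largest packet count seen in each bin; all function names are illustrative, and the try/except guard is an assumption for running the file outside a SiLK tool. The bin_compare argument is only described for rwstats.

```python
import struct

# One unsigned 64-bit integer in network byte order
u64 = struct.Struct("!Q")

def max_packets_add(rec, current):
    """Update the aggregate value with one record."""
    (best,) = u64.unpack(current)
    return u64.pack(max(best, rec.packets))

def max_packets_merge(left, right):
    """Merge two aggregate values, e.g. when temporary files are combined."""
    (a,) = u64.unpack(left)
    (b,) = u64.unpack(right)
    return u64.pack(max(a, b))

def max_packets_compare(left, right):
    """Return a negative, zero, or positive integer for top-N ordering."""
    (a,) = u64.unpack(left)
    (b,) = u64.unpack(right)
    return (a > b) - (a < b)

def max_packets_to_text(seq):
    """Return a printable representation of the aggregate value."""
    (value,) = u64.unpack(seq)
    return "%d" % value

try:
    register_field("max-packets",
                   add_rec_to_bin=max_packets_add,
                   bin_bytes=u64.size,
                   bin_to_text=max_packets_to_text,
                   column_width=20,
                   bin_merge=max_packets_merge,
                   bin_compare=max_packets_compare,
                   initial_value=u64.pack(0))
except NameError:
    # register_field() is only defined when a SiLK tool loads this file
    pass
```

The value would be selected by naming max-packets in the --values switch.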
Byte sequences
The rwgroup, rwsort, rwstats, and rwuniq programs make extensive use of "byte sequences" (a.k.a.,
"array of bytes", "byte strings", or "binary values") in their plug-in functions. The byte sequences are used
in both key fields and aggregate values.
When used as key fields, the values can represent uniqueness or indicate sort order. Two records with the
same byte sequence for a field will be considered identical with respect to that field. When sorting, the byte
sequences are compared in network byte order. That is, the most significant byte is compared first, followed
by the next-most-significant byte, etc. This equates to string comparison starting with the left-hand side of
the string.
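The effect of byte order on sorting can be checked with a short sketch. The sample values are arbitrary; the code only shows that sorting big-endian encodings byte by byte preserves numeric order, while sorting little-endian encodings does not.

```python
import struct

values = [1, 255, 256, 70000]

# Network byte order (big endian): most significant byte first
big = [struct.pack("!I", v) for v in values]

# Little endian: least significant byte first
little = [struct.pack("<I", v) for v in values]

# Byte sequences compare from the left-hand side, like strings
big_keeps_order = (sorted(big) == big)
little_keeps_order = (sorted(little) == little)
```

Here big_keeps_order is true and little_keeps_order is false, because the little-endian encoding of 256 begins with a zero byte and therefore sorts ahead of the encoding of 1.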
When used as an aggregate field, the byte sequences are expected to behave more like numbers, with the
ability to take a binary value and add a record's data to it, or to merge (e.g., add) two byte sequences outside the
context of a SiLK record.
Every byte sequence has an associated length, which is passed into the register field() function in the
bin bytes argument. The length determines how many values the byte sequence can represent. A byte
sequence with a length of 1 can represent up to 256 unique values (from 0 to 255 inclusive). A byte sequence
with a length of 2 can represent up to 65536 unique values (0 to 65535). To generalize, a byte sequence with
a length of n can represent up to 2^(8n) unique values (0 to 2^(8n)-1).
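The relationship between length and range can be expressed directly. In this sketch the helper name max_value is an assumption made for illustration; the format codes are those of the Python struct module.

```python
import struct

def max_value(bin_bytes):
    """Largest unsigned integer that fits in bin_bytes bytes."""
    return (1 << (8 * bin_bytes)) - 1

# Sizes of the unsigned formats in network byte order: 1, 2, 4, 8 bytes
sizes = [struct.calcsize(fmt) for fmt in ("!B", "!H", "!I", "!Q")]

# The largest value for each length packs into exactly that many bytes
largest_two_bytes = struct.pack("!H", max_value(2))
```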
How byte sequences are represented in Python depends on the version of Python. Python represents a
sequence of characters using either the bytes type (introduced in 2.6) or the unicode type. The bytes type
can encode byte sequences while the unicode type cannot. In Python 2, the str (string) type was an alias
for bytes, so that any Python 2 string is in effect a byte sequence. In Python 3, str is an alias for unicode,
thus Python 3 strings are unicode objects and cannot represent byte sequences.
Python does not make conversions between integers and byte sequences particularly natural. As a result,
here are some pointers on how to do these conversions:
Use the bytes() and ord() methods
If you are converting a single integer value that is less than 256, the easiest way to convert it to a byte sequence
is to use the bytes() function; to convert it back, use the ord() function.
seq = bytes([num])
num = ord(seq)
The bytes() function takes a list of integers between 0 and 255 inclusive, and returns a bytes sequence of
the length of that list. To convert a single byte, use a list of a single element. The ord() function takes a
byte sequence of a single byte and returns an integer between 0 and 255.
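A short round trip, assuming Python 3, illustrates the two functions. The value 200 is arbitrary.

```python
num = 200

# Python 3: bytes() accepts a list of integers in the range 0 to 255
seq = bytes([num])

# ord() converts a byte sequence of length 1 back into an integer
restored = ord(seq)

# Indexing a bytes object in Python 3 also yields an integer
first = seq[0]
```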
Note: The bytes([num]) form requires Python 3. In Python 2, where bytes is an alias for str, bytes([num])
returns the printed form of the list rather than a single byte, so use the chr() function instead; it takes
a single number as its argument. chr() is not a substitute in Python 3.x, where it returns a unicode
string instead of a byte sequence.
Use the struct module
When the value you are converting to a byte sequence is 256 or greater, you have to go with another option.
One of the simpler options is to use Python’s built-in struct module. With this module, you can encode a
number or a set of numbers into a byte sequence and convert the result back using a struct.Struct object.
Encoding the numbers to a byte sequence uses the object’s pack() method. To convert that byte sequence
back to the number or set of numbers, use the object’s unpack() method. The length of the resulting byte
sequences can be found in the size attribute of the struct.Struct() object. A formatting string is used to
indicate how the numbers are encoded into binary. For example:
import struct
# Set up the format for two 64-bit numbers
two64 = struct.Struct("!QQ")
# Encode two 64-bit numbers as a byte sequence
seq = two64.pack(num1, num2)
# Unpack a byte sequence back into two 64-bit numbers
(num1, num2) = two64.unpack(seq)
# Length of the encoded byte sequence
bin_bytes = two64.size
In the above, Q represents a single unsigned 64-bit number (an unsigned long long or quad). The ! at the
beginning of the string forces network byte order. (For sort comparison purposes, always pack in network
byte order.)
Here is another example, which encodes a signed 16-bit integer and a floating point number:
import struct
# Set up the format for a 16-bit signed integer and a float
obj = struct.Struct("!hf")
#Encode a 16-bit signed integer and a float as a byte sequence
seq = obj.pack(intval, floatval)
#Unpack a byte sequence back into a 16-bit signed integer and a float
(intval, floatval) = obj.unpack(seq)
#Length of the encoded byte sequence
bin_bytes = obj.size
Note that unpack() returns a sequence. When unpacking a single value, assign the result of unpack to
(variable name,), as shown:
import struct
u32 = struct.Struct("!I")
#Encode an unsigned 32-bit integer as a byte sequence
seq = u32.pack(num1)
#Unpack a byte sequence back into a unsigned 32-bit integer
(num1,) = u32.unpack(seq)
#Length of the encoded byte sequence
bin_bytes = u32.size
The full list of codes can be found in the Python library documentation for the struct module,
http://docs.python.org/library/struct.html.
Note: Python versions prior to 2.5 do not include support for the struct.Struct object. For older versions
of Python, you have to use struct’s functional interface. For example:
import struct
#Encode a 16-bit signed integer and a float as a byte sequence
seq = struct.pack("!hf", intval, floatval)
#Unpack a byte sequence back into a 16-bit signed integer and a float
(intval, floatval) = struct.unpack("!hf", seq)
#Length of the encoded byte sequence
bin_bytes = struct.calcsize("!hf")
This method works in Python 2.5 and above as well, but is inherently slower, as it requires re-evaluation of
the format string for each packing and unpacking operation. Only use this if there is a need to inter-operate
with older versions of Python.
Use the array module
The Python array module provides another way to create byte sequences. Beware that the array module
does not provide an automatic way to encode the values in network byte order.
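As a sketch of one way to handle the byte order manually, the function below swaps the bytes of each array item on little-endian hosts. The function name is hypothetical, tobytes() assumes Python 3.2 or later, and the "H" type code is assumed to be 2 bytes wide, which is its minimum size and the usual case.

```python
import array
import sys

def pack_u16_list(values):
    """Encode a list of unsigned 16-bit integers in network byte order."""
    arr = array.array("H", values)
    # array stores items in the machine's native byte order, so swap
    # the bytes of each item when the host is little endian
    if sys.byteorder == "little":
        arr.byteswap()
    return arr.tobytes()
```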
OPTIONS
The following options are available when the SiLK Python plug-in is used from rwfilter.
--python-file=FILENAME
Load the Python file FILENAME. The Python code may call register filter() multiple times to define
new partitioning functions that take a silk.RWRec object as an argument. The return value
of the function determines whether the record passes the filter. For backwards compatibility, if register filter() is not called and a function named rwfilter() exists, that function is automatically
registered as the filtering function. Multiple --python-file switches may be used to load multiple
plug-ins.
--python-expr=PYTHON EXPRESSION
Pass the SiLK Flow record if the result of processing the record with the specified
PYTHON EXPRESSION is true. The expression is evaluated in the following context:
• The record is represented by the variable named rec, which is a silk.RWRec object.
• There is an implicit from silk import * in effect.
The following options are available when the SiLK Python plug-in is used from rwcut, rwgroup, rwsort,
rwstats, or rwuniq:
--python-file=FILENAME
Load the Python file FILENAME. The Python code may call register field() multiple times to define
new fields for use by the application. When used with rwstats or rwuniq, the Python code may call
register field() multiple times to create new aggregate fields. Multiple --python-file switches may
be used to load multiple plug-ins.
EXAMPLES
In the following examples, the dollar sign ($) represents the shell prompt. The text after the dollar sign
represents the command line. Lines have been wrapped for improved readability, and the back slash (\) is
used to indicate a wrapped line.
rwfilter --python-expr
Suppose you want to find traffic destined to a particular host, 10.0.0.23, that is either ICMP or coming from
1434/udp. If you attempt to use:
$ rwfilter --daddr=10.0.0.23 --proto=1,17 --sport=1434 \
      --pass=outfile.rw flowrec.rw
the --sport option will not match any of the ICMP traffic, and your result will not contain ICMP records.
To avoid having to use two invocations of rwfilter, you can use the SiLK Python plugin to do the check in
a single pass:
$ rwfilter --daddr=10.0.0.23 --proto=1,17 \
      --python-expr 'rec.protocol==1 or rec.sport==1434' \
      --pass=outfile.rw flowrec.rw
Since the Python code is slower than the C code used internally by rwfilter, we want to limit the number
of records processed in Python as much as possible. We use the rwfilter switches to do the address check
and protocol check, and in Python we only need to check whether the record is ICMP or if the source port
is 1434 (if the record is not ICMP we know it is UDP because of the --proto switch).
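The same check can also be written as a plug-in file for --python-file. This is a minimal sketch: the function name is hypothetical, and the try/except guard is an assumption that lets the file run outside rwfilter, where register_filter() is not defined.

```python
def icmp_or_sport_1434(rec):
    """Pass ICMP records and records whose source port is 1434."""
    return rec.protocol == 1 or rec.sport == 1434

try:
    register_filter(icmp_or_sport_1434)
except NameError:
    # register_filter() is only defined when rwfilter loads this file
    pass
```

The address and protocol checks would still be left to the --daddr and --proto switches so that as few records as possible reach the Python code.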
rwfilter --python-file
To see all records whose protocol is different from the preceding record, use the following Python code. The
code also prints a message to the standard output on completion.
import sys

def filter(rec):
    global lastproto
    if rec.protocol != lastproto:
        lastproto = rec.protocol
        return True
    return False

def initialize():
    global lastproto
    lastproto = None

def finalize():
    sys.stdout.write("Finished processing records.\n")

register_filter(filter, initialize = initialize, finalize = finalize)
The preceding file, if called lastproto.py, can be used like this:
$ rwfilter --python-file lastproto.py --pass=outfile.rw flowrec.rw
Note: Be careful when using a Python plug-in to write to the standard output, since the Python output
could get intermingled with the output from --pass=stdout and corrupt the SiLK output file. In general,
printing to the standard error is safer.
Command line switch
The following code registers the command line switch count-protocols. This switch is similar to the
standard --protocol switch on rwfilter, in that it passes records whose protocol matches a value specified
in a list. In addition, when rwfilter exits, the plug-in prints a count of the number of records that matched
each specified protocol.
import sys
from silk.plugin import *

pro_count = {}

def proto_count(rec):
    global pro_count
    if rec.protocol in pro_count:
        pro_count[rec.protocol] += 1
        return True
    return False

def print_counts():
    # items() works in both Python 2 and Python 3
    for p,c in pro_count.items():
        sys.stderr.write("%3d|%10d|\n" % (p, c))

def parse_protocols(protocols):
    global pro_count
    for p in protocols.split(","):
        pro_count[int(p)] = 0
    register_filter(proto_count, finalize = print_counts)

register_switch("count-protocols", handler=parse_protocols,
                help="Like --proto, but prints count of flow records")
When this code is saved to the file count-proto.py, it can be used with rwfilter as shown to get a count of
TCP and UDP flow records:
$ rwfilter --start-date=2008/08/08 --type=out \
      --python-file=count-proto.py --count-proto=6,17 \
      --print-statistics=/dev/null
rwfilter does not know that the plug-in will be generating output, and rwfilter will complain unless an
output switch is given, such as --pass or --print-statistics. Since our plug-in is printing the data we want,
we send the output to /dev/null.
Create integer key field with simple API
This example creates a field that contains the sum of the source and destination port. While this value may
not be interesting to display in rwcut, it provides a way to sort fields so traffic between two low ports will
usually be sorted before traffic between a low port and a high port.
def port_sum(rec):
    return rec.sport + rec.dport

register_int_field("port-sum", port_sum)
If the above code is saved in a file named portsum.py, it can be used to sort traffic prior to printing it
(low-port to low-port will appear first):
$ rwfilter --start-date=2008/08/08 --type=out,outweb \
      --proto=6,17 --pass=stdout \
  | rwsort --python-file=portsum.py --fields=port-sum \
  | rwcut
To see high-port to high-port traffic first, reverse the sort:
$ rwfilter --start-date=2008/08/08 --type=out,outweb \
      --proto=6,17 --pass=stdout \
  | rwsort --python-file=portsum.py --fields=port-sum \
      --reverse \
  | rwcut
Create IP key field with simple API
SiLK stores uni-directional flows. For network conversations that cross the network border, the source and
destination hosts are swapped depending on the direction of the flow. For analysis, you often want to know
the internal and external hosts.
The following Python plug-in file defines two new fields: internal-ip will display the destination IP for an
incoming flow, and the source IP for an outgoing flow, and external-ip field shows the reverse.
import silk

# for convenience, create lists of the types
in_types = ['in', 'inweb', 'innull', 'inicmp']
out_types = ['out', 'outweb', 'outnull', 'outicmp']

def internal(rec):
    "Returns the IP Address of the internal side of the connection"
    if rec.typename in out_types:
        return rec.sip
    else:
        return rec.dip

def external(rec):
    "Returns the IP Address of the external side of the connection"
    if rec.typename in in_types:
        return rec.sip
    else:
        return rec.dip

register_ip_field("internal-ip", internal)
register_ip_field("external-ip", external)
If the above code is saved in a file named direction.py, it can be used to show the internal and external IP
addresses and flow direction for all traffic on 1434/udp from Aug 8, 2008.
$ rwfilter --start-date=2008/08/08 --type=all \
      --proto=17 --aport=1434 --pass=stdout \
  | rwcut --python-file direction.py \
      --fields internal-ip,external-ip,3-12
Create enumerated key field with simple API
This example expands the previous example. Suppose instead of printing the internal and external IP address,
you wanted to group by the label associated with the internal and external addresses in a prefix map file.
The pmapfilter(3) manual page specifies how to print labels for source and destination IP addresses, but
it does not support internal and external IPs.
Here we take the previous example, add a command line switch to specify the path to a prefix map file, and
have the internal and external functions return the label.
import silk

# for convenience, create lists of the types
in_types = ['in', 'inweb', 'innull', 'inicmp']
out_types = ['out', 'outweb', 'outnull', 'outicmp']

# handler for the --int-ext-pmap command line switch
def set_pmap(arg):
    global pmap
    pmap = silk.PrefixMap(arg)
    labels = pmap.values()
    width = max(len(x) for x in labels)
    register_enum_field("internal-label", internal, width, labels)
    register_enum_field("external-label", external, width, labels)

def internal(rec):
    "Returns the label for the internal side of the connection"
    global pmap
    if rec.typename in out_types:
        return pmap[rec.sip]
    else:
        return pmap[rec.dip]

def external(rec):
    "Returns the label for the external side of the connection"
    global pmap
    if rec.typename in in_types:
        return pmap[rec.sip]
    else:
        return pmap[rec.dip]

register_switch("int-ext-pmap", handler=set_pmap,
                help="Prefix map file for internal-label, external-label")
Assuming the above is saved in the file int-ext-pmap.py, the following will group the flows by the internal
and external labels contained in the file ip-map.pmap.
$ rwfilter --start-date=2008/08/08 --type=all \
      --proto=17 --aport=1434 --pass=stdout \
  | rwuniq --python-file int-ext-pmap.py \
      --int-ext-pmap ip-map.pmap \
      --fields internal-label,external-label
Create minimum/maximum integer value field with simple API
The following example will create new aggregate fields to print the minimum and maximum byte values:
register_int_min_aggregator("min-bytes", lambda rec: rec.bytes,
(1 << 32) - 1)
register_int_max_aggregator("max-bytes", lambda rec: rec.bytes,
(1 << 32) - 1)
The lambda expression allows one to create an anonymous function. In this code, we need to return the
number of bytes for the given record, and we can easily do that with the anonymous function. Since the
SiLK bytes field is 32 bits, the maximum 32-bit number is passed to the registration functions.
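The plain-Python sketch below illustrates the results these two aggregate fields are meant to report for the records in one bin. It is not how SiLK implements them; the function names are hypothetical, and seeding the minimum with the largest 32-bit value is an assumption made for the illustration.

```python
MAX_U32 = (1 << 32) - 1

def running_min(byte_counts, start=MAX_U32):
    """Smallest byte count seen; starts at the largest possible value."""
    current = start
    for value in byte_counts:
        current = min(current, value)
    return current

def running_max(byte_counts, start=0):
    """Largest byte count seen; starts at zero."""
    current = start
    for value in byte_counts:
        current = max(current, value)
    return current
```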
Assuming the code is stored in a file bytes.py, it can be used with rwuniq to see the minimum and maximum
byte counts for each source IP address:
$ rwuniq --python-file=bytes.py --fields=sip \
      --values=records,bytes,min-bytes,max-bytes
Create IP key for rwcut with advanced API
This example is similar to the simple IP example above, but it uses the advanced API. It also creates another
field to indicate the direction of the flow, and it does not print the IPs when the traffic does not cross the
border. Note that this code has to determine the column width itself.
import silk, os

# for convenience, create lists of the types
in_types = ['in', 'inweb', 'innull', 'inicmp']
out_types = ['out', 'outweb', 'outnull', 'outicmp']
internal_only = ['int2int']
external_only = ['ext2ext']

# determine the width of the IP field depending on whether SiLK
# was compiled with IPv6 support, and allow the IP_WIDTH environment
# variable to override that width.
ip_len = 15
if silk.ipv6_enabled():
    ip_len = 39
ip_len = int(os.getenv("IP_WIDTH", ip_len))

def cut_internal(rec):
    "Returns the IP Address of the internal side of the connection"
    if rec.typename in in_types:
        return rec.dip
    if rec.typename in out_types:
        return rec.sip
    if rec.typename in internal_only:
        return "both"
    if rec.typename in external_only:
        return "neither"
    return "unknown"

def cut_external(rec):
    "Returns the IP Address of the external side of the connection"
    if rec.typename in in_types:
        return rec.sip
    if rec.typename in out_types:
        return rec.dip
    if rec.typename in internal_only:
        return "neither"
    if rec.typename in external_only:
        return "both"
    return "unknown"

def internal_external_direction(rec):
    """Generates a string pointing from the sip to the dip, assuming
    internal is on the left, and external is on the right."""
    if rec.typename in in_types:
        return "<---"
    if rec.typename in out_types:
        return "--->"
    if rec.typename in internal_only:
        return "-><-"
    if rec.typename in external_only:
        return "<-->"
    return "????"

register_field("internal-ip", column_width = ip_len,
               rec_to_text = cut_internal)
register_field("external-ip", column_width = ip_len,
               rec_to_text = cut_external)
register_field("int_to_ext", column_width = 4,
               rec_to_text = internal_external_direction)
The cut_internal() and cut_external() functions may return an IPAddr object instead of a string. For
those cases, the Python str() function is invoked automatically to convert the IPAddr to a string.
If the above code is saved in a file named direction.py, it can be used to show the internal and external IP
addresses and flow direction for all traffic on 1434/udp from Aug 8, 2008.
$ rwfilter --start-date=2008/08/08 --type=all \
      --proto=17 --aport=1434 --pass=stdout \
  | rwcut --python-file direction.py \
      --fields internal-ip,int_to_ext,external-ip,3-12
Create integer key field for rwsort with the advanced API
The following example Python plug-in creates one new field, lowest_port, for use in rwsort. Using this
field will sort records based on the lesser of the source port or destination port; for example, flows where
either the source or destination port is 22 will occur before flows where either port is 25. This example shows
using the Python struct module with multiple record attributes.
import struct

portpair = struct.Struct("!HH")

def lowest_port(rec):
    if rec.sport < rec.dport:
        return portpair.pack(rec.sport, rec.dport)
    else:
        return portpair.pack(rec.dport, rec.sport)

register_field("lowest_port", bin_bytes = portpair.size,
               rec_to_bin = lowest_port)
To use this example to sort the records in flowrec.rw, one saves the code to the file sort.py and uses it as
shown:
$ rwsort --python-file=sort.py --fields=lowest_port \
      flowrec.rw > outfile.rw
Create integer key for rwstats and rwuniq with advanced API
The following example defines two key fields for use by rwstats or rwuniq: prefixed-sip and
prefixed-dip. Using these fields, the user can count flow records based on the source and/or destination IPv4 address blocks (CIDR blocks). The default CIDR prefix is 16, but it can be changed by specifying
the --prefix switch that the example creates. This example uses the Python struct module to convert
between the IP address and a binary string.
import os, struct
from silk import *

default_prefix = 16
u32 = struct.Struct("!L")

def set_mask(prefix):
    global mask
    mask = 0xFFFFFFFF
    # the value we are handed is a string
    prefix = int(prefix)
    if 0 < prefix < 32:
        mask = mask ^ (mask >> prefix)

# Convert from an IPv4Addr to a byte sequence
def cidr_to_bin(ip):
    if ip.is_ipv6():
        # call form of raise works in both Python 2 and Python 3
        raise ValueError("Does not support IPv6")
    return u32.pack(int(ip) & mask)

# Convert from a byte sequence to an IPv4Addr
def cidr_bin_to_text(string):
    (num,) = u32.unpack(string)
    return IPv4Addr(num)

register_field("prefixed-sip", column_width = 15,
               rec_to_bin = lambda rec: cidr_to_bin(rec.sip),
               bin_to_text = cidr_bin_to_text,
               bin_bytes = u32.size)
register_field("prefixed-dip", column_width = 15,
               rec_to_bin = lambda rec: cidr_to_bin(rec.dip),
               bin_to_text = cidr_bin_to_text,
               bin_bytes = u32.size)
register_switch("prefix", handler=set_mask,
                help="Set prefix for prefixed-sip/prefixed-dip fields")

set_mask(default_prefix)
The lambda expression allows one to create an anonymous function. In this code, the lambda function
is used to pass the appropriate IP address into the cidr_to_bin() function. To write the code without the
lambda would require separate functions for the source and destination IP addresses:
def sip_cidr_to_bin(rec):
    return cidr_to_bin(rec.sip)

def dip_cidr_to_bin(rec):
    return cidr_to_bin(rec.dip)
The lambda expression helps to simplify the code.
If the code is saved in the file mask.py, it can be used as follows to count the number of flow records seen
in the /8 of each source IP address. The flow records are read from flowrec.rw. The --ipv6-policy=ignore
switch is used to restrict processing to IPv4 addresses.
$ rwuniq --ipv6-policy=ignore --python-file mask.py \
      --prefix 8 --fields prefixed-sip flowrec.rw
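To see what the mask arithmetic in set_mask() produces, the following stand-alone sketch repeats the same computation without SiLK. It uses the standard ipaddress module in place of silk.IPv4Addr, and the names prefix_mask and masked_key are illustrative only.

```python
import ipaddress
import struct

u32 = struct.Struct("!L")

def prefix_mask(prefix):
    """Return the 32-bit mask using the same arithmetic as set_mask()."""
    mask = 0xFFFFFFFF
    if 0 < prefix < 32:
        mask ^= mask >> prefix
    return mask

def masked_key(addr_text, prefix):
    """Pack the network part of an IPv4 address as a 4-byte key."""
    ip = ipaddress.IPv4Address(addr_text)
    return u32.pack(int(ip) & prefix_mask(prefix))

key = masked_key("10.1.2.3", 8)
print("%08X" % prefix_mask(8))                     # mask for a /8
print(ipaddress.IPv4Address(u32.unpack(key)[0]))   # network part of the address
```

Note that, as the condition is written, a prefix of 0 or of 32 and above leaves the mask at all ones, so the full address is used as the key.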
Create new average bytes value field for rwstats and rwuniq
The following example creates a new aggregate value that can be used by rwstats and rwuniq. The value
is avg-bytes, a value that calculates the average number of bytes seen across all flows that match the key.
It does this by maintaining running totals of the byte count and number of flows.
import struct

fmt = struct.Struct("QQ")
initial = fmt.pack(0, 0)
textsize = 15
textformat = "%%%d.2f" % textsize

# add byte and flow count from 'rec' to 'current'
def avg_bytes(rec, current):
    (total, count) = fmt.unpack(current)
    return fmt.pack(total + rec.bytes, count + 1)

# return printable representation
def avg_to_text(bin):
    (total, count) = fmt.unpack(bin)
    return textformat % (float(total) / count)

# merge two encoded values.
def avg_merge(rec1, rec2):
    (total1, count1) = fmt.unpack(rec1)
    (total2, count2) = fmt.unpack(rec2)
    return fmt.pack(total1 + total2, count1 + count2)

# compare two encoded values
def avg_compare(rec1, rec2):
    (total1, count1) = fmt.unpack(rec1)
    (total2, count2) = fmt.unpack(rec2)
    avg1 = float(total1) / count1
    avg2 = float(total2) / count2
    # same result as cmp(avg1, avg2); also works where cmp() is unavailable
    return (avg1 > avg2) - (avg1 < avg2)

register_field("avg-bytes",
               column_width   = textsize,
               bin_bytes      = fmt.size,
               add_rec_to_bin = avg_bytes,
               bin_to_text    = avg_to_text,
               bin_merge      = avg_merge,
               bin_compare    = avg_compare,
               initial_value  = initial)
To use this code, save it as avg-bytes.py, specify the name of the Python file in the --python-file switch,
and list the field in the --values switch:
$ rwuniq --python-file=avg-bytes.py --fields=sip \
      --values=avg-bytes infile.rw
This particular example will compute the average number of bytes per flow for each distinct source IP address
in the file infile.rw.
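The reason the plug-in stores a running total and a count, rather than the average itself, is that two partial results can then be merged exactly. The sketch below exercises the same packing scheme outside of SiLK; the byte counts are invented, and plain integers stand in for flow records.

```python
import struct

fmt = struct.Struct("QQ")   # (total bytes, flow count)

def add_rec(byte_count, current):
    """Add one flow's byte count to an encoded aggregate."""
    total, count = fmt.unpack(current)
    return fmt.pack(total + byte_count, count + 1)

def merge(bin1, bin2):
    """Combine two encoded aggregates for the same key."""
    total1, count1 = fmt.unpack(bin1)
    total2, count2 = fmt.unpack(bin2)
    return fmt.pack(total1 + total2, count1 + count2)

def to_text(binval):
    """Render the average in a 15-character column."""
    total, count = fmt.unpack(binval)
    return "%15.2f" % (float(total) / count)

# Two partial aggregates, as if the same key had been seen in two batches.
part_a = fmt.pack(0, 0)
for nbytes in (100, 200):
    part_a = add_rec(nbytes, part_a)
part_b = add_rec(600, fmt.pack(0, 0))

combined = merge(part_a, part_b)
print(to_text(combined).strip())
```

Averaging the two partial averages (150 and 600) would give 375, whereas merging the totals and counts gives the correct average over all three flows.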
Create integer key field for all tools that use fields
The following example Python plug-in file defines two fields, sport-service and dport-service. These
fields convert the source port and destination port to the name of the "service" as defined in the file
/etc/services; for example, port 80 is converted to "http". This plug-in can be used by any of rwcut,
rwgroup, rwsort, rwstats, or rwuniq.
import os,socket,struct

u16 = struct.Struct("!H")

# utility function to convert number to a service name,
# or to a string if no service is defined
def num_to_service(num):
    try:
        serv = socket.getservbyport(num)
    except socket.error:
        serv = "%d" % num
    return serv

# convert the encoded port to a service name
def bin_to_service(bin):
    (port,) = u16.unpack(bin)
    return num_to_service(port)

# width of service columns can be specified with the
# SERVICE_WIDTH environment variable; default is 12
col_width = int(os.getenv("SERVICE_WIDTH", 12))

register_field("sport-service", bin_bytes = u16.size,
               column_width = col_width,
               rec_to_text = lambda rec: num_to_service(rec.sport),
               rec_to_bin = lambda rec: u16.pack(rec.sport),
               bin_to_text = bin_to_service)

register_field("dport-service", bin_bytes = u16.size,
               column_width = col_width,
               rec_to_text = lambda rec: num_to_service(rec.dport),
               rec_to_bin = lambda rec: u16.pack(rec.dport),
               bin_to_text = bin_to_service)
If this file is named service.py, it can be used by rwcut to print the source port and its service:
$ rwcut --python-file service.py \
      --fields sport,sport-service flowrec.rw
Although the plug-in can be used with rwsort, the records will be sorted in the same order as the numerical
source port or destination port.
$ rwsort --python-file service.py \
      --fields sport-service flowrec.rw > outfile.rw
When used with rwuniq, it can count flows, bytes, and packets indexed by the service of the destination
port:
$ rwuniq --python-file service.py --fields dport-service \
      --values=flows,bytes,packets flowrec.rw
Create human-readable fields for all tools that use fields
The following example adds two fields, hu-bytes and hu-packets, which can be used as either key fields
or aggregate value fields. The example uses the formatting capabilities of netsa-python
(http://tools.netsa.cert.org/netsa-python/index.html) to present the bytes and packets fields in a more
human-friendly manner.
When used as a key, the hu-bytes field presents the value 1234567 as 1205.6Ki or as 1234.6k when the
HUMAN_USE_BINARY environment variable is set to False.
When used as a key, the hu-packets field adds a comma (or the character specified by the
HUMAN_THOUSANDS_SEP environment variable) to the display of the packets field. The value 1234567
becomes 1,234,567.
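For readers without netsa-python at hand, the grouping applied to the packets value can be approximated with the standard library alone. The helper below, with_thousands_sep, is a hypothetical stand-in for illustration; it is not the num_fixed() function that the plug-in calls.

```python
def with_thousands_sep(value, sep=","):
    """Format an integer with sep between groups of three digits."""
    return format(value, ",d").replace(",", sep)

print(with_thousands_sep(1234567))
print(with_thousands_sep(1234567, sep="."))
```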
The hu-bytes and hu-packets fields can also be used as aggregate value fields, in which case they compute
the sum of the bytes and packets, respectively, and display it as for the key field.
The code for the plug-in is shown here, and an example of using the plug-in follows the code.
import silk, silk.plugin
import os, struct
from netsa.data.format import num_prefix, num_fixed

# Whether to use Base-2 (True) or Base-10 (False) values for
# Kibi/Mebi/Gibi/Tebi/... vs Kilo/Mega/Giga/Tera/...
use_binary = True
if (os.getenv("HUMAN_USE_BINARY")):
    if (os.getenv("HUMAN_USE_BINARY").lower() == "false"
        or os.getenv("HUMAN_USE_BINARY") == "0"):
        use_binary = False
    else:
        use_binary = True

# Character to use for Thousands separator
thousands_sep = ','
if (os.getenv("HUMAN_THOUSANDS_SEP")):
    thousands_sep = os.getenv("HUMAN_THOUSANDS_SEP")

# Number of significant digits
sig_fig=5

# Use a 64-bit number for packing the bytes or packets data
fmt = struct.Struct("Q")
initial = fmt.pack(0)

### Bytes functions
# add_rec_to_bin
def hu_ar2b_bytes(rec, current):
    global fmt
    (cur,) = fmt.unpack(current)
    return fmt.pack(cur + rec.bytes)

# rec_to_binary
def hu_r2b_bytes(rec):
    global fmt
    return fmt.pack(rec.bytes)

# bin_to_text
def hu_b2t_bytes(current):
    global use_binary, sig_fig, fmt
    (cur,) = fmt.unpack(current)
    return num_prefix(cur, use_binary=use_binary, sig_fig=sig_fig)

# rec_to_text
def hu_r2t_bytes(rec):
    global use_binary, sig_fig
    return num_prefix(rec.bytes, use_binary=use_binary, sig_fig=sig_fig)

### Packets functions
# add_rec_to_bin
def hu_ar2b_packets(rec, current):
    global fmt
    (cur,) = fmt.unpack(current)
    return fmt.pack(cur + rec.packets)

# rec_to_binary
def hu_r2b_packets(rec):
    global fmt
    return fmt.pack(rec.packets)

# bin_to_text
def hu_b2t_packets(current):
    global thousands_sep, fmt
    (cur,) = fmt.unpack(current)
    return num_fixed(cur, dec_fig=0, thousands_sep=thousands_sep)

# rec_to_text
def hu_r2t_packets(rec):
    global thousands_sep
    return num_fixed(rec.packets, dec_fig=0, thousands_sep=thousands_sep)

### Non-specific functions
# bin_compare
def hu_bin_compare(cur1, cur2):
    global fmt
    # compare the decoded numbers; comparing the packed byte strings
    # directly preserves numeric order only on big-endian hosts
    (val1,) = fmt.unpack(cur1)
    (val2,) = fmt.unpack(cur2)
    if (val1 < val2):
        return -1
    return int(val1 > val2)

# bin_merge
def hu_bin_merge(current1, current2):
    global fmt
    (cur1,) = fmt.unpack(current1)
    (cur2,) = fmt.unpack(current2)
    return fmt.pack(cur1 + cur2)

### Register the fields
register_field("hu-bytes", column_width=10, bin_bytes=fmt.size,
               rec_to_text=hu_r2t_bytes, rec_to_bin=hu_r2b_bytes,
               bin_to_text=hu_b2t_bytes, add_rec_to_bin=hu_ar2b_bytes,
               bin_merge=hu_bin_merge, bin_compare=hu_bin_compare,
               initial_value=initial)

register_field("hu-packets", column_width=10, bin_bytes=fmt.size,
               rec_to_text=hu_r2t_packets, rec_to_bin=hu_r2b_packets,
               bin_to_text=hu_b2t_packets, add_rec_to_bin=hu_ar2b_packets,
               bin_merge=hu_bin_merge, bin_compare=hu_bin_compare,
               initial_value=initial)
This shows an example of the plug-in’s invocation and output when the code above is stored in the file
human.py.
$ rwstats --count=5 --no-percent --python-file=human.py \
      --fields=proto,hu-bytes,hu-packets \
      --values=records,hu-bytes,hu-packets data.rw
INPUT: 501876 Records for 305417 Bins and 501876 Total Records
OUTPUT: Top 5 Bins by Records
pro|  hu-bytes|hu-packets|   Records|  hu-bytes|hu-packets|
 17|       328|         1|     15922|    4.98Mi|    15,922|
 17|      76.0|         1|     15482|    1.12Mi|    15,482|
  1|       840|        10|      5895|    4.72Mi|    58,950|
 17|      68.0|         1|      4249|     282Ki|     4,249|
 17|      67.0|         1|      4203|     275Ki|     4,203|
UPGRADING LEGACY PLUGINS
Some functions were marked as deprecated in SiLK 2.0, and have been removed in SiLK 3.0.
Prior to SiLK 2.0, the register_field() function was called register_plugin_field(), and it had the following
signature:
register_plugin_field(field_name,
    [bin_len=bin_bytes_value,]
    [bin_to_text=bin_to_text_func,]
    [text_len=column_width_value,]
    [rec_to_bin=rec_to_bin_func,]
    [rec_to_text=rec_to_text_func])
To convert from register_plugin_field to register_field, change text_len to column_width, and change
bin_len to bin_bytes. (Even older code may use field_len; this should be changed to column_width as
well.)
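The keyword renames described above can be captured in a small table. The following sketch is only an illustration of the mapping; upgrade_field_keywords is a hypothetical helper and is not part of SiLK.

```python
# Old keyword name -> name accepted by register_field().
LEGACY_KEYWORDS = {
    "text_len": "column_width",
    "bin_len": "bin_bytes",
    "field_len": "column_width",
}

def upgrade_field_keywords(legacy_kwargs):
    """Return a copy of legacy_kwargs with the old keyword names renamed."""
    upgraded = {}
    for name, value in legacy_kwargs.items():
        new_name = LEGACY_KEYWORDS.get(name, name)
        if new_name in upgraded:
            raise ValueError("duplicate setting for %s" % new_name)
        upgraded[new_name] = value
    return upgraded

print(upgrade_field_keywords({"text_len": 15, "bin_len": 4}))
```

A plug-in author would normally apply these renames by hand in the source file; the table is simply a compact statement of what changes.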
The register_filter() function was introduced in SiLK 2.0. In versions of SiLK prior to SiLK 3.0, when
rwfilter was invoked with --python-file and the named Python file did not call register_filter(), rwfilter
would search the Python input for functions named rwfilter() and finalize(). If it found the rwfilter()
function, rwfilter would act as if the file contained:
register_filter(rwfilter, finalize=finalize)
To update your pre-SiLK 2.0 rwfilter plug-ins, simply add the above line to your Python file.
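As an illustration, an upgraded legacy filter file might look like the sketch below. The filter logic (passing records whose destination port is 25) is invented, and the try/except block is a stand-in added only so the sketch can be run outside of rwfilter, where register_filter() is not defined.

```python
# Stand-in so the sketch also runs outside rwfilter. Under rwfilter's
# --python-file the plug-in environment is expected to supply the real
# register_filter(), in which case this fallback is unused.
try:
    register_filter
except NameError:
    registered = []

    def register_filter(filter_func, finalize=None):
        registered.append((filter_func, finalize))

def rwfilter(rec):
    """Legacy-style filter: pass records whose destination port is 25."""
    return rec.dport == 25

def finalize():
    """Legacy clean-up hook; nothing to do in this sketch."""
    pass

# The one line that an upgraded plug-in needs to add:
register_filter(rwfilter, finalize=finalize)
```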
ENVIRONMENT
PYTHONPATH
The Python module that the SiLK Python plug-in uses (pysilk_nl.so) is installed under SiLK’s installation tree. It may be necessary to set or modify the PYTHONPATH environment variable so Python
can find this module. For information on using SiLK from Python, see pysilk(3).
PYTHONVERBOSE
If the SiLK Python extension or plug-in fails to load, setting this environment variable to a non-empty
string may help you debug the issue.
SILK_PYTHON_TRACEBACK
When set, Python plug-ins will output traceback information regarding Python errors to the standard
error.
SILK_PATH
This environment variable gives the root of the install tree. When searching for silkpython.so, a SiLK
application may use this environment variable. See the FILES section for details.
SILK_PLUGIN_DEBUG
When set to 1, the SiLK applications print status messages to the standard error to assist you in finding
problems loading plug-in files and registering fields. The application prints messages as it attempts
to find and open the silkpython.so plug-in. In addition, when an attempt to register a field fails, the
application prints a message specifying the additional function(s) that must be defined to register the
field in the application. Be aware that the output can be rather verbose. A typical invocation using
this variable is
env SILK_PLUGIN_DEBUG=1 rwcut --python-file=fields.py --version
FILES
${SILK_PATH}/lib64/silk/silkpython.so
${SILK_PATH}/lib64/silkpython.so
${SILK_PATH}/lib/silk/silkpython.so
${SILK_PATH}/lib/silkpython.so
/usr/local/lib64/silk/silkpython.so
/usr/local/lib64/silkpython.so
/usr/local/lib/silk/silkpython.so
/usr/local/lib/silkpython.so
Possible locations for the plug-in.
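When the plug-in cannot be found, it can help to list which of these locations exist on the local machine. The sketch below builds the list from the SILK_PATH environment variable; the helper name silkpython_candidates is illustrative, and the sketch does not claim to reproduce the exact search an application performs.

```python
import os

def silkpython_candidates(silk_path=None):
    """List the documented possible locations of silkpython.so."""
    roots = []
    if silk_path:
        roots.append(silk_path)
    roots.append("/usr/local")
    subdirs = ("lib64/silk", "lib64", "lib/silk", "lib")
    return ["/".join((root, sub, "silkpython.so"))
            for root in roots for sub in subdirs]

for path in silkpython_candidates(os.environ.get("SILK_PATH")):
    status = "found" if os.path.exists(path) else "missing"
    print("%-50s %s" % (path, status))
```

Setting SILK_PLUGIN_DEBUG=1, as described above, remains the authoritative way to see where an application actually looks.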
SEE ALSO
pysilk(3), rwfilter(1), rwcut(1), rwgroup(1), rwsort(1), rwstats(1), rwuniq(1), pmapfilter(3),
silk(7), python(1), http://docs.python.org/
5
SiLK File Formats
The formats of some SiLK files are described in this section.
sensor.conf (5)
sensor.conf
Sensor Configuration file for rwflowpack and flowcap
DESCRIPTION
As part of collecting flow data, the rwflowpack(8) and flowcap(8) daemons need to know what type of
data they are collecting and how to collect it (e.g., listen on 10000/udp for NetFlow v5; listen on 4740/tcp for
IPFIX). In addition, the rwflowpack daemon needs information on how to categorize the flow: for example,
to label the flows collected at a border router as incoming or outgoing. The Sensor Configuration file,
sensor.conf, contains this information, and this manual page describes the syntax of the file (see SYNTAX
below) and provides some example configurations (see EXAMPLES).
The sensor.conf file may have any name, and it may reside in any location. The name and location of the
file is specified by the --sensor-configuration switch to rwflowpack and flowcap.
The Sensor Configuration File defines the following concepts:
probe
A probe specifies a source for flow data. The source could be a port on which flowcap or rwflowpack
collects NetFlow or IPFIX data from a flow generator such as a router or the yaf software
(http://tools.netsa.cert.org/yaf/). In rwflowpack, the source can be a directory to periodically poll for files
containing NetFlow v5 PDUs, IPFIX records, or SiLK Flow records. When defining a probe, you must
specify a unique name for the probe and the probe’s type.
group
A group is a named list that contains one of the following: CIDR blocks, the names of IPset files, or
integers representing SNMP interfaces or VLAN identifiers. The use of groups is optional; the primary
purpose of a group is to allow the administrator to specify a commonly used list (such as the IP space
of the network being monitored) in a single location.
sensor
A sensor represents a logical collection point for the purposes of analysis. The sensor contains configuration values that rwflowpack uses to categorize each flow record depending on how the record moves
between networks at the collection point. Since the sensors and the categories (known as flowtypes or as
class/type pairs) are also used for analysis, they are defined in the Site Configuration file, described in
silk.conf(5). The Sensor Configuration file maps sensors to probes and specifies the rules required to
categorize the data. Usually one sensor corresponds to one probe; however, a sensor may be composed
of multiple probes, or the flow data collected at a single probe may be handled by multiple sensors.
The next section of this manual page describes the syntax of the sensor.conf file.
Using the syntax to configure a sensor requires knowledge of the packing logic that rwflowpack is using.
The packing logic is the set of rules that rwflowpack uses to assign a flowtype to each record it processes.
The default packing logic is for the twoway site, which is described in the packlogic-twoway(3) manual
page. Additional packing logic rules are available (e.g., packlogic-generic(3)).
The last major section of this document is EXAMPLES where several common configurations are shown.
These examples assume rwflowpack is using the packing logic from the twoway site.
SYNTAX
When parsing the Sensor Configuration file, blank lines are ignored. At any location in a line, the character
# indicates the beginning of a comment, which continues to the end of the line. These comments are ignored.
All other lines begin with optional leading whitespace, a command name, and one or more arguments to
the command. Command names are a sequence of non-whitespace characters, not including the character
#. Arguments are textual atoms: any sequence of non-whitespace, non-# characters, including numerals and
punctuation.
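The lexical rules above can be sketched as a short line parser. This is a simplified illustration, not the parser that rwflowpack uses; in particular it gives no special treatment to quoted arguments such as the path given to the include command.

```python
def parse_line(line):
    """Return (command, arguments), or None for blank or comment-only lines."""
    text = line.split("#", 1)[0]   # a '#' starts a comment that runs to end of line
    atoms = text.split()           # atoms are separated by whitespace
    if not atoms:
        return None
    return atoms[0], atoms[1:]

print(parse_line("  probe S0 netflow-v5   # border router"))
print(parse_line("# a comment-only line"))
```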
There are four contexts for commands: top-level, probe block, group block, and sensor block. The probe
block, group block, and sensor block contexts are used to describe individual features of probes, groups, and
sensors, respectively.
The valid commands for each context are described below.
Top-Level Commands
In addition to the commands to begin a probe, group, or sensor block, the top-level context supports the
following command:
include "path"
The include command is used to include the contents of another file whose location is path. This may
be used to separate large configurations into logical units.
Probe Block
With the exception of the probe command, the commands listed below are accepted within the probe
context. Note that one and only one of listen-on-port, listen-on-unix-socket, read-from-file, or poll-directory must be specified.
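The "one and only one" rule can be expressed as a small check over the command names that appear in a probe block. The function below is a hypothetical validator written for illustration; it is not how rwflowpack or flowcap report the error.

```python
SOURCE_COMMANDS = ("listen-on-port", "listen-on-unix-socket",
                   "read-from-file", "poll-directory")

def check_probe_source(commands):
    """commands: the command names that appear in one probe block."""
    used = [name for name in commands if name in SOURCE_COMMANDS]
    if len(used) != 1:
        raise ValueError("expected exactly one of %s, found %d"
                         % (", ".join(SOURCE_COMMANDS), len(used)))
    return used[0]

print(check_probe_source(["listen-on-port", "protocol", "accept-from-host"]))
```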
probe probe-name probe-type
The probe command is used in the top-level context to begin a new probe block which continues to
the end probe command. The arguments to the probe command are the name of the probe being
defined and the probe type. The probe-name must be unique among all probes. It must begin with
a letter, and it may not contain whitespace characters or the slash character (/). When a probe is
associated with a single sensor, it is good practice to give the probe the same name as the sensor. The
probe-type must be one of the following:
netflow-v5
This probe processes NetFlow v5 protocol data units (PDU) that the daemon reads from a UDP
port or from a file. NetFlow may be generated by a router or by software that reads packet
capture (pcap(3)) data and generates NetFlow v5 records.
netflow
This is an alias for netflow-v5 for backwards compatibility. This alias is deprecated, and it may
be removed in a future release.
ipfix
An IPFIX probe processes Internet Protocol Flow Information eXchange records that the daemon
reads over the network from an IPFIX source such as yaf(1). An IPFIX probe can also poll a
directory for files generated by the yaf program. To support IPFIX probes, SiLK must be built
with support for the external library libfixbuf, version 1.3.0 or later. Both yaf and libfixbuf are
available from http://tools.netsa.cert.org/.
netflow-v9
This probe processes NetFlow v9 protocol data units (PDU) that the daemon reads from a UDP
port from a router. To support NetFlow v9 probes, SiLK must be built with support for the
external library libfixbuf, version 1.3.0 or later.
sflow
This probe processes sFlow v5 records that the daemon reads from a UDP port. To support sFlow
probes, SiLK must be built with support for the external library libfixbuf, version 1.6.0 or later.
Since SiLK 3.9.0.
silk
A SiLK probe processes the records contained in SiLK Flow files created by previous invocations
of rwflowpack. The flows will be completely re-packed, as if they were just received over the
network. The sensor and flowtype values in each flow will be ignored. Note that SiLK usually
removes the SNMP interfaces from its flow records, and it is likely that you will be unable to use
the SNMP interfaces to pack the flows.
end probe
The end probe command ends the definition of a probe. Following an end probe command, top-level
commands are again accepted.
listen-on-port port
This command specifies the network port where the probe should collect flow data. When listening to
NetFlow from a Cisco router, this is the port that was specified to the Cisco IOS command
ip flow-export [ip-address] [port]
When listening to IPFIX data from yaf, this is the value specified to yaf’s --ipfix-port switch.
protocol { tcp | udp }
This command, required when listen-on-port is given, specifies whether the port is a tcp or udp port.
IPFIX probes support both types; the only permitted value for all other probe types is udp. When
listening to IPFIX data from yaf, this is the value specified to yaf’s --ipfix switch.
accept-from-host host-name
This optional command specifies the IP or name of the host that is allowed to connect to the port
where the probe is listening. When this command is not present, any host may connect. The command
may only be specified when the listen-on-port command is also present. When listening for NetFlow,
this attribute would be the IP address of the router as seen from the machine running rwflowpack or
flowcap.
This paragraph applies only when using versions of SiLK and libfixbuf prior to 3.4.0 and 1.2.0, respectively: The accept-from-host command cannot be used with UDP-based IPFIX probes or NetFlow
v9 probes. If multiple probes of these types are needed, they should have different listen-on-port
values.
listen-as-host host-name
This optional command is used on a multi-homed machine to specify the address the probe should
listen on (bind(2) to). Its value is the name of the host or its IP address. If not present, the program
will listen on all the machine’s addresses. The command may only be specified when the listen-on-port
command is also present. For listening to NetFlow, the value would be the ip-address that was
specified to the Cisco IOS command
ip flow-export [ip-address] [port]
listen-on-unix-socket path-to-unix-socket
The value contains the path name to a UNIX domain socket where the flow generator writes its data.
The parent directory of path-to-unix-socket must exist.
poll-directory directory-path
When this command is given, rwflowpack will periodically poll the directory-path to look for files to
process. flowcap will exit with an error if you attempt to use probes that contain this command since
flowcap does not support reading data from files. When polling the directory, zero length files and
files whose names begin with a dot (.) are ignored. This command may be used with the following
probe types:
• For SiLK probes, each file must be a valid SiLK Flow file.
• IPFIX probes can process files created by the yaf program.
• A NetFlow v5 probe will process files containing NetFlow v5 PDUs. The format of these files is
specified in the description of the read-from-file command.
read-from-file dummy-value
When this command is given, rwflowpack will read NetFlow v5 records from the file specified by
the --netflow-file command line switch. The value to the read-from-file command is completely
ignored, and we recommend you use /dev/null as the value. flowcap will exit with an error if you
attempt to use probes that contain this command since flowcap does not support reading data from
files. The format of a NetFlow v5 file is that the file’s length should be an integer multiple of 1464
bytes, where 1464 is the maximum length of the NetFlow v5 PDU. Each 1464-byte block should contain the
24-byte NetFlow v5 header and space for thirty 48-byte flow records, even if fewer NetFlow records are
valid. rwflowpack will accept NetFlow v5 files that have been compressed with the gzip(1) program.
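The block size follows from the numbers given above: 24 + 30 * 48 = 1464. The sketch below checks an uncompressed file image against that layout. The header field layout in HEADER (version, count, uptime, seconds, nanoseconds, sequence, engine type, engine id, sampling interval) is taken from the commonly published NetFlow v5 format rather than from this manual page, so treat it as an assumption; summarize_pdu_file is an illustrative name.

```python
import struct

RECORD_LEN = 48
MAX_RECORDS = 30
HEADER = struct.Struct("!HHIIIIBBH")                   # assumed 24-byte v5 header
PDU_BLOCK = HEADER.size + MAX_RECORDS * RECORD_LEN     # 1464 bytes

def summarize_pdu_file(data):
    """Return (version, record count) for each 1464-byte block in data."""
    if len(data) % PDU_BLOCK != 0:
        raise ValueError("length %d is not a multiple of %d"
                         % (len(data), PDU_BLOCK))
    summary = []
    for offset in range(0, len(data), PDU_BLOCK):
        fields = HEADER.unpack_from(data, offset)
        version, count = fields[0], fields[1]
        if version != 5 or count > MAX_RECORDS:
            raise ValueError("unexpected header at offset %d" % offset)
        summary.append((version, count))
    return summary

# A synthetic block: a header that claims 2 records, padded to full length.
block = HEADER.pack(5, 2, 0, 0, 0, 0, 0, 0, 0)
block += b"\0" * (PDU_BLOCK - HEADER.size)
print(summarize_pdu_file(block * 2))
```

For a file compressed with gzip, the data would first need to be decompressed, for example with the standard gzip module.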
log-flags { none | { all | bad | default | firewall-event | missing | record-timestamps | sampling | ... } }
This optional command gives specific logging instructions for the probe. If you wish to reduce the
verbosity of the log, you may use the log-flags command to adjust the information logged. The
possible values are:
all
Log everything.
bad
Write messages about an individual NetFlow v5 record where the packet or octet count is zero,
the packet count is larger than the octet count, or the duration of the flow is larger than 45 days.
default
Enable the following values: bad, libfixbuf, missing, sampling. This is the default value. Since
SiLK 3.10.0. (Prior to SiLK 3.10.0, all was the default.)
firewall-event
When the firewall-event quirks flag is set and the probe is processing NetFlow v9 or IPFIX,
write messages about records that are ignored because the firewall event information element on
the record is something other than flow deleted or flow denied. Since SiLK 3.8.1.
missing
Examine the sequence numbers of NetFlow v5 packets and write messages about missing and out-of-sequence packets. (Currently it is not possible to suppress messages regarding out-of-sequence
NetFlow v9 or IPFIX packets.)
none
Log nothing. It is an error to combine this value with any other.
record-timestamps
Log the timestamps that appear on each record. This produces a lot of output, and it is primarily
used for debugging. Since SiLK 3.10.0.
sampling
Write messages constructed by parsing the NetFlow v9 Options Templates that specify the sampling algorithm (when samplingAlgorithm and samplingInterval IEs are present) or flow sampler mode (when flowSamplerMode and flowSamplerRandomInterval IEs are present). Requires
libfixbuf-1.4.0 or later. Since SiLK 3.8.0.
interface-values { snmp | vlan }
This optional command specifies the values that should be stored in the input and output fields of the
SiLK Flow records that are read from the probe. If this command is not given, the default is snmp.
Note that NetFlow v5 probes only support snmp.
snmp
Store the index of the network interface card (ifIndex ) where the flows entered and left the router,
respectively.
vlan
Store the VLAN identifier for the source and destination networks, respectively. If only one VLAN
ID is available, input is set to that value and output is set to 0.
This setting does not affect whether rwflowpack(8) stores the input and output fields to its output
files. Storage of those fields is controlled by rwflowpack’s --pack-interfaces switch.
quirks { none | { firewall-event | zero-packets | missing-ips ... } }
This optional command is used to indicate that special (or quirky) handling of the incoming data is
desired. The value none disables all quirks, and that is the default setting. If the value is not none,
it may be a list of one or more of the values specified below. Since SiLK 3.8.0.
firewall-event
Enable checking for firewall event information elements (IEs) when processing IPFIX or NetFlow
v9 flow records. The IPFIX firewallEvent IE is 233. The Cisco elements are NF_F_FW_EVENT
(IE 40005) and NF_F_FW_EXT_EVENT (IE 33002). When this quirk is active, firewall events
that match the value 2 (flow deleted) are categorized as normal flows, firewall events that match
the value 3 (flow denied) are usually put into one of the non-routed types (e.g., innull, outnull,
see packlogic-twoway(3) and packlogic-generic(3) for details), and all other firewall event
values are dropped. (Note that a log message is generated for these dropped records; to suppress
these messages, use the log-flags command.) When this quirk is not provided, SiLK handles
these records normally, which may result in duplicate flow records. (Prior to SiLK 3.8, SiLK
dropped all flow records that contained a firewall event IE.)
zero-packets
Enable support for flow records that either do not contain a valid packets field, such as those from
the Cisco ASA series of routers, or have an unusually large bytes-per-packet ratio. When
this quirk is active, SiLK sets the packet count to 1 when the incoming IPFIX or NetFlow v9 flow
record has a packet count of 0. This quirk may modify the file format used by rwflowpack
for IPv4 records in order to support large byte-per-packet ratios.
missing-ips
Store a flow record even when the record’s NetFlow v9/IPFIX template does not contain IP
addresses. One change in SiLK 3.8.0 was to ignore flow records that do not have a source and/or
destination IP address; this quirk allows one to undo the effect of that change. Since SiLK 3.8.1.
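The decisions described for the firewall-event and zero-packets quirks can be summarized in a few lines. The sketch below is a plain restatement of the rules in the text; the function names and the returned labels are invented for illustration, and the actual flowtype chosen for a denied flow depends on the packing logic in use.

```python
FLOW_DELETED = 2
FLOW_DENIED = 3

def classify_firewall_event(event):
    """Describe the handling of a record when the firewall-event quirk is active."""
    if event == FLOW_DELETED:
        return "normal"        # categorized as a normal flow
    if event == FLOW_DENIED:
        return "non-routed"    # usually a non-routed type such as innull or outnull
    return "dropped"           # all other firewall event values

def apply_zero_packets_quirk(packets):
    """Return the packet count to store when the zero-packets quirk is active."""
    return 1 if packets == 0 else packets

for event in (1, 2, 3, 5):
    print(event, classify_firewall_event(event))
```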
priority value
This optional command is deprecated. It exists for backwards compatibility and will be removed in
the next major release.
To summarize the probe types and the input they can accept:
               Berkeley   Directory   UnixDomain    Single
Probe Type      Socket     Polling      Socket       File
==========    ==========  ==========  ==========  ==========
ipfix          tcp/udp       yes
netflow-v5       udp         yes                     yes
netflow-v9       udp
sflow            udp
silk                         yes
Group Block
The use of group blocks is optional. They are a convenience to define a list of commonly used CIDR blocks,
IPset files, or integer values that are treated as SNMP interfaces or VLAN identifiers. Groups are used
in sensor blocks as described in the descriptions for the discard-when, discard-unless, network-name-ipblocks, network-name-ipsets and network-name-interfaces commands, below.
The commands in a group definition must all be of the same type. For example, you cannot mix ipblocks
and ipsets commands in a single group definition, even though both contain IP addresses.
The contents of an existing group may be added to the current group block by using a group reference after
the appropriate keyword. A group reference is the name of the group prefixed by the at (@) character.
The group command is used at top-level to begin a group definition block, and the remaining commands
are accepted within the group block.
group group-name
The group command begins a new group definition block which continues to the end group command.
The argument to the group command is the name of the group being defined. The group-name must
be unique among all groups. It must begin with a letter, and it may not contain whitespace characters
or the slash character (/).
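The naming rule can be written as a regular expression. The sketch assumes that "letter" means an ASCII letter, which the text does not state explicitly; is_valid_name is an illustrative helper.

```python
import re

# A letter first, then any characters other than whitespace or '/'.
NAME_RE = re.compile(r"[A-Za-z][^\s/]*")

def is_valid_name(name):
    """Check a probe or group name against the documented naming rule."""
    return NAME_RE.fullmatch(name) is not None

for candidate in ("my-network", "G1", "0group", "a/b", "two words"):
    print("%-12s %s" % (candidate, is_valid_name(candidate)))
```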
end group
The end group command ends the definition of a group. Following an end group command, top-level
commands are again accepted.
interfaces {integer | group-ref } [integer group-ref...]
The interfaces command adds integer values to a group, where each integer is treated as an SNMP
interface number or VLAN identifier. The interfaces command may appear multiple times in a group
block. Each integer value may be between 0 and 65535 inclusive.
ipblocks {cidr-block | group-ref } [cidr-block group-ref...]
The ipblocks command adds CIDR block values to a group. The ipblocks command may appear
multiple times in a group block. For groups containing more than a couple of CIDR blocks, consider
using an IPset instead.
ipsets {filename | group-ref } [filename group-ref...]
The ipsets command adds the IP addresses specified in an IPset file (such as that created by rwsetbuild(1)) to a group. The ipsets command may appear multiple times in a group block. When multiple
IPset files are specified, the group maintains a single IPset that is the union of the files. rwflowpack exits
with an error if the IPset file does not exist or does not contain any IP addresses. Since SiLK 3.10.0.
Specifying the at (@) character and the name of an existing group within an interfaces, ipblocks, or ipsets
command causes the contents of the existing group to be added to the current group as long as the existing
group contains interfaces, ipblocks, or ipsets, respectively. A group does not reference other groups; the
contents of the existing groups are copied into the current group.
For example group blocks, see Group definitions below.
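The copy semantics of group references, and the rule that a group holds a single type of value, can be modeled with a dictionary. This is an illustrative model only; add_to_group and the data layout are invented, and IPset file names are treated as opaque strings.

```python
def add_to_group(groups, name, kind, items):
    """Add items to group `name`; an item starting with '@' copies that group."""
    group = groups.setdefault(name, {"kind": kind, "values": []})
    if group["kind"] != kind:
        raise ValueError("group %s holds %s, not %s"
                         % (name, group["kind"], kind))
    for item in items:
        if item.startswith("@"):
            ref = groups[item[1:]]
            if ref["kind"] != kind:
                raise ValueError("group %s holds %s, not %s"
                                 % (item[1:], ref["kind"], kind))
            group["values"].extend(ref["values"])   # contents are copied
        else:
            group["values"].append(item)

groups = {}
add_to_group(groups, "G1", "ipblocks", ["10.0.0.0/8"])
add_to_group(groups, "G2", "ipblocks", ["@G1", "192.168.0.0/16"])
print(groups["G2"]["values"])
```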
Sensor Block
The information from the sensor block is used by rwflowpack to determine how to categorize a flow; that is,
in which file the flow record is stored. The packlogic-twoway(3) manual page describes how rwflowpack
may use the sensor blocks to determine a record’s category.
When the Sensor Configuration file is used with flowcap, no sensors need to be defined. In fact, flowcap
completely ignores all text inside each sensor block.
The sensor block works with the packing logic to determine where rwflowpack stores flow records. The
packing logic plug-in specifies a list of network names, and you will refer to these networks when you configure
the sensor block. Most plug-ins provide the external, internal, and null names, where internal refers to the
network being monitored, null refers to flows that were blocked by the router’s access control list, and external is
everything else.
Several of the commands in the sensor block require as an argument a list of CIDR blocks, a list of IPset files,
or a list of integers. Instead of specifying a list of values, you may specify a group reference to a group (see
Group Block) containing ipblocks, ipsets, or interfaces, respectively. (A group reference is the at (@) character
followed by the group’s name.) These lists are defined as follows:
cidr-block-list
A cidr-block-list contains one or more CIDR blocks or group references that represent an address space.
As part of determining how to process a flow record, rwflowpack may check whether the record’s
source or destination IP address is in the list. When comparing an IP address to a cidr-block-list, note
the following:
• the IP address is compared to each element in the cidr-block-list, stopping once a match is made
• when comparing an IPv4 address to a cidr-block-list element that is IPv6, the IPv4 address is
converted to IPv6 by mapping it into the ::ffff:0:0/96 prefix for purposes of the comparison
• when comparing an IPv6 address to a cidr-block-list element that is IPv4, an IPv6 address in the
::ffff:0:0/96 prefix will be converted to IPv4 for purposes of the comparison and any other IPv6
address will fail the comparison
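The comparison rules above can be traced with a small, hypothetical cidr-block-list that mixes an IPv4 and an IPv6 element. The network name internal and the addresses are assumptions made for illustration; the comments describe how each sample address would be evaluated under the rules as stated.

```
# Hypothetical list with one IPv4 element and one IPv6 element.
internal-ipblocks 192.0.2.0/24 2001:db8::/32

# 192.0.2.7         matches 192.0.2.0/24; comparison stops there.
# 198.51.100.9      fails the IPv4 element; is mapped to
#                   ::ffff:198.51.100.9 for the IPv6 element and
#                   fails that too, so it is not in the list.
# ::ffff:192.0.2.7  lies in ::ffff:0:0/96, is converted to
#                   192.0.2.7, and matches 192.0.2.0/24.
# 2001:db8::1       fails the IPv4 element (not in ::ffff:0:0/96),
#                   then matches 2001:db8::/32.
```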
ipset-list
An ipset-list contains the names of one or more IPset files or group references. These files represent an
address space, and rwflowpack may check whether a flow record’s source or destination IP address
is in this address space. When multiple IPset files are specified, the contents of the files are merged
into a single IPset. rwflowpack exits with an error if the IPset file does not exist or does not contain
any IP addresses. The rules for comparing IPv4 and IPv6 addresses are the same as those for the
cidr-block-list. Since SiLK 3.10.0.
interface-list
An interface-list contains one or more group references or integers (ranging from 0 to 65535) that
represent SNMP interface index(es) or VLAN identifiers. As part of determining how to process
a flow record, rwflowpack may check whether the record’s input or output fields are in the list.
Whether the input and output fields contain SNMP interfaces or VLAN identifiers is determined by
the interface-values command in the probe block (see Probe Block).
The sensor command is used in the top-level context to begin a sensor configuration block, and the remaining
commands are accepted within the sensor block.
sensor sensor-name
The sensor command begins a new sensor configuration block. It takes as an argument the name
of the sensor being configured, and that sensor must be defined in the Site Configuration file (see
silk.conf(5)). A sensor block is closed with the end sensor command. You may have multiple sensor
blocks that have the same sensor-name.
end sensor
The end sensor command ends the configuration of a sensor. Following an end sensor command,
top-level commands are again accepted.
probe-type-probes probe-name [probe-name ...]
This command associates the listed probe names of the given probe type with the sensor. The probes
do not have to be defined before they are used. (Note this also means that a mistyped probe-name
will not be detected.) For example, netflow-v5-probes S1 says that S1 is a netflow-v5 probe; whenever
flow data arrives on the S1 probe, the sensor associated with the probe notices that data is available
and processes it.
source-network network-name
This command causes the sensor to assume that all flows originated from the network named network-name. For example, if a sensor is associated with a probe that only monitors incoming traffic, you could
use source-network external to specify that all traffic originated from the external network.
destination-network network-name
This command causes the sensor to assume that all flows were sent to the network named network-name.
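A minimal sketch of a sensor that fixes both directions, assuming a probe that only sees incoming traffic. The sensor and probe name S1 is a placeholder; the sensor would also need to be defined in silk.conf(5), and the network names assume a packing-logic plug-in that provides external and internal.

```
# Hypothetical sensor; every flow is treated as external -> internal.
sensor S1
    netflow-v5-probes S1
    source-network external
    destination-network internal
end sensor
```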
network-name-ipblocks {cidr-block-list | remainder}
This command specifies the IP-space that is assigned to the network named network-name. The
value of the command can be the keyword remainder or a list containing CIDR blocks and/or group
references to groups containing CIDR blocks. When the value is the keyword remainder, the IP-space for network-name is conceptually all IPs not assigned to other networks on this sensor. The
remainder keyword may only appear one time within a sensor block.
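For instance, a sensor might assign explicit address space to one network and let another take everything else. This is a sketch only: the sensor name S2 and the CIDR blocks are placeholders, and the network names assume a plug-in that provides internal and external.

```
# Hypothetical sensor; addresses are documentation ranges.
sensor S2
    netflow-v5-probes S2
    internal-ipblocks 192.0.2.0/24 198.51.100.0/24
    # remainder may appear only once in the sensor block.
    external-ipblocks remainder
end sensor
```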
network-name-ipsets {ipset-list | remainder}
This command specifies the IP-space that is assigned to the network named network-name. The value
of the command can be the keyword remainder or a list containing the names of IPset files and/or
group references to groups containing IPset files. When the value is the keyword remainder, the
IP-space for network-name is conceptually all IPs not assigned to other networks on this sensor. The
remainder keyword may only appear one time within a sensor block.
network-name-interfaces {interface-list | remainder}
This command specifies the SNMP interface index(es) or VLAN identifiers that are assigned to the
network named network-name. The value of the command can be the keyword remainder or a list
containing interface numbers and/or group references to groups containing interfaces. When the value
is the keyword remainder, the interface list is computed by finding all interface values not assigned
to other networks on this sensor. The remainder keyword may only appear one time within a sensor
block.
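The interface form can also take its values from a group. In the sketch below, the group name edge, the sensor name S3, and the interface numbers are all placeholders chosen for illustration.

```
# Hypothetical group of interface values.
group edge
    interfaces 17 18 21
end group

# Flows seen on 17, 18, or 21 belong to external; every other
# interface value is assigned to internal.
sensor S3
    netflow-v5-probes S3
    external-interfaces @edge
    internal-interfaces remainder
end sensor
```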
isp-ip ip-address [ip-address ...]
This optional command may be used for a sensor that processes NetFlow data. The value to the
command is a list of IP addresses in dotted-decimal notation, where the IPs are the addresses of the
NICs on the router. For traffic that doesn’t leave the router (and thus was sent to the router’s null-interface), some packing-logic plug-ins use these IPs to distinguish legitimate traffic for the router (e.g.,
routing protocol traffic, whose destination address would be in this list) from traffic that violated the
router’s access control list (ACL).
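A sketch of how isp-ip might sit alongside the interface commands for a NetFlow sensor. The sensor name S4, the addresses, and the interface numbers are placeholders; whether and how the addresses are used depends on the packing-logic plug-in in effect.

```
# Hypothetical sensor; 192.0.2.1 and 198.51.100.1 stand in for
# the addresses of the router's NICs.
sensor S4
    netflow-v5-probes S4
    isp-ip 192.0.2.1 198.51.100.1
    null-interfaces 0
    external-interfaces 17 18
    internal-interfaces remainder
end sensor
```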
The following optional sensor block commands provide a way to filter the flow records that rwflowpack
packs for a sensor. Each filter begins with either discard-when or discard-unless, mentions a flow record
field, and ends with a list of values.
The discard-when command causes the sensor to ignore the flow record if the property matches any of the
elements in the list. When a match is found, rwflowpack immediately stops processing the record for the
current sensor and the flow is not packed for this sensor.
The discard-unless command causes the sensor to ignore the flow record unless the property matches one
of the elements in the list. That is, the flow record is packed only if its property matches one of the values
specified in the list, and, when multiple discard-unless commands are present, if the flow record matches
the values specified in each.
For each individual property, only one of discard-when or discard-unless may be specified.
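As an illustration of a filter, the sketch below drops records by input interface before any categorization takes place. The sensor name S5, the interface numbers, and the address block are placeholders.

```
# Hypothetical sensor; records whose input field is 5 or 6 are
# ignored by this sensor and are not packed for it.
sensor S5
    netflow-v5-probes S5
    discard-when source-interfaces 5 6
    internal-ipblocks 192.0.2.0/24
    external-ipblocks remainder
end sensor
```

Because only one of discard-when or discard-unless may be given for a property, this block could not also contain a discard-unless source-interfaces command.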
discard-when source-interfaces interface-list
Instructs rwflowpack to discard a flow record for this sensor if the value in the flow’s input field
is listed in interfac