Using System Health to Isolate Problems

Modus has a System Health panel that provides at-a-glance information about the state of your Modus server. Historical information can be accessed by clicking on the individual status indicators.
This document provides an overview of the meaning of the indicators, the symptoms you may encounter and how to resolve any problems.
We recommend that you read Troubleshooting Mail Delivery Problems, as it explains how the spools work and which services are responsible for the spool sub-folders.
If you have any questions, please contact support@vircom.com.
THE SYSTEM HEALTH PANEL
The panel is divided into four sections: System Status, Performance Profile, System Activity and System Info.
1) SYSTEM STATUS
The System Status section provides general system statistics. For details, refer to the Web Component Appendix in the Modus Administration Guides. The following sections describe abnormal readings and how to resolve the underlying problems.
1.1 CPU
If CPU usage is high, look at the associated historical graph to check whether this is recent or has persisted for an extended period. Go to Task Manager to see which Modus services are causing the high CPU usage.
1.1.1 MODUSCAN – Modus Scanning Service
If the MODUSCAN service is constantly at 100%, check System Activity > Processing Queue to see whether messages are backing up there too. A continual backlog of messages indicates that MODUSCAN is not keeping up with mail traffic.
Problems & Solutions:
a) An extremely large message could cause the queue to build up:
• In the Console, go to System – Properties – Settings and click Advanced
• Check Max. Server Message Size for the maximum message size
• In Windows Explorer, go to the …\Vircom\Modus<Mail or Gate>\spool\invirus\b00 hierarchy to locate a message (MSG file) that could be causing the backlog
   o Stop the MODUSCAN & MODUSADM services
   o Move the BXXXXXXXXXX.MSG and BXXXXXXXXXX.RCP file pair to another folder
   o Start the MODUSCAN & MODUSADM services
   o Messages should start to flow, and the Processing Queue will reflect this
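If you prefer to locate the candidate message from a script rather than by browsing, the following Python sketch lists the largest .MSG files under …\spool\invirus and moves a chosen pair aside. It assumes only what this section describes (an .MSG and .RCP pair sharing one file name); the function names are hypothetical, and it should be run only while MODUSCAN and MODUSADM are stopped.

```python
# Sketch: find the largest queued messages under spool\invirus and move a
# suspect .MSG/.RCP pair aside. Function names are hypothetical; run this
# only while the MODUSCAN and MODUSADM services are stopped.
import shutil
from pathlib import Path


def _find_partner(msg_path, suffix):
    """Return the file next to msg_path with the given suffix, or None."""
    for candidate in (msg_path.with_suffix(suffix.upper()),
                      msg_path.with_suffix(suffix.lower())):
        if candidate.exists():
            return candidate
    return None


def largest_messages(invirus_dir, limit=5):
    """Return (size_in_bytes, msg_path, rcp_path_or_None), largest first."""
    found = []
    for path in Path(invirus_dir).rglob("*"):
        if path.is_file() and path.suffix.upper() == ".MSG":
            found.append((path.stat().st_size, path, _find_partner(path, ".RCP")))
    found.sort(key=lambda item: item[0], reverse=True)
    return found[:limit]


def isolate_pair(msg_path, dest_dir):
    """Move a .MSG file and its .RCP partner (if any) into dest_dir."""
    msg_path = Path(msg_path)
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    moved = []
    # The partner is looked up before anything is moved.
    for src in (msg_path, _find_partner(msg_path, ".RCP")):
        if src is not None:
            target = dest / src.name
            shutil.move(str(src), str(target))
            moved.append(target)
    return moved
```

Pass the full path of your spool\invirus folder to `largest_messages`, compare the sizes with Max. Server Message Size, and hand the suspect `.MSG` path to `isolate_pair` together with a folder outside the spool.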
b) Sieve Database:
Because the MODUSCAN service checks the Sieve database for custom scripts and trusted and blocked senders, a corrupted database can cause errors which, in turn, cause MODUSCAN CPU usage to spike.
• In Windows Explorer, go to the ...\Vircom\Modus<Mail or Gate>\Log folder
• Locate the latest ERR*.LOG and open it with Notepad
   o Look for MODUSCAN errors that indicate SieveStore errors
   o If there are SieveStore errors and you are using an SQL-based Sieve database, revert to the local Sieve database
   o Stop and start the MODUSCAN and MODUSADM services
   o It may take some time for the backlog to diminish
• Monitor the Processing Queue; the message count should also diminish
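The log check can be scripted in the same way. This sketch picks the most recently modified ERR*.LOG in the Log folder and returns the lines that mention SieveStore. The exact wording of Modus error lines is not documented here, so matching on the text "SieveStore" is an assumption; adjust the keyword to what your logs actually contain.

```python
# Sketch: pull SieveStore-related lines out of the newest ERR*.LOG file.
# Assumption: relevant lines contain the text "SieveStore" (any case).
from pathlib import Path


def latest_error_log(log_dir):
    """Return the most recently modified ERR*.LOG in log_dir, or None."""
    logs = [p for p in Path(log_dir).glob("ERR*.LOG") if p.is_file()]
    return max(logs, key=lambda p: p.stat().st_mtime, default=None)


def sievestore_errors(log_path, keyword="SieveStore"):
    """Return the lines of log_path that mention keyword (case-insensitive)."""
    needle = keyword.lower()
    hits = []
    with open(log_path, encoding="utf-8", errors="replace") as handle:
        for line in handle:
            if needle in line.lower():
                hits.append(line.rstrip("\r\n"))
    return hits
```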
c) Failing or slow user authentication:
User authentication can cause a slowdown. If you are using database mailboxes, a bottleneck in the user database can cause authentication problems. This in turn slows mail processing by MODUSCAN, since it checks the user database to see whether each user has the scan flag turned on for spam, viruses and attachments.
• Check the connectivity to your user database: look at the DSN associated with it
• On the Modus server, use a query tool such as QTODBC (download it at http://gpoulose.home.att.net/) or, on the SQL Server, use Query Analyzer to open the user database
   o Run Select * from VOPMAIL where username = 'something' and domain = 'somedomain'
   o If the query takes too long to respond (>1 sec), there is a performance issue with SQL
• Temporarily disable Quotas
   o The quota function attempts multiple writes to the user database when a message comes in for a user
   o Writing to the database is more intensive than reading from it and can impose a load on the server, so disabling quotas can resolve the problem on a system with poor database performance
• Consult the KB article How-to: Prevent Quarantine-Related Problems, as it discusses issues that also apply to database-related problems
• In the Console, go to Spam – Preferences – Options and enable Force scanning for all Domains and all Users to bypass the check and reduce the number of queries made to the authentication database
   o Repeat this for Virus and Forbidden Attachments
• Stop & start the MODUSCAN & MODUSADM services
• Monitor the Processing Queue; the message count should diminish
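The query check above can also be timed from a script. The sketch runs the same lookup through any Python DB-API connection and flags it when it exceeds the 1-second guideline. Against the real user database you would typically connect through the DSN with an ODBC driver such as the third-party pyodbc package (an assumption; it is not part of Modus). Only the VOPMAIL table and its username and domain columns are taken from the query above; nothing else about the schema is assumed.

```python
# Sketch: time the user lookup from this section through a DB-API connection.
# The "?" placeholder style works with sqlite3 and pyodbc (qmark paramstyle).
import time

LOOKUP_SQL = "SELECT * FROM VOPMAIL WHERE username = ? AND domain = ?"


def time_user_lookup(connection, username, domain, threshold_seconds=1.0):
    """Return (row_count, seconds_taken, is_slow) for one user lookup."""
    cursor = connection.cursor()
    start = time.perf_counter()
    cursor.execute(LOOKUP_SQL, (username, domain))
    rows = cursor.fetchall()  # include fetch time, as a query tool would
    elapsed = time.perf_counter() - start
    cursor.close()
    return len(rows), elapsed, elapsed > threshold_seconds
```

For example, `pyodbc.connect("DSN=YourUserDSN")` (hypothetical DSN name) would return a connection you can pass in; repeat the call a few times, since a single slow reading may be a cache miss rather than a bottleneck.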
d) A spam attack is in progress:
Your Modus server could be flooded beyond its usual traffic load and be over capacity. In this case, you will likely see an increase in both the Processing Queue and the Message Delivery Queue. A spammer may be on your network, using your mail server.
• On the Modus server, in Windows Explorer, go to …\Vircom\Modus<Mail or Gate>\spool\domains
• If there are hundreds of sub-folders, a spammer is using your network
• To pinpoint the IP address that the spammer is using, go to one of the domain sub-folders and open a .RCP or .LCK file with Notepad
   o The Origin-IP: entry indicates the IP address the spammer is coming from
• Block the IP address at the router level (using a NULL route) or physically unplug the network cable of the PC being used to flood your Modus server
• You can also block the IP address in Security – Properties – Connections
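Counting sub-folders and reading each Origin-IP: entry by hand is tedious during an attack. This sketch counts the domain sub-folders and tallies the Origin-IP: values found in .RCP and .LCK files under …\spool\domains. It assumes, as described above, that the entry appears on its own line as `Origin-IP: <address>`; the file encoding is not documented, so latin-1 is used because it decodes any byte sequence without errors.

```python
# Sketch: count domain sub-folders and tally Origin-IP entries in spool\domains.
# Assumption: each .RCP/.LCK file has a line of the form "Origin-IP: <address>".
from collections import Counter
from pathlib import Path


def subfolder_count(domains_dir):
    """Number of immediate sub-folders (one per destination domain)."""
    return sum(1 for p in Path(domains_dir).iterdir() if p.is_dir())


def origin_ip_counts(domains_dir):
    """Return a Counter of Origin-IP values across all .RCP and .LCK files."""
    counts = Counter()
    for path in Path(domains_dir).rglob("*"):
        if not (path.is_file() and path.suffix.upper() in (".RCP", ".LCK")):
            continue
        with open(path, encoding="latin-1") as handle:
            for line in handle:
                if line.lower().startswith("origin-ip:"):
                    counts[line.split(":", 1)[1].strip()] += 1
                    break  # one entry per file is enough
    return counts
```

`origin_ip_counts(path).most_common(5)` then lists the addresses responsible for the most queued deliveries, which are the first candidates to block.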
After blocking the spammer, proceed with the following:
• Stop all Modus services
• Rename the spool directory to spool.old
• Start the Modus services; mail flow will begin
• Perform a Windows Search of the spool.old directory for all messages that contain the spammer's IP address
• Delete or move these messages
• Consult the KB article How-To: Respool Messages to respool the remaining messages
• Note that you should contact Support, as these measures are temporary
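Instead of Windows Search, the spool.old tree can be scanned with a short script. This sketch returns every file whose raw bytes contain the spammer's IP address as a complete address, so 203.0.113.7 does not also match 203.0.113.70. It reads each file fully into memory, which is fine for typical spool files but slow for very large messages; the function name is hypothetical.

```python
# Sketch: find files under spool.old that mention a given IPv4 address.
# Digits are not allowed directly before or after the match, so a search
# for 203.0.113.7 does not also hit 203.0.113.70.
import re
from pathlib import Path


def files_mentioning_ip(root, ip):
    """Return sorted paths of files under root whose bytes contain ip."""
    pattern = re.compile(rb"(?<!\d)" + re.escape(ip.encode("ascii")) + rb"(?!\d)")
    matches = []
    for path in Path(root).rglob("*"):
        if path.is_file() and pattern.search(path.read_bytes()):
            matches.append(path)
    return sorted(matches)
```

Review the returned list before deleting or moving anything, since an address can also appear in legitimate messages (for example, in quoted headers).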
1.1.2 MODUSADM – Modus Administrator Service
If the MODUSADM service reaches 100% but MODUSCAN is operating normally, you should see a buildup in the Processing Queue. In this case, however, the buildup is not under …\spool\invirus (where messages go before being processed by MODUSCAN) but under …\spool\spam and …\spool\virus (where messages go before being sent to the Quarantine).
In Windows Explorer, check the …\spool\spam and …\spool\virus folders. Thousands of MSG/RCP
pairs in the sub-folders indicate that MODUSADM is unable to insert them into the Quarantine.
This denotes a Quarantine database problem.
Consult How-to: Prevent Quarantine-Related Problems for instructions to fix this problem.
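To quantify the backlog described above, this sketch counts matched .MSG/.RCP pairs in …\spool\spam and …\spool\virus, along with any unmatched files. It assumes a pair shares the same file name and folder, as with the BXXXXXXXXXX.MSG/.RCP pairs in section 1.1.1; the function names are hypothetical, and what counts as "thousands" is your judgment.

```python
# Sketch: count .MSG/.RCP pairs waiting for the Quarantine in spool\spam and
# spool\virus. Assumption: a pair shares the same name and folder.
from pathlib import Path


def pair_counts(folder):
    """Return counts of matched pairs and unmatched files under folder."""
    stems = {".MSG": set(), ".RCP": set()}
    for path in Path(folder).rglob("*"):
        suffix = path.suffix.upper()
        if path.is_file() and suffix in stems:
            # Key on the path without its extension so B1.MSG matches B1.RCP.
            stems[suffix].add(path.with_suffix(""))
    pairs = stems[".MSG"] & stems[".RCP"]
    return {
        "pairs": len(pairs),
        "msg_only": len(stems[".MSG"] - pairs),
        "rcp_only": len(stems[".RCP"] - pairs),
    }


def quarantine_backlog(spool_root):
    """Pair counts for the spam and virus spool sub-folders."""
    root = Path(spool_root)
    return {name: pair_counts(root / name) for name in ("spam", "virus")}
```

Running it a few minutes apart shows whether the counts are growing, which is a clearer sign of a Quarantine database problem than a single reading.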
1.1.3 SMTPDS – SMTP Delivery Service:
If SMTPDS reaches 100% but all other services are behaving normally, there is a delivery problem. Compare the Message Delivery Queue and Processing Queue readings in the System Health panel to locate it.
1.1.3.1 DNS, Network or Firewall Problem:
A buildup in the Message Delivery Queue but not the Processing Queue indicates that messages are
not being delivered to remote destinations (either external destinations or local routes if using
ModusGate). Check your DNS server by running nslookup on the Modus server. Telnet to port 25 of the destination servers to check for a network problem.
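The nslookup and Telnet checks can be combined in one sketch using Python's standard socket module. `resolve_time` times an address lookup with getaddrinfo (A/AAAA records only; it does not query MX records, for which you would still use nslookup -type=mx or a DNS library), and `smtp_banner` opens a TCP connection to port 25 and returns the greeting the server sends, much as Telnet would show it. Both function names are hypothetical.

```python
# Sketch: time a DNS lookup and read an SMTP greeting, similar to running
# nslookup and "telnet <host> 25" by hand. Uses only the standard library.
import socket
import time


def resolve_time(hostname):
    """Return (sorted addresses, seconds taken) for an A/AAAA lookup."""
    start = time.perf_counter()
    infos = socket.getaddrinfo(hostname, 25, type=socket.SOCK_STREAM)
    elapsed = time.perf_counter() - start
    return sorted({info[4][0] for info in infos}), elapsed


def smtp_banner(host, port=25, timeout=10.0):
    """Connect to host:port and return the greeting text the server sends."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        data = b""
        while not data.endswith(b"\n") and len(data) < 1024:
            chunk = sock.recv(512)
            if not chunk:
                break  # server closed the connection
            data += chunk
    return data.decode("ascii", errors="replace").strip()
```

A responsive SMTP server normally greets with a line starting with 220; a long lookup time points at DNS, while an OSError from `smtp_banner` (for example, a timeout or refused connection) points at the network path to that server.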
1.1.3.2 Problem with Local Authentication or Mailstore:
A buildup in the Processing Queue but not the Delivery Queue indicates that the buildup is located in the spool\domains\$local$ folder on the mail server. This suggests that Modus cannot authenticate users quickly enough to deliver mail to their mailboxes, or that the mailbox folders are on a network share that is no longer accessible.
a) Authentication Issues – User Database is Too Slow:
• If, in the Console, the Users – Properties – General panel is slow to refresh, this indicates a performance issue with the user database (assuming you are not using Generic mailboxes)
• Check the user database for corruption
• Follow the instructions in section 1.1.1 (c)
b) Authentication Issues – Unused Authentication Methods Called:
• If you have many domains configured in Modus, some may be using authentication methods that are not required
• For example, you only use Database and Generic mailboxes but, in the Console, some domains show that LDAP or NT SAM lookups are enabled in Domains – Properties – Authentication
• Ensure that all domains use only the appropriate authentication methods
c) Mailstore Issue – Local Mailbox Folder is Full:
• In System Health, check the System Disk Free reading to determine whether there is a disk space problem
• Check the disk space on the PC that houses the mailbox folder
d) Mailstore Issue – Remote Folder is no Longer Accessible:
• If the mailbox folder is located on a shared drive (such as with Modus blockades), communication to the shared drive may have been lost or permissions may have changed, preventing the Modus services from writing to the existing share
• Ensure that communication to the share is working correctly
• Ensure that the SMTPDS service is running under the correct account and has Full Control of the share being used
• As stated in the Administration Guides, you must log into Windows with the Administrator account
1.1.4 SMTPRS – SMTP Receiver Service:
If the SMTPRS service reaches 100%, check the Processing Queue. If there is a buildup in the
Processing Queue, there may be a problem with mailbox authentication. There may also be a mail
loop or a spam attack.
1.1.4.1 Slow Authentication with ModusMail:
Please see section 1.1.1 (c).
1.1.4.2 Slow Authentication with ModusGate:
The pre-authentication for one of the routed domains may be failing. There may also be a buildup of messages in the Message Delivery Queue. In the …\spool\domains folder, check the sub-folder of one of your routed domains; a buildup of .RCP, .LCK or .DEF files there could be the cause.
• In the Console, go to Connection – Properties – General to check that the route for that domain is properly configured
• Telnet to port 25 of the destination mail server you are protecting to check for response problems
1.1.4.3 Spam Attack:
See section 1.1.1 (d) for resolution.
1.2 MEMORY USAGE
High memory usage often indicates a memory leak. Log into the Modus server and open Task Manager to check which service is consuming the most memory. If one of the Modus services is using a large amount of memory, proceed as follows:
• Use the userdump tool to take a snapshot of the memory being used by the service
   o From a Command Prompt, change directory to …\Vircom\Modus<Mail or Gate>
   o Type userdump <servicename.exe> <enter>
      ▪ E.g. userdump smtpds.exe <enter>
   o This produces a <username>.dmp file
   o Contact Customer Support to open a ticket and send the .dmp file
• Create a batch file to automatically stop and start the service, thus preventing buildups
   o Create the batch file at the root of C:\ and name it restartservice.bat
   o Include the following in the batch file (replace SMTPDS with the service that is leaking memory):
        @echo off
        REM Stop and restart the service to release the leaked memory
        net stop SMTPDS
        net start SMTPDS
   o Use Windows Task Scheduler to schedule the batch file
      ▪ Schedule it to run based on the amount of time it takes for the memory leak to become critical
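The restart interval can be estimated from two Task Manager readings. The worked sketch below assumes memory grows roughly linearly after the service starts; the critical level and the 0.5 safety factor are assumptions you would choose for your own server, not Modus settings.

```python
# Sketch: estimate how often to run restartservice.bat, assuming the leaking
# service's memory grows roughly linearly from the moment it starts.
def hours_to_critical(baseline_mb, later_mb, hours_elapsed, critical_mb):
    """Hours from service start until memory reaches critical_mb, or None."""
    growth_per_hour = (later_mb - baseline_mb) / hours_elapsed
    if growth_per_hour <= 0:
        return None  # no measurable growth, so no leak to schedule around
    return (critical_mb - baseline_mb) / growth_per_hour


def restart_interval_hours(baseline_mb, later_mb, hours_elapsed, critical_mb,
                           safety_factor=0.5):
    """Schedule restarts well before the critical point is reached."""
    hours = hours_to_critical(baseline_mb, later_mb, hours_elapsed, critical_mb)
    return None if hours is None else hours * safety_factor


# Example: 200 MB just after start, 260 MB three hours later, 1,000 MB critical.
# Growth is 20 MB/h, so 1,000 MB is reached after 40 h; restart every 20 h.
print(restart_interval_hours(200, 260, 3, 1000))  # 20.0
```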
1.3 SYSTEM DISK FREE
The System Disk Free status shows disk consumption for all disks, including disk space used by the mailbox, spool and quarantine folders and log files. To ensure that the log files do not use too much of the disk space, in the Console, go to Logs – Properties – File Config. At Log file lifetime, set the number of days after which old log files are deleted. If the value is set to 0, the files are never deleted.
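To see how much space the logs use and which files a given Log file lifetime would remove, the following sketch applies the rule described above (files older than the lifetime are deleted; 0 means never). It only reports and does not delete anything, and it assumes age is judged by each file's last-modified time, which may not match exactly how Modus decides. The function names are hypothetical.

```python
# Sketch: report log disk usage and which *.LOG files a given lifetime would
# expire. Assumption: file age is taken from the last-modified time.
import time
from pathlib import Path

SECONDS_PER_DAY = 86400


def expired_logs(log_dir, lifetime_days, now=None):
    """Return sorted *.LOG paths older than lifetime_days (0 means never)."""
    if lifetime_days == 0:
        return []  # a lifetime of 0 keeps log files forever
    now = time.time() if now is None else now
    cutoff = now - lifetime_days * SECONDS_PER_DAY
    return sorted(p for p in Path(log_dir).glob("*.LOG")
                  if p.is_file() and p.stat().st_mtime < cutoff)


def log_usage_bytes(log_dir):
    """Total size in bytes of the *.LOG files in log_dir."""
    return sum(p.stat().st_size for p in Path(log_dir).glob("*.LOG") if p.is_file())
```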
1.4 MAILBOX DISK FREE (ModusMail)
If the mailbox folder is on a separate drive and you are running out of disk space, ensure that Modus is properly configured to use quotas to prevent users from using too much disk space. In the Console, go to System – Properties – Quotas to set system-wide quotas, or to Domains – Preferences – Quotas to set domain-wide quotas. Also consider installing a larger hard drive.
Note that the Quarantine contents folder is usually stored in a special mailbox called @quarantine.
Consider setting lifetime quotas for your quarantine store or deleting viruses from the Quarantine
instead of keeping them.
1.5 SPOOL DISK FREE
Ordinarily, the spool folder should never build up. If it uses more disk space than it should, this points to problems elsewhere; consult the sections of this document that discuss spool buildups.
One way to reduce disk space usage by the spool folder is to decrease the interval that Modus uses
to resend mail to the outside world. Go to System – Properties – Mail Delivery to change the
retry schedule.
1.6 QUARANTINE DISK FREE
If the quarantine was moved to a drive other than that of the mailbox (by changing the Registry value vsQuarantineDestinationPath), the only remedies are to move it elsewhere or to reduce the time messages are kept in the quarantine.
To move the @quarantine folder:
• Open the Registry Editor and go to HKEY_LOCAL_MACHINE\Software\Vircom\VOPMAIL
• Locate and double-click the vsQuarantineDestinationPath value
• At Value data, enter the new path of the @quarantine folder
• Exit the Registry Editor
• Stop the MODUSCAN and MODUSADM services
• In Windows Explorer, create the new @quarantine folder (as entered in the Registry)
• Start the MODUSCAN and MODUSADM services
• Copy the old @quarantine folder and its sub-directories to the new folder location
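The final copy step can be done with shutil.copytree. Because the new @quarantine folder already exists at that point (and the restarted services may already be writing to it), the sketch passes dirs_exist_ok=True (Python 3.8 or later) and then lists any file from the old folder that is missing in the new one. The folder paths are yours to supply; the function name is hypothetical.

```python
# Sketch: copy the old @quarantine tree into the new location and report
# any files that did not make it across. Requires Python 3.8+.
import shutil
from pathlib import Path


def copy_quarantine(old_dir, new_dir):
    """Copy old_dir into the (already existing) new_dir; return missing files."""
    old_root, new_root = Path(old_dir), Path(new_dir)
    shutil.copytree(old_root, new_root, dirs_exist_ok=True)
    old_files = {p.relative_to(old_root) for p in old_root.rglob("*") if p.is_file()}
    new_files = {p.relative_to(new_root) for p in new_root.rglob("*") if p.is_file()}
    return sorted(old_files - new_files)
```

An empty list means every file from the old folder is present in the new location; keep the old folder until you have confirmed that the Quarantine works from the new path.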
2) SYSTEM ACTIVITY
2.1 SERVICE STATUS INDICATORS
The service status panel indicates which services are running: green denotes a service that is
started, red denotes one that is stopped (purposely or not) and yellow denotes that the service
stops and starts continually. If an indicator is yellow, place your mouse over it to check how often
the service has been stopped and started.
If a service indicator is red (unintentionally stopped) or yellow, the service may have crashed and generated an exception log. In this situation, compile the following information and contact Customer Support:
• Obtain the exception logs
   o In Windows Explorer, go to the root Modus folder and search for *.EXC files
      ▪ There should be one for the service that has crashed or stopped (e.g. SMTPDS.EXC)
   o In the same folder, there should be a general exception log file called vopsexp.log
• Obtain the error and operation logs
   o In Windows Explorer, go to the …\Vircom\Modus<Mail or Gate>\Log folder and search for the ERR*.LOG or OPR*.LOG files that correspond to the time period covered by your exception log
• Zip all of these files and contact Customer Support
   o Support will open a ticket and you will be asked to submit the zipped file for analysis
2.2 INBOUND CONNECTIONS
The Inbound Connections counter measures the open connections on the inbound SMTP port
(usually Port 25).
An unusually large number of inbound connections (click on the graph icon to see a statistical history comparison) indicates either a spam attack or user authentication that is slowing the system by keeping SMTP connections open.
a) Spam attack:
Consult section 1.1.1 for more information.
b) Connection slowdowns:
• RBL lookups or slow DNS response
   o More information can be obtained from the DNS Response and RBL Response readings in the Performance Profile panel
   o If these indicators show unusually high readings, you could be using a slow or dead RBL, or your DNS server is being overwhelmed with requests (e.g. Reverse DNS lookups, SPF lookups or sender validations)
• RBL responses
   o In the Console, go to Security – Properties – Real-Time Blacklists and remove one of the RBL servers
   o Stop and start the SMTPRS service and check the Inbound Connections
   o If there is still a buildup, repeat the above steps until the problem RBL server has been isolated
   o Remember to add back the working RBL servers that were removed
• SPF / Validate Sender Address / Reverse DNS lookup slowdowns
   o Disable each of these features (one at a time) to determine which is causing a DNS load
   o Stop and start the SMTPRS service each time
   o These features are located in the Console under Security – Properties – Sender Validation & Accreditation
• Mailbox authentication causes connections to stay open too long
   o Consult section 1.1.3.2 for more information
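To see which RBL is slow without removing servers one by one, you can time a lookup against each list yourself. RBLs are queried by reversing the IPv4 address octets and appending the list's zone name (the DNSBL convention described in RFC 5782); an answer means the address is listed, and a "not found" error means it is not. Zone names are yours to supply; the function names are hypothetical.

```python
# Sketch: build DNSBL query names and time a lookup against one RBL zone.
# A reply means "listed"; a lookup failure is treated as "not listed".
import ipaddress
import socket
import time


def dnsbl_query_name(ip, zone):
    """Reverse the IPv4 octets and append the RBL zone."""
    octets = str(ipaddress.IPv4Address(ip)).split(".")
    return ".".join(reversed(octets)) + "." + zone.strip(".")


def time_dnsbl_lookup(ip, zone):
    """Return (listed, seconds taken) for one RBL query."""
    name = dnsbl_query_name(ip, zone)
    start = time.perf_counter()
    try:
        socket.gethostbyname(name)
        listed = True
    except socket.gaierror:
        # NXDOMAIN, but also resolver failures; check the elapsed time too
        listed = False
    return listed, time.perf_counter() - start
```

For example, `time_dnsbl_lookup("127.0.0.2", zone)` for each configured zone should report "listed", since RFC 5782 asks DNSBLs to list 127.0.0.2 as a test entry; a zone that consistently answers slowly, or not at all, is the one to remove.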
2.3 OUTBOUND CONNECTIONS
The Outbound Connections counter displays the number of open connections between Modus and
destination mail servers. SMTPDS is the service that opens connections for message delivery.
An unusually large number of outbound connections (click on the graph icon to see a statistical
history comparison) could denote the following:
• Messages are being delivered to a large mailing list
   o If messages are being delivered to a large mailing list, a high number of Outbound Connections is normal
• Outbound connections stay open too long
   o This indicates a connectivity problem between the server and the outside world
   o This includes problems such as bandwidth saturation, network collisions or a defective NIC
   o Telnet to port 25 of an outside server
      ▪ If there are problems connecting, there is a network problem
      ▪ Consult the network administrator, as this type of problem falls outside the scope of the mail server itself
• There is a spammer actively using your server
   o This should be associated with a spike in CPU usage (SMTPDS and MODUSCAN reach 100%) and a buildup in both the Processing Queue and the Delivery Queue
   o Consult section 1.1.1
• Connectivity failure
   o This can occur if you use ModusGate, host a domain with considerable traffic, and the connection between ModusGate and the server being protected is down
   o This can also occur if pre-authentication is failing
   o Telnet to port 25 of the protected server to test the connection
2.4 PROCESSING QUEUE
The Processing Queue measures the buildup of messages in all of the spool sub-folders except for …\spool\holding and …\spool\domains.
A buildup in the Processing Queue can occur for several reasons. Check the spool folder to identify which sub-folder is building up; this points to the problem. The following folders are included in the Processing Queue measurements:
• …\spool\invirus: messages that just arrived and are being processed by MODUSCAN
   o Consult section 1.1.1 for problem resolution
• …\spool\incoming: messages that were processed by MODUSCAN and are awaiting processing by SMTPDS
   o A buildup in …\spool\incoming usually accompanies a buildup in …\spool\domains and …\spool\holding (the “waiting” folder for messages before they are delivered to the outside world or local mailboxes)
• …\spool\spam: messages that are identified as spam and are waiting to be sent to the quarantine
   o A buildup means Modus cannot insert items into the quarantine and move messages to the @quarantine mailbox. See section 1.1.2 for resolution.
• ...\spool\virus: messages that are identified as viruses and are waiting to be sent to the quarantine
   o A buildup means Modus cannot insert items into the quarantine and move messages to the @quarantine mailbox. Consult section 1.1.2 for resolution.
• ...\spool\domains\$local$:
   o A buildup signifies that message delivery to local mailboxes is being delayed. See section 1.1.3 for troubleshooting.
• …\spool\domains\:
   o A large number of sub-folders indicates that there is an active spammer on your network and your server is being used as a relay
      ▪ See section 1.1.1 for troubleshooting
   o Your DNS server may be down
      ▪ In System Health under Performance Profile, click on the graph icons for DNS Response and RBL Response
      ▪ Consult section 2.2.2 for resolution
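A quick way to see which sub-folder is driving the Processing Queue is to count files per spool sub-folder. The sketch counts files (not messages, since a message is usually an .MSG/.RCP pair) in each first-level folder, skipping holding and domains by default to match the definition above; the function name is hypothetical.

```python
# Sketch: count files in each first-level spool sub-folder, largest first.
# holding and domains are excluded by default, matching the Processing Queue.
from pathlib import Path


def spool_breakdown(spool_root, exclude=("holding", "domains")):
    """Return [(sub_folder_name, file_count), ...] sorted by count."""
    counts = {}
    for sub in Path(spool_root).iterdir():
        if sub.is_dir() and sub.name.lower() not in exclude:
            counts[sub.name] = sum(1 for p in sub.rglob("*") if p.is_file())
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))
```

Pass `exclude=()` to include holding and domains as well, which is useful when checking the Delivery Queues.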
2.5 DELIVERY QUEUES
The Delivery Queue counter indicates the buildup of messages (.MSG files) in the …\spool\holding
folder and the associated .RCP files stored in the …\spool\domains folder.
A buildup in the Delivery Queues is usually tied to a high CPU load caused by SMTPDS.
The buildup will either be in the …\spool\domains\$local$ folder (local mailboxes) or the
…\spool\domains folder will have a large number of sub-folders (remote deliveries or deliveries to
routes in ModusGate). Consult the …\spool\domains entries in section 2.4 for resolution.
3) PERFORMANCE PROFILE
The Performance Profile section provides a snapshot of the mail flow, from start to finish. Modus
measures how long messages take to travel through the server (including external systems that
Modus depends on), and the results are presented in this panel. This makes bottlenecks easy to detect.
• Avg Receive Time:
   o Shows how long it takes to receive a message (inbound SMTP traffic)
   o A large spike could indicate a spam attack in progress (see 1.1.1) or problems with mailbox authentication (see 1.1.4)
• Avg Send Time:
   o Shows how long it takes for outbound delivery to be completed
   o A large spike could indicate outbound connectivity issues
   o See section 1.1.3
• DNS Response:
   o Shows how long it takes to resolve an address for mail delivery
   o Shows how long it takes for the security measures (e.g. SPF check, Reverse DNS and Validate Sender Address check) to complete
   o Since these checks go through DNS, a large spam load can put pressure on your DNS servers if all of the security features are enabled
   o Should DNS response slow down, disable some of these security features (see 2.2.2)
• RBL Response:
   o Shows how long it takes for special DNS queries performed against RBLs (Real-Time Blacklists or DNS Blacklists) to respond to the server
   o If one of the RBL servers takes too long to respond, remove it from the list of RBL servers. See 2.2.2.
• Recipient Validation:
   o Shows the time it takes to check whether a local mailbox (ModusMail) or a remote mailbox (ModusGate) is valid
   o A large spike could indicate a problem with this authentication
   o For slow local authentication in ModusMail, see 1.1.3.2 a) and b)
   o For slow remote authentication in ModusGate, see 1.1.4.2
• Monitoring DB Queries:
   o Shows how long it takes to update information in the monitoring database
   o If there are constant spikes (see usage trends for comparison), disable it in Administrative Tools > Services
   o In the Console, stop and start all of the other services
   o Contact Customer Support to troubleshoot the issue further
• Spam Filter:
   o Shows how long it takes for a message to be scanned by the SCA engine and Sieve filtering system
   o A spike could indicate that a large message is being processed
   o A spike could also indicate that the engine cannot process a particular message, which is causing a backlog
   o Stop the MODUSCAN service
   o Isolate the message in …\spool\spam (the .MSG/.RCP or .LCA pair)
   o Start the MODUSCAN service
   o Zip the isolated files and contact Customer Support so that the files can be analyzed
• Virus Filter:
   o Shows how long it takes for a message to be scanned by the virus engine
   o A spike could indicate that a large message is being processed
   o A spike could also indicate that the engine cannot process a particular message, which is causing a backlog
   o Stop the MODUSCAN service
   o Isolate the message in …\spool\virus (the .MSG/.RCP or .LCA pair)
   o Start the MODUSCAN service
   o Zip the isolated files and send them to virustrap@vircom.com
• Sieve DB Queries:
   o Shows how long it takes to look up an entry in the SieveStore, where the Trusted/Blocked Senders Lists are stored
   o Large spikes could indicate a corrupt SieveStore or connectivity problems between Modus and your SQL server
   o See section 1.1.1
• Quarantine DB Queries:
   o Shows how long Quarantine inserts and lookups in the Quarantine DB take
   o Above-average times indicate problems with the Quarantine DB or connectivity problems between the Modus server and the SQL server (see 1.1.2)
4) SYSTEM INFO
The System Info panel provides the following status information:
ModusMail
• The current version and build
• The license expiry date
• The mailbox limit
• The number and percentage of mailboxes used
• If the license expiry date is approaching, contact Customer Support to ensure that you continue to receive virus and spam updates
• Contact Customer Support if you are also approaching your license limit
SCA
• The SCA version
• The date and time of the last update and update check
• Contact Customer Support if there are problems with the updates
Norman / McAfee
• The version of the AV software used
• The date and time of the last update and update check
• Contact Customer Support if there are problems with the updates
The System Info panel also provides Windows OS information and the date and time of the last
server reboot.