QMail - log analysis


dan.cox
12-06-06, 07:17
Hi all,

We're having serious problems with our mail server and I need help understanding the qmail logs to determine the cause of the problem.

It appears we are experiencing an abnormal amount of email activity, and the recent maillogs are exceeding 2GB in size!

I am not too Linux savvy, so I'm struggling to know where to turn, but I thought I'd start by trying to understand why these logs are so large, especially as before we started having problems they were only around 60MB in size.

At the moment I am not sure whether qmail has its knickers in a twist or whether someone is trying to compromise our server, so I need to establish some kind of behavioural pattern for our emails.

From what I can see, a huge number of emails seem to be destined for the same account. Here's an example from the maillog:

Nov 24 09:18:43 web qmail: 1164359923.602551 new msg 184487

Nov 24 09:18:43 web qmail: 1164359923.603869 info msg 184487: bytes 12616 from <vcsyogaindailylifepoh@externaldomain.org> qp 7279 uid 399

Nov 24 09:33:17 web qmail: 1164360797.575379 starting delivery 3349641: msg 184487 to local duckworth-and-kent.com-wildcard@localdomain.com

Nov 24 09:33:18 web qmail: 1164360798.909860 end msg 184487

Now, from what I can tell, this is the log for an inbound message destined for one of our local domains. That's fine. But what's really puzzling me is that the wildcard / catchall feature for this domain was turned off a few days ago, so why would qmail be trying to deliver the message to the wildcard address?

Is there any way I can check whether the catchall is really turned off? From within HSphere it says so, but I'm assuming there must be an underlying parameter that corresponds to this.


Any help on the matter would be really appreciated, guys. This has fallen into my hands and I am by no means the best man for the job, but I have customers who are experiencing huge delays on emails and I need to get the problem sorted ASAP!

Dan.

dynamicnet
12-06-06, 07:30
Greetings Dan:

RE: http://www.psoft.net/HSdocumentation/sysadmin/qmail_configuration.html#qmail_settings

Please make sure the following settings are checked (on) in E-Manager, Mail Servers, Configure:

userchk
rcptdnschecks
uquotacheck
smdcheck

To speed up email delivery (presuming there are no DNS or system problems), raise concurrencyremote to 250 and concurrencylocal to 125.

You may want to go to http://www.dnsreport.com/ and enter the domain name portion of your mail server machine name to see if there are any public DNS problems including MX (mail exchange) record problems.

On the mail server itself, you may want to perform nslookup / dig against well known public sites such as cnn.com, hotmail.com, and so on to make sure local DNS resolution is working as well.

Thank you.

dch
12-06-06, 10:13
Hi Dan,

To check whether the catchall is really disabled, look for a .qmail-default file, which I think lives in the folder for that account.
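A rough sketch of that check, assuming you know (or can guess) the top of the mail directory tree. The function name and the directory argument are made up for illustration, since the real location depends on the install. On some setups the file may exist even with the catchall off, with its contents deciding what happens to unmatched addresses, so the sketch prints the contents as well:

```shell
# Print every .qmail-default file found under a directory tree, with its contents.
# $1: top of the mail tree (hypothetical; the real location depends on the install)
show_catchalls() {
    find "$1" -type f -name '.qmail-default' 2>/dev/null |
    while IFS= read -r f; do
        printf '== %s\n' "$f"   # header line naming the file
        cat "$f"                # the delivery instructions it contains
    done
}
```

Run it as show_catchalls followed by the directory to search, then read each file's contents to see where unmatched addresses for that domain end up.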

In case it is a dictionary attack against a catchall address, maybe try this to find the mail address that is receiving the most mail "today":

cat /var/hsphere/mail/logs/stats | awk -F"|" '{print $3}' | sort | uniq -c | sort -n

Depending on when the file is rotated, that will count the recipients in the H-Sphere maillog for the log file's time period (ours rotate daily at around 3:30am, I think).
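If the stats file can't be found, a similar count can probably be taken straight from the qmail log itself, using the "starting delivery ... to local ADDRESS" lines quoted in the first post. This is only a sketch: the function name is made up, and it assumes the recipient address is the last field on those lines, as it is in the sample.

```shell
# Count local delivery attempts per recipient in a qmail log.
# $1: path to the log file (assumption: wherever your qmail/syslog output is written)
count_local_recipients() {
    awk '/starting delivery/ && / to local / { c[$NF]++ }
         END { for (a in c) print c[a], a }' "$1" | sort -n
}
```

The busiest recipient ends up on the last line of the output, so piping the result through tail shows the top few addresses.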

Cheers,
Sean

dan.cox
12-07-06, 06:50
Hi Sean (and dynamicnet),

Thanks for your replies.

We are running version 2.4.0 of HSphere and all of the options mentioned in dynamicnet's post have been selected. Nslookups and digs resolve fine. MX records are also ok.

Sean, I am unable to run the command you suggested as the mail directory does not exist under /var/hsphere. I'm assuming this is because our version of HSphere is out of date?! Any ideas where it might be?

If I tail the /var/log/messages file I am seeing a number of other interesting log entries too:

Dec 7 11:56:57 web named[1754]: MAXQUERIES exceeded, possible data loop in resolving (externaldomain.com)

and

Dec 7 11:56:58 web named[1754]: Lame server on 'externaldomain.com' (in 'externaldomain.com'?): [66.218.xx.xxx].53 'yns1.yahoo.com': learnt (A=194.203.xx.xx,NS=194.203.xx.xx)

The 194.203.xx.xx entry in the second log is a DNS forwarder that we use to resolve DNS on our network. So does this mean anything to you guys?

Oh, and I have established that the delay applies only to inbound mail; outbound mail works fine.
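To judge whether these named warnings are occasional or constant, the two message types quoted above can be tallied. A minimal sketch, with a made-up function name, assuming the syslog file is passed as the argument and the lines look like the samples:

```shell
# Tally the two kinds of named warnings seen in the syslog file.
# $1: path to the syslog file (assumption: the file being tailed above)
count_named_warnings() {
    awk '/named\[/ && /MAXQUERIES exceeded/ { m++ }
         /named\[/ && /Lame server/         { l++ }
         END { printf "maxqueries %d\nlame %d\n", m, l }' "$1"
}
```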

dch
12-07-06, 07:05
Hi Dan,

We are also currently on 2.4p11. We are running CentOS; not sure if the files are in a different location if you are running BSD? (Note: the files should exist somewhere, as they are how H-Sphere calculates the mail traffic.)

Cheers,
Sean

dan.cox
12-08-06, 03:36
Finally light at the end of a three day long tunnel!!

After tracking down the guilty catchall address (which incidentally should not have been enabled) we were able to delete several hundred thousand messages from the queue.

We have now implemented a couple of changes to the mail server configuration within HSphere and are considering revoking the use of catchall addresses.

Thanks a lot for your help on this one, lads!

Dan.

dch
12-08-06, 08:14
That's good news :) We have just recently contacted all clients explaining the evils of catchalls etc., and asked them to disable them, which, surprisingly, a lot of them are doing! The next step will be looking to remove the service altogether, I think.

Cheers,
Sean

dan.cox
12-08-06, 08:38
Well, I know we'll have problems convincing some clients that it's for the greater good, but I'm sure they'll succumb to my powers of persuasion ;)

Thanks again for your help, Sean.

Dan.