SELinux oddities with graphite
We use graphite for storing time-series. We also use SELinux (in RHEL6) in targeted mode for extra protection of web applications.
It turns out there's a little snag when enabling CLUSTER_SERVERS in the graphite application settings while the web application runs under SELinux in enforcing mode.
Previously we had only one graphite server running the web app and the carbon stuff. However, since graphite is such a nice tool, the number of stored metrics grew considerably. Fortunately, graphite has built-in support for scaling out in a flexible way using the CLUSTER_SERVERS directive in the local_settings.py file. It was easy to convert the one-node setup to a two-tier setup using a front end with only the web app (just google «graphite and cluster»).
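As a rough sketch of what the front-end configuration looks like, the setting is a plain Python list of "host:port" strings in local_settings.py. The back-end addresses below are hypothetical placeholders, not the ones from our setup.

```python
# local_settings.py on the front-end node (web app only).
# The addresses are made-up examples; replace them with the
# back-end nodes that run carbon and their own web app.
CLUSTER_SERVERS = [
    "10.0.2.2:80",
    "10.0.2.3:80",
]
```

The front end then fans queries out to each listed back end and merges the results.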
At this point strange things started happening:
- Response times for queries to the web app were dead slow and unstable
- setroubleshootd started consuming all CPU cycles
- /var/log/messages was spammed with messages like: setroubleshoot: SELinux is preventing httpd from name_bind access on the tcp_socket
- Running sealert -l <alert-id> showed that httpd was trying to name_bind to a lot of different port numbers...
I did not expect graphite to need to name_bind to more ports than the standard web ports (80 / 443) specified in the httpd configuration. Also, when not using the CLUSTER_SERVERS directive, this situation does not occur. Grepping through the code, I found this in storage.py:
    for port in xrange(1025, 65535):
        try:
            sock = socket.socket()
            sock.bind( (host,port) )
            sock.close()
        except socket.error, e:
            if e.args[0] == errno.EADDRNOTAVAIL:
                return False
            else:
                continue
        else:
            return True
    raise Exception("Failed all attempts at binding to interface %s, last exception was %s" % (host, e))
It seems the code tries to bind to each port in the range 1025 to 65534 in turn until one bind succeeds. Since httpd is by default not allowed to bind to ports other than the web ports such as 80 and 443, graphite walks through the whole range, failing in the same way on each attempt and generating an alert for each of the roughly 64,500 ports. That produces a lot of AVC denial activity, which consumes CPU.
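The workaround is to allow httpd to bind to port 1025, the first port the loop tries, so that the check succeeds on its first attempt:
semanage port -a -t http_port_t -p tcp 1025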
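To illustrate why opening a single port is enough, here is a small simulation of the loop's control flow. It is only a sketch, written in Python 3 rather than the Python 2 of the snippet from storage.py. The names probe, always_denied and denied_except_1025 are made up for this example, and it assumes that a bind blocked by SELinux surfaces as an EACCES error. No real sockets are opened.

```python
import errno


def probe(host, try_bind, ports=range(1025, 65535)):
    """Mimic the loop's control flow; try_bind stands in for sock.bind().

    Returns (result, attempts). The real code raises an Exception where
    this sketch returns None.
    """
    attempts = 0
    for port in ports:
        attempts += 1
        try:
            try_bind(host, port)
        except OSError as e:
            if e.errno == errno.EADDRNOTAVAIL:
                return False, attempts
            continue  # any other error: move on to the next port
        else:
            return True, attempts
    return None, attempts


def always_denied(host, port):
    # Stand-in for a bind that policy blocks on every port (assumed EACCES).
    raise OSError(errno.EACCES, "Permission denied")


def denied_except_1025(host, port):
    # Stand-in for the situation after port 1025 has been allowed.
    if port != 1025:
        raise OSError(errno.EACCES, "Permission denied")


print(probe("127.0.0.1", always_denied))       # (None, 64510)
print(probe("127.0.0.1", denied_except_1025))  # (True, 1)
```

With every bind denied, the loop makes 64510 attempts before giving up, one denial each. Once port 1025 is allowed, it returns after a single attempt.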