NFS server slow, disconnecting randomly

Closed - This job posting has been filled and work has been completed.

Job Description

Summary: We need to find out why our NFS server randomly slows down and disconnects NFS and SSH clients.

Our NFS server has run into some strange problems recently:
- Clients often get a "Stale File Handle" error message
- SSH connections are sometimes slow and will drop for no apparent reason

Infrastructure Details:
- 4 AWS EC2 instances (all Ubuntu)
- 1 x Varnish server
- 2 x Apache servers
- 1 x NFS server
- Webservers serve content from NFS share
- Varnish server writes backups to NFS share

Symptoms and steps taken thus far:
- Apache serves "403 Forbidden" message
- "ls -lah" on NFS mount displays "Stale File Handle"
- Restarted the NFS service on the NFS server
- Rebooted the Apache servers
- Apache -> NFS connection works again ... for a while.
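
For anyone reproducing the "ls -lah" check listed above, it could be scripted roughly as follows. This is only a minimal sketch in Python: the function name `check_mount` and the path `/mnt/nfs` are hypothetical placeholders, not part of our setup. It relies on the standard `errno.ESTALE` code, which is what the operating system reports for a stale file handle.

```python
import errno
import os


def check_mount(path):
    """Return 'ok', 'stale', or an 'error: ...' string for a directory."""
    try:
        # Listing the directory touches the mount, much like `ls` does.
        os.listdir(path)
    except OSError as exc:
        if exc.errno == errno.ESTALE:
            # ESTALE is the errno behind the "Stale file handle" message.
            return "stale"
        return f"error: {exc.strerror}"
    return "ok"


if __name__ == "__main__":
    # /mnt/nfs is a placeholder; substitute the real mount point.
    print(check_mount("/mnt/nfs"))
```

Run periodically on each client, a check like this would show when the mount goes stale rather than only that it did.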

Notes:
- Used netstat -tuc to check whether a DDoS attack could explain the slowness. The only connections were the current SSH session and the NFS share.
- The problems started recently, after 3 months of trouble-free operation.
- Due to this issue, the NFS server has been taken out of the infrastructure for now. Currently, only one Apache server connects to it, and the mount has not been reestablished on the Varnish server.
- We do not have the expertise to know what to look for in log files to explain this behavior.
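
As a possible starting point for the log review, a filter along these lines could pull NFS-related lines out of a log file. This is a hedged sketch, not a diagnosis: the function name `nfs_lines` is hypothetical, and the patterns are assumptions about what is worth looking at (the "Stale file handle" error text, NFS client "not responding" timeout messages, and anything mentioning nfsd). The contractor may well want different patterns.

```python
import re

# Assumed patterns of interest; adjust as needed.
PATTERNS = re.compile(
    r"stale file handle|nfs: server .* not responding|nfsd",
    re.IGNORECASE,
)


def nfs_lines(lines):
    """Return the lines (without trailing newlines) that match PATTERNS."""
    return [line.rstrip("\n") for line in lines if PATTERNS.search(line)]


# Example use, assuming the system log lives at /var/log/syslog:
#
#     with open("/var/log/syslog", errors="replace") as fh:
#         for hit in nfs_lines(fh):
#             print(hit)
```

Comparing the timestamps of any matches on the clients with those on the NFS server may help show which side notices the problem first.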

A successful engagement has the contractor finding and explaining the issues listed above. If the fix for these issues is outside our area of expertise, the contractor will implement that fix.

---
Skills: nfs, apache, varnish