Over the past seven months, WritingForums.org has experienced bouts of forum downtime. I’m writing this notice to keep you apprised of the issue and of what we are doing both to maximize uptime and to find the root of the problem.

Before I go further, I want you to know that this issue will be solved, and it’s my number one priority. I’m sure some of you are concerned about these server issues on WF, whether you have said so publicly or privately, but I promise to be diligent in solving them, and I remain committed to the longevity of the site. We're not going anywhere. I just ask for your patience and understanding as we investigate.

Frequency & Duration

The frequency and duration of the downtime have been variable: sometimes there are no issues for a month, other times there are problems three times in a week. I’ll get into the cause below, but typically once I’m aware of the issue I can bring us back online quickly. Downtime lasts several hours, or in rare cases a day or two, only when it occurs while I’m at my day job or when I’m otherwise unable to fix the problem.

Known Sequence of the Problem

Essentially, our MySQL Server (the part of our server that runs and manages all databases) is kicked offline. During this process, an error occurs in our sessions table. The sessions table is not particularly important, as it only logs the recent visits of users and guests. Because of its function, it is likely the most active database table, and the most active table is the one most likely to suffer an error or corruption. To our knowledge, no data is at risk beyond the record of which users and guests visited the site, aside from whatever may be posted or edited at the exact time of the crash. As of now this is a problem of inconvenience, not data loss.

Once the MySQL Server is restarted, the sessions table in the database must go through a manually initiated repair process. This is the error you are probably seeing if you can see the site's design but not forum content.
Once this table is repaired, the site is back online.

What We Are Doing

Over the past few months I’ve researched similar errors and opened multiple tickets with our web host. I’m currently in ongoing communication with our web host and Xenforo. Xenforo believes it's a server issue, not anything related to their software. Right now we need to capture and review data on what's happening right before the crash, and after the crash before we restart the server.

Our investigation has led to these possible causes, though we have yet to pinpoint the root of the problem:

- Problems with memory allocation (disk space or available memory)
- Problems with, or damage to, the server hardware
- Misconfigured server / database settings
- Problems with resource use

Latest Discoveries and Plan by Host

During our last support ticket two weeks ago, a technician working for our hosting company made some progress. He found some clues in the logs. However, he said that to move forward we need to observe the issue with MySQL and investigate before our MySQL Server is restarted. This should give more information on the cause of the problem. We also need to get an idea of what queries are running when the issue starts.

I've re-opened the support ticket, as these issues seem to be happening more frequently. The next two or three times this happens, we may need to stay offline a little longer than usual to allow them to investigate. Right now I’m clarifying what they need from me. The plan is to capture the appropriate logs the next time the site crashes, prior to bringing us back online.

If I’m unable to make progress with Xenforo and our hosting provider, we will look into hiring a professional server/database administrator. If all else fails, we will update our software, reduce dependencies, and move to a new web host. If it's a server configuration or server hardware issue, a new hosting solution may solve the problem.
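As a rough illustration of the capture step described above, here is a minimal Python sketch that saves a snapshot of what MySQL is doing to a file, so the information survives a restart. It is only a sketch under assumptions: the three statements are standard MySQL commands, but which ones the host actually wants is a guess, and `capture_snapshot`, `save_snapshot`, and the output path are hypothetical names, not anything the host or Xenforo provides.

```python
import datetime
import json

# Standard MySQL statements; choosing these three is an assumption,
# not a list supplied by the hosting provider.
DIAGNOSTIC_QUERIES = (
    "SHOW FULL PROCESSLIST",
    "SHOW GLOBAL STATUS LIKE 'Threads_connected'",
    "SHOW VARIABLES LIKE 'max_connections'",
)


def capture_snapshot(conn, queries=DIAGNOSTIC_QUERIES):
    """Run each diagnostic query and return the results as a dict.

    `conn` is any DB-API 2.0 connection (for example one opened with a
    MySQL driver). It is passed in so no particular driver is assumed.
    """
    snapshot = {
        "captured_at": datetime.datetime.now().isoformat(),
        "results": {},
    }
    cursor = conn.cursor()
    try:
        for query in queries:
            cursor.execute(query)
            # cursor.description holds one tuple per column; item 0 is the name.
            columns = [col[0] for col in cursor.description]
            rows = [dict(zip(columns, row)) for row in cursor.fetchall()]
            snapshot["results"][query] = rows
    finally:
        cursor.close()
    return snapshot


def save_snapshot(snapshot, path):
    """Write the snapshot to disk so it is still there after a restart."""
    with open(path, "w", encoding="utf-8") as handle:
        # default=str covers values such as dates that JSON cannot encode.
        json.dump(snapshot, handle, indent=2, default=str)
```

It would be called with a connection from whatever MySQL driver is installed, for example `save_snapshot(capture_snapshot(conn), "mysql_snapshot.json")`. This only works while the server still accepts a connection, so it may need to run on a schedule rather than after the crash.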
I will do my best to keep you apprised. Expect this to recur for another week or two until we figure out why it's happening. If you have server admin experience and want to take a shot at investigating, let me know.

It’s not a data loss issue. I’m committed both to solving this problem and to the longevity of WritingForums.org. My hope is that we can solve this over the next two weeks and I can return to XF2/WF5 development.

Questions and comments welcome.

Daniel
Thank you @Daniel. I for one really appreciate all the hard work that goes into keeping the site running.
Yes, absolutely. Without the mods keeping an eye on the ebb and flow of interactions here, we wouldn't have such a good forum. But without Daniel's efforts, we wouldn't have a forum at all. Thank you, Daniel, for keeping it going.
Thank you Daniel! We appreciate the update. I unfortunately am only just beginning my SQL studies, but hopefully another more experienced member might be able to lend a hand.
It's obvious that the real problem is that someone has forgotten to feed the hamster (whose wheel powers the server). The only answer is a team of coypu to provide extra power when it's needed.
An update:

About a week ago I was working on the site and managed to see the database/MySQL server crash happen in real time. I managed to capture the logs our web host needed to investigate this issue further. The relevant logs apparently get wiped when the MySQL server restarts, so they had to be captured before the restart. Check.

Using these logs, our hosting provider pinpointed certain configuration settings they believe are responsible. These MySQL connection settings allowed caching of database queries with delayed inserting, which essentially means the forum delays or reuses common, identical queries to reduce the load on the database. This functionality is normally a good thing that speeds up a website, but in the case of our Xenforo setup, it caused our database to lock up due to too many attempted connections. We've disabled this type of caching and made other adjustments likely to prevent the issue from recurring.

While our server provider seems to think these issues are solved, I'm unsure, but cautiously optimistic. We haven't had a recurrence in 10 days or so, which is promising. There's a good probability this issue is resolved. That said, I am continuing to monitor our server, and this issue or a related one could recur. At some point we may need a specialized system administrator to optimize our server's software, memory allocation, and database configuration.

TL;DR: Our hosting provider used our server logs to pinpoint what they think the issue was, made some configuration changes, and believes they have solved the heart of the issue.
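For anyone curious what monitoring for "too many attempted connections" could look like, here is a minimal Python sketch. It assumes the two numbers come from the standard MySQL queries `SHOW GLOBAL STATUS LIKE 'Threads_connected'` and `SHOW VARIABLES LIKE 'max_connections'`. The function names and the 80% threshold are hypothetical choices for illustration, not settings taken from our server or recommended by the host.

```python
def connection_usage(threads_connected, max_connections):
    """Return the fraction of allowed connections currently in use."""
    if max_connections <= 0:
        raise ValueError("max_connections must be positive")
    return threads_connected / max_connections


def should_alert(threads_connected, max_connections, threshold=0.8):
    """Return True when usage reaches the warning threshold.

    The default threshold of 0.8 is an arbitrary illustrative value.
    """
    return connection_usage(threads_connected, max_connections) >= threshold
```

For example, with 135 open connections against a limit of 150, `should_alert(135, 150)` returns `True`, which would be a cue to look at the server before it locks up.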
Thanks. I was wondering if it was the first sign of an ownership issue, or simply that the site was transitioning to a new format or even closing down! Good to hear it’s serious. I think many find this forum to be a useful resource as well as reasonably open to all without any enforcement of critique (which personally I find to be counterproductive if enforced!)
I was a bit worried about that too. I’ve come to really love this place and it would leave a big hole in my life if it went away.
I'm assuming you're okay with the hardware and resources on your local end? It doesn't seem like it's a problem to throw money at, unless you change hosts? Let me know. I didn't choose to be a Supporter to sit on the sideline and watch. What are the most reliable options/costs?