Wednesday, December 17, 2008

Wanting better results

I watched "The Mentalist" last night, and one of the interesting lines came from a researcher who wanted his system to work so badly that he was losing his objectivity.

I think this happens to many of us all the time. We lose our scientific mind because we want things to work and be done with it.

We are in the middle of implementing new JVM startup parameters. The optimized version produced good results in staging, so we applied it to a single instance in production. We never reached conclusive results, but it was pushed to 30 instances anyway, and we have restarted 9 of them to check the results. After only 3 hours of real production time it was decided that we should move forward everywhere. The results look good. A co-worker mentioned that it was a bit quick to jump to conclusions, and that maybe we should wait for a bit more traffic to go through to see whether these new params have a side effect in production.
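One way to put a number on "is 3 hours on 9 instances enough?" is a permutation test: shuffle the instances at random many times and see how often chance alone produces a gap as large as the one observed. The following is only a rough sketch, assuming there is one measurement per instance (for example an average response time) for the restarted and the untouched instances. The function name and the choice of metric are hypothetical and do not come from the real monitoring setup.

```python
import random
from statistics import mean


def permutation_p_value(changed, unchanged, rounds=10_000, seed=0):
    """Estimate how often a random split of all instances gives a
    difference in means at least as large as the observed one.

    changed / unchanged: one measurement per instance (hypothetical metric).
    """
    rng = random.Random(seed)  # fixed seed so the estimate is repeatable
    observed = abs(mean(changed) - mean(unchanged))
    pooled = list(changed) + list(unchanged)
    n = len(changed)
    hits = 0
    for _ in range(rounds):
        rng.shuffle(pooled)
        # Pretend the first n instances were the "changed" ones.
        diff = abs(mean(pooled[:n]) - mean(pooled[n:]))
        if diff >= observed:
            hits += 1
    return hits / rounds
```

A small value suggests the improvement is unlikely to be luck. A large one is a hint to wait for more traffic before rolling out everywhere. It says nothing about side effects that the chosen metric does not capture.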

That is when the previous night's comment from the scientist hit me.

Management wants to see good results so much that all they can see are the good numbers, and they are probably not looking at the rest. I am ready to bet that some numbers are pushed aside because they look like anomalies next to the results they want.

No one seems to remember that this is the attitude that put us in this spot in the first place, and that we are using the same technique to get out of the hot seat.

History has a tendency to repeat itself.

Tuesday, November 25, 2008

SVN .svn directories

A bad idea of mine for cleaning up the .svn directories.

It started because the application I need to maintain is half in svn, and the other half has to be copied manually whenever we move it. Just a bad bad bad setup.

Checking out an application from svn is easy and works well, but if you have files that are not checked in then you need to copy them manually. The best answer would be to check them in from the old location and then check them out from the new location, but we don't have a good setup like that. I was moving the application and the svn server at the same time. The previous svn server was hosted by a third party and we don't have control over it, which means that getting a dump from it is not immediately possible at 5:00 on a Saturday morning. So I manually rsynced files from the old server to the new server and overwrote some .svn directories.

When you overwrite these .svn directories you basically corrupt the application's new svn working copy. I had some directories pointing at the new svn server and others pointing at the old one. When you do a "svn up" it complains about this corruption.

To address this issue I simply deleted the .svn directories that were pointing to the old svn server, thinking that I could either re-add them or check them out again. It does not work like that.

Suddenly I felt my skin changing color. How am I going to fix that one?

I found an idea on Google: simply check out the directory in question from the new svn server into a temp directory and copy its .svn sub-directory over.

THANK YOU Google! (and the person who posted the idea)

I fixed each directory that needed it this way and I was able to update the required code. I also checked in files that had not been added to the new svn server yet.
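The fix described above can be sketched roughly as follows. This assumes the old working copy layout where every directory carries its own .svn sub-directory (Subversion 1.7 and later keep a single .svn at the root, so this does not apply there). The function names are hypothetical, and the repository URL and paths are whatever applies to the directory being repaired.

```python
import shutil
import subprocess
from pathlib import Path


def fresh_checkout(url, dest):
    """Check out `url` into the temp directory `dest`.
    Requires the svn command-line client on PATH."""
    subprocess.run(["svn", "checkout", url, str(dest)], check=True)


def restore_svn_metadata(fresh_dir, damaged_dir):
    """Copy the .svn directory from a fresh checkout into the damaged
    working directory. Working files in damaged_dir are left untouched."""
    source = Path(fresh_dir) / ".svn"
    target = Path(damaged_dir) / ".svn"
    if not source.is_dir():
        raise FileNotFoundError(f"no .svn directory in {fresh_dir}")
    if target.exists():
        # Drop stale metadata that still points at the old server.
        shutil.rmtree(target)
    shutil.copytree(source, target)
    return target
```

After the copy, "svn status" in the repaired directory should show local differences and unversioned files again, and those can then be reviewed and checked in.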

I feel much better now.

Should I mention that I did not send in my RFC (request for change) for this operation... No one cares about change management?

Friday, November 21, 2008

IPTables again

So I did make changes to IPTables on production servers again.
This time I used my new idea: simply execute the iptables command directly and do not restart the service under RHEL.
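As an illustration of what "execute the command and skip the restart" can look like, here is a minimal sketch. The helper names, the source address (taken from the documentation range 192.0.2.0/24), and the port are hypothetical and are not the real rule that was added. The iptables flags themselves (-I, -s, -p, --dport, -j) are standard.

```python
import subprocess


def build_accept_rule(source_ip, port, chain="INPUT"):
    """Build an iptables command that inserts an ACCEPT rule at the top
    of `chain` for TCP traffic from `source_ip` to `port`."""
    return [
        "iptables", "-I", chain,
        "-s", source_ip,
        "-p", "tcp", "--dport", str(port),
        "-j", "ACCEPT",
    ]


def apply_rule(command):
    """Run the command against the live rule set (needs root).
    The iptables service is not restarted."""
    subprocess.run(command, check=True)


# Hypothetical example: allow a management server to reach port 22.
# apply_rule(build_accept_rule("192.0.2.10", 22))
```

A rule added this way lives only in the running rule set. On RHEL, "service iptables save" writes the current rules to /etc/sysconfig/iptables so they survive the next reboot or restart.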

This idea produced boring results... Nothing was broken.

I would have won my bet from my previous post, but since no one reads this blog I made zero pennies from it.

Tuesday, November 11, 2008

IPTable changes and restart

Last week I made some changes to our IPTables rules to accommodate a new management server. Restarting IPTables on a RHEL system is not without consequences.

We ended up with hung threads in JBoss, and it caused NFS issues on an Oracle server. Serious issues for a small change.

I was not a real cowboy on the whole operation. I submitted an RFC (Request For Change) and it was approved by our change manager and the Director. For some reason that did not calm anyone.

After the incident a co-worker pointed out that we should all have known not to do this, because we had a previous incident that was similar enough. Those comments are so helpful that I wish I could get more every day. Really...

If this co-worker made sure that his documents and emails were factual instead of long-winded emotional stories, we would learn from him instead of avoiding all communication.

At least it got me thinking about the issue, and I think that I have a better way to handle iptables changes. I need to convince my Director to let me try it. Any takers for a bet on this one?

Monday, October 13, 2008

When I am on call I create new tools

I guess that some co-workers find it funny that when I am on call and a lot of work will need to be done after hours, I quickly complete code for new tools to automate the work. I tried to explain that the extra time I spent writing the code would have been spent on manual work anyway. I guess that what caught their attention was more the timing of the extra effort.

The last one was on the cowboy side because testing was light. The real test was a run patching and moving about 30 sites. If it had not worked properly, this would have been quite a disaster to address.

Lucky me this time!

Friday, September 5, 2008

Team consensus is overrated

If you have to wait for every member of your team to agree before doing something, you will be idle most of the time. You have to read between the lines and get going. I understand that you want to make sure you are not going to break the production network/configuration, but it is Friday, so let's hurry so we can get things done. Since I am on call this weekend, I am telling you it is ok.

Thursday, August 14, 2008

Upgrading switches by mistake

I thought that I was so clever today to realize that I did not pay attention to where I was working.

Clever means that I have 2 switches, and that the one with nothing on it was going to be my test bed for a new IOS version. No risk, and I have a console cable to it, so if I have serious issues I have a way to recover.

Not so clever is the fact that I did not pay attention and upgraded the live switch, the one with all the management servers connected to it. Why all the active ports did not trigger anything in my mind is a bit odd. I should have noticed earlier, but I only looked at all those active ports after the upgrade, and that is when I realized what I had done. So proud of myself.

Now I have to explain the alerts we got.

Procrastination looks like a better option right now!