
(Government Technology)   Virginia's $2.3 billion IT outsourcing contract with Northrop Grumman is going along swimmingly...no, wait, 485 servers have gone teets up and DMV computers were unusable for a week   (govtech.com)
    More: Fail, Northrop Grumman, DMV, Bob McDonnell, sun outage, information systems, information technology, servers  

7390 clicks; posted to Main » on 03 Sep 2010 at 4:02 PM (3 years ago)



172 Comments
   

Archived thread

 
2010-09-03 05:51:02 PM
Flab: Maybe the SAN that failed knocked off a few key servers (PDCs, DNSes, etc..) and the problem propagated to the other ones.

Which is why you don't put PDCs, DNSes, etc on shared infrastructure.
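
The propagation described here can be sketched as a walk over a dependency map. This is only an illustration: the service names and the `depends_on` map below are made up, not taken from the Virginia environment.

```python
from collections import deque

# Hypothetical dependency map: service -> things it depends on.
# None of these names come from the article; they are placeholders.
depends_on = {
    "dns": ["san"],
    "domain_controller": ["san"],
    "dmv_app": ["dns", "domain_controller"],
    "tax_app": ["dns"],
    "standalone_db": [],
}


def affected_by(failed, depends_on):
    """Return every service that directly or transitively depends on `failed`."""
    # Invert the map so we can look up "who depends on X".
    dependents = {}
    for service, deps in depends_on.items():
        for dep in deps:
            dependents.setdefault(dep, []).append(service)

    # Breadth-first walk outward from the failed component.
    affected = set()
    queue = deque([failed])
    while queue:
        current = queue.popleft()
        for service in dependents.get(current, []):
            if service not in affected:
                affected.add(service)
                queue.append(service)
    return affected


if __name__ == "__main__":
    print(sorted(affected_by("san", depends_on)))
```

In this made-up map, putting DNS and the domain controller on the shared SAN is what drags the applications down with it; `standalone_db` is untouched because nothing in its chain touches the SAN.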
 
2010-09-03 05:52:27 PM
are they sure, it wasn't one Win 95 machine winning an election?
 
2010-09-03 05:54:37 PM
I'm guessing they laid off the wrong IT tech.
 
2010-09-03 05:57:19 PM
Cubansaltyballs:
Since this is NG we're talking about, they probably connected to 485 servers to an iSCSI SAN built on top of FreeNAS, using Extreme Networks with Spanning-tree instead of EAPS rings as the backbone.


OH MAN did i lol. (CSB ENGAGE) I have personal experience with Extreme Networks and the garbage they produce; where i currently admin (largest hospital in the region), my predecessor had wasted untold dollars and hours replacing the core architecture with Extreme Networks switches. it was awful. You wouldn't believe the shiat that would happen (well, maybe you would, you sound like you know your shiat). 800+ people, IP phones, and those stupid purple switches farking everything up. Luckily Cisco gave us a break on pricing in exchange for all that shiatty Extreme gear.

The only reason we can come up with as to why that guy went with Extreme was that he got some sort of kickback from it. We still have one of those stupid purple switches in the office; it has a railroad spike impaled through its middle as a warning to any other IT personnel that would make that retarded of a decision. (CSB DISENGAGE)

/what's wrong with FreeNAS? ;)
 
2010-09-03 05:57:34 PM
e5wsf: 500 servers on a SAN is not really a lot considering the use of VMware. A large city agency can hit that number. I'm a SAN Architect/Engineer so I see a lot of it.

Rarely do contractors call the shots for gov agencies. They are typically there for specialty skills to augment the state workforce. As a consultant, our job frequently requires us to take the hit to save the job of a Director/Manager. It's part of the gig.


Maybe it's just my dumb luck, but in my experience "contractor" in IT is about 5% likely to be a bona fide expert in their "expertise area" and 95% likely to be a human paper weight who simply has to not fark up for the duration of their contract term.

Particularly giant contracting middlemen/houses. Full of people who are well certified but lack in the area of practical experience.
 
2010-09-03 05:58:30 PM
GAT_00: Because there's such a big private market for fighter planes?

Exactly, that is a large part of the problem. When government is your major customer, cost minimization is no longer a primary concern. When you go back into the open market you are at a serious disadvantage.

GAT_00: They are a private company. Quit dancing around because you can't stand the idea of a private company screwing up a government outsourced project.

Why would I not be able to stand that? Private companies screw up all the time. Hell, at my job we've been working non-stop for months filling the contracts our competition has lost because of their farkups. In a free economy those companies lose market share as a result and either have to improve or go bankrupt.

Grumman is not under any such obligations. Cost minimization is not top priority. Grumman is a private company in the same way the Federal Reserve is a private company. Or Fannie or Freddie. Without government support they would be bankrupt and flushed from the system.
 
2010-09-03 05:59:15 PM
anti-cripes: Luckily Cisco gave us a break on pricing in exchange for all that shiatty Extreme gear.

I always wonder what happens to that old hardware when those vendor swap-out discounts happen.
 
2010-09-03 06:09:52 PM
Knara: anti-cripes: Luckily Cisco gave us a break on pricing in exchange for all that shiatty Extreme gear.

I always wonder what happens to that old hardware when those vendor swap-out discounts happen.


Recycle? And by recycle, i mean shoot it full of holes.
 
2010-09-03 06:13:12 PM
anti-cripes: Recycle? And by recycle, i mean shoot it full of holes.

Wouldn't surprise me, I guess.
 
2010-09-03 06:38:25 PM
As a guy who's worked on EMC SANs, I can tell you that they are without a doubt garbage. I'd rather have reel-to-reels.
 
2010-09-03 06:39:36 PM
Somebody cue the Northrop guy from Apollo 13: "We didn't build it for THAT".

/guess he isn't gonna keep his job this time, huh.
 
2010-09-03 06:40:12 PM
Jument - It's not all perfect here in Washington.

I was with my daughter at the DMV. At first I was making jokes about the signs on the wall saying it was a felony to threaten the employees.

Four (4) hours later, I was almost a felon. Jail would have been more pleasant. Laziest group of misfits and leeches I've ever dealt with.
 
2010-09-03 06:48:52 PM
TightHeadProp: As with most things, I don't think outsourcing is as black and white as some of you think.

Where a job requires expensive expertise for a limited time, or resources that are best pooled, it can be to the benefit of all parties to outsource.

Most DOT's outsource for road construction. You think it would be cheaper for a DOT to keep and maintain the equipment and staff year round for a project that might happen once every three years?

Anyway, giant companies like NG are just as bad a bureaucracy as any local government. It's been said multiple times in this thread, but outsourcing to an entity that is just as poorly run and inefficient always makes things worse. It's not a question of private or public.

Also IMO it doesn't make much sense to outsource much of IT because it's a daily task where you need to trust the responsible staff completely and have constant access to them and their managers. We aren't talking about a one time design build contract, IT is day to day operations stuff.

I'm in Virginia and I can tell you that everyone I've met who works for the state loathes this contract. It's sad how much effort goes into avoiding NG staff involvement on projects.

I've seen projects that spend budget on things that are already there but state Project Managers know that using the existing resources would involve getting NG involved which would kill the schedule and any chance of running an efficient project.

Virginia is over a barrel with virtually no recourse as far as disciplining NG or killing the contract. Taxpayers really got screwed on this deal.


well at least NG is paying for the investigation.
 
2010-09-03 06:56:39 PM
undflickertail: slayer199: I have to admit, when I think of Northrop-Grumman I don't think of an IT Services Provider. I think of jets.

Same here, I did some work for them but it was for an IT project where they were basically the contract middleman and only there to take a slice off the top. They were like level 2 of the 3 levels the money went through from the company to me. That layer was only a name to me; I never actually interacted with anyone, just a payroll time server somewhere that they indirectly ran.


They do a lot of work for the USPS, and they do it primarily with contractors. They also don't pay shiat, so I imagine the talent pool is rather thin.
 
2010-09-03 06:56:53 PM
bravian: Unless you build DR into the process since the beginning - it's incredibly hard to go back later and retrofit it.

We had the same problem. Facility DR plan. A huge plan with binders full of details. Rather than focus on an IT disaster they made it facility wide. I left during the final steps and it later died.
 
2010-09-03 07:05:45 PM
castufari: bravian: Unless you build DR into the process since the beginning - it's incredibly hard to go back later and retrofit it.

We had the same problem. Facility DR plan. A huge plan with binders full of details. Rather than focus on an IT disaster they made it facility wide. I left during the final steps and it later died.


The first problem is calling it DR. I haven't heard that term in at least 10 years.

DR implies a holocaust scenario which is impossible to fully plan for.

Business Continuance implies a bump in the road, with another lane available.
 
2010-09-03 07:06:55 PM
wambu: This is courtesy of Northrop Grumman. Imagine what the US defense dept.'s IT is like.

Don't dream it, be it:

I am both a user (one weekend a month and two weeks a year) and a contractor, and even I think it is a mess
 
2010-09-03 07:07:06 PM
bravian:

Unless you build DR into the process since the beginning - it's incredibly hard to go back later and retrofit it.


Yes, usually you don't retrofit DR, you include DR in your next refresh cycle. I am guessing a refresh cycle is in their future...
 
2010-09-03 07:11:55 PM
I had to go to DMV last week when the computers were down. It was unusually empty. Managed to have a good laugh while I was there too. One of the "customer service/courtesy desk" people waited a good 20 minutes before telling people standing in line to see her if it was for transactions (a,b,c,d) affected by the outage. (Of course) All 675 people in line had to reshuffle to her station and start the wait all over again- 30 minutes before closing time.

/in & out in 20.
 
2010-09-03 07:13:32 PM
Knara:
Maybe it's just my dumb luck, but in my experience "contractor" in IT is about 5% likely to be a bona fide expert in their "expertise area" and 95% likely to be a human paper weight who simply has to not fark up for the duration of their contract term.

Particularly giant contracting middlemen/houses. Full of people who are well certified but lack in the area of practical experience.


I've seen some of that but not in the "project based" implementation role. There is little room for error or overruns so hacks generally don't live long. No long term IT support stuff. Storage arrays rarely just "fail". It's typically an operator issue.
 
2010-09-03 07:17:39 PM
TightHeadProp:

I don't know if it's rare or not, but I do know that NG was originally handed ALL of Virginia's IT work. Pretty unbelievable.

Shortly after they got the contract, and everything started to fall apart, individual agencies started screaming loudly to retain some of their own IT. When you think about all of the different needs and diverse missions in a state government, it's all the more baffling that this ever happened.

What many agencies have now is their own IT who does their best to keep everything working while dealing with NG, who is often seen as an adversary. As I understand it, what Virginia agencies have now in many cases are a local trusted IT combined with the NG bad guys, also known as 'VITA'.

I don't have the numbers, but it would seem as if Virginia still has plenty of IT staff on the payroll as well as the NG contract. Way to go!

NG staff do make the big decisions in this case though, or as you put it, they call the shots.

There has been gobs written on this in the local papers, comments on the articles are always a sad/hilarious mix with many state workers chiming in with horror stories.

Although, I have to say, in large companies I've either worked for or know of, IT often has an adversarial relationship with employees. We have our own IT department (company of about 4000) and in general they assume the lowest common denominator and try to apply the same practices across all employees even though we have everything from electrical engineers to construction workers on the payroll.

Nothing against construction workers of course, just that staff that uses their computer to check email and fill timesheets may have different IT needs than an engineer trying to write a Perl script to get something done.


Complete outsourcing is a terrible idea unless the situation is hopeless. Rolling over a state network to new players sounds like a ridiculously difficult operation. Some sales guy is living it up on that commission.
 
2010-09-03 07:22:03 PM
anti-cripes: OH MAN did i lol. (CSB ENGAGE) I have personal experience with Extreme Networks and the garbage they produce; where i currently admin (largest hospital in the region), my predecessor had wasted untold dollars and hours replacing the core architecture with Extreme Networks switches. it was awful. You wouldn't believe the shiat that would happen (well, maybe you would, you sound like you know your shiat). 800+ people, IP phones, and those stupid purple switches farking everything up. Luckily Cisco gave us a break on pricing in exchange for all that shiatty Extreme gear.

The only reason we can come up with as to why that guy went with Extreme was that he got some sort of kickback from it. We still have one of those stupid purple switches in the office; it has a railroad spike impaled through its middle as a warning to any other IT personnel that would make that retarded of a decision. (CSB DISENGAGE)

/what's wrong with FreeNAS? ;)


FreeNAS is awesome. For enterprise storage it's a joke.

Extreme Networks are awful. For awhile Enterasys was giving them a run for the money, but nothing quite compares to Extreme. Wynn Resorts uses Extreme Networks exclusively. It's pretty frightening that such a large company got swindled like that.

If you're not using Cisco networks, Juniper is the only valid alternative. If you're using anything else, get your resume ready because one day it's going to explode on you.
 
2010-09-03 07:26:57 PM
Knara: anti-cripes: Luckily Cisco gave us a break on pricing in exchange for all that shiatty Extreme gear.

I always wonder what happens to that old hardware when those vendor swap-out discounts happen.


Cisco has a program called CTMP. When the old sh*t arrives at one of their depots, they toss it in a wood chipper and sell the parts for their scrap metal.

I'll take video next time I get invited to the party. It really is a party btw. A lot of ex-competitor's employees working at Cisco flipping the bird and screaming profanity as their purple switches get turned into mulch. It's definitely good for morale.

Nothing like watching equip from a company that bled you dry get turned into silicon paste :)

Yes, you've been cool storied, Bro.
 
2010-09-03 08:02:26 PM
TightHeadProp: e5wsf:

Rarely do contractors call the shots for gov agencies. They are typically there for specialty skills to augment the state workforce. As a consultant, our job frequently requires us to take the hit to save the job of a Director/Manager. It's part of the gig.

I don't know if it's rare or not, but I do know that NG was originally handed ALL of Virginia's IT work. Pretty unbelievable.

Shortly after they got the contract, and everything started to fall apart, individual agencies started screaming loudly to retain some of their own IT. When you think about all of the different needs and diverse missions in a state government, it's all the more baffling that this ever happened.

What many agencies have now is their own IT who does their best to keep everything working while dealing with NG, who is often seen as an adversary. As I understand it, what Virginia agencies have now in many cases are a local trusted IT combined with the NG bad guys, also known as 'VITA'.

I don't have the numbers, but it would seem as if Virginia still has plenty of IT staff on the payroll as well as the NG contract. Way to go!

NG staff do make the big decisions in this case though, or as you put it, they call the shots.


I work for a sub-agency of the Arkansas state government, much of the IT is Northrop Grumman, and I'm one of my facility's in-house IT techs. And yes, it's a pain to get what our users want (or need) done by begging NG folks to allow it (since they control the firewalls, servers, user accounts, Windows system policies, general IT security & standards, etc etc).

The nice part of my job is: when resolving an issue is out of my control, I can blame the specific individual at NG who's responsible for holding up/denying user requests. I get to deflect all the user anger to where it (rightfully) belongs. Subsequently, users are always happy to see me come work on their computer - if I'm there, the likelihood of the problem being resolved shortly is high (since I can only fix what I have authority to). And then at 4:00PM, I go home and leave my work at the office.
 
2010-09-03 08:11:07 PM
slayer199: That's pretty pathetic. Don't they have a DR plan? WTF?

Almost every company I've worked for that outsourced ends up regretting it.


As a guy running an IT outsourcing company, I have to say that this isn't entirely fair. The problem was one of these four things:

1) Failure to adequately express the risk of downtime (unlikely. DMV should be like retail POS. If it goes down, you're no longer in business, period).

2) Failure to scope network to meet that risk (possible but unlikely). If you can express risk as 'must be there, always', then the right answer is redundant, redundant, redundant. There ought never be a question in that regard.

3) System wasn't deployed to meet best practises (likely). What got in the way? Possibly budget. Maybe somebody not suited to make the decision deciding that a single point of failure was 'acceptable risk'. Maybe it was deployed incorrectly. Maybe there was a failure to properly test the DR and BCP. These things happen all the time.

4) 'Unforeseeable circumstance'. Possible. This is what EMC claims, so someone must be pointing fingers at them. But it's unlikely that it was a hardware problem, since that shiat should have been redundant out the arse. We're probably talking about a combination of configuration issue and testing issue, which would have caught the configuration issue.


My position: If there's an outsourcing problem, it's almost always champagne tastes and a beer budget. I'm more than familiar with a client engagement that goes like this:

ORGANIZATION: We're gonna be digging a hole. We'd like to hire you guys to do it.

US: Sounds to us like you're gonna have a whole lot of dirt to move. I'll find out exactly how much dirt, how far, and how quickly it needs to happen. Then I'll figure out the best dump truck for your job, get you someone that knows how to drive it, and schedule a time for the front end loader and dump truck to show up.

ORGANIZATION: Dump truck? We can't afford a dump truck, not even a small one. Isn't that a little excessive? It's just dirt. Couldn't we move it with a pickup?

US: Well, you could, but then the front end loader would bang the hell out of the truck loading it. We wouldn't be able to use the loader.

ORGANIZATION: Well, can't we use something other than the front end loader?

US: Well, I suppose. If you really wanted to, you could get 50 Mexicans with shovels to load the dirt into that pickup truck. But that would take a lot longer, so would be about the same price as the dump truck in the long run, since not all 50 can put dirt in the truck at the same time, so you'll need more trucks.

ORGANIZATION: That's no good, then. It needs to be cheaper. Since we have the Mexicans, and we need more trucks, how about we don't use trucks and go with something cheaper? My brother in law just told me about a sale on '79 Ford Pintos down the road. Wouldn't those work?

US: I suppose. It'll be cheaper up front, but cost you a lot more in maintenance and support down the road, since the Pintos aren't designed to be moving dirt and will break down often.

ORGANIZATION: Cheaper up front? That's all we needed to hear. Let's do that.

*************

I wouldn't be so quick to judge NG. There are *lots* of projects I've been involved in where we did great work but I still hope that people don't think it was my idea. Failure in geek-to-english translation? Hell ya. That shiat happens all the time.



Flab: slayer199: Don't they have a DR plan?

Probably not for 485 servers out of 4800. I'm sure the mainframes and a dozen or so servers are covered by a DR plan. And I'm also pretty sure that the DR plan in question forgets a pile of very essential things that make it unusable in a real life scenario.


If they didn't have a DR policy that reflected being on 99.999% during office hours, then they should be fired. That's the *first* question you ask. And even if it was a 0.001% failure, then recovery should have been *much* quicker. This was a failure to plan or to budget/execute, period.

Cubansaltyballs: Since this is NG we're talking about, they probably connected to 485 servers to an iSCSI SAN built on top of FreeNAS, using Extreme Networks with Spanning-tree instead of EAPS rings as the backbone.

How did I come to that conclusion? Simple... I thought of the worst way possible you could connect 485 servers to a storage platform, and determined that was the "Northrup Grumman Preferred Architecture".


Ok, that was nerd funny. There's only a handful of us on Fark that would get it (although more than I would have thought by the uncommonly informed commenting in this thread). Well played.
 
2010-09-03 08:13:06 PM
As someone who is getting a kick out of these replies, I would like to say that NG is the company with the best internal IT support of any company I've worked for.

...which is pretty sad.
 
2010-09-03 08:14:59 PM
StreetlightInTheGhetto: Isn't AVG only free when it's for non-business purposes? I know that's true for Avast...

Yup. I tried to speak up, but that's when I realized the futility of dealing with bad management. This is also the same loser who wanted DeepFreeze installed on all the computers (which should have been done in the first place), but also wanted me to program something to retain files on the hard drive. When I told him what DeepFreeze does, he told me what he wanted again exactly and told me if I wanted to keep my job I would do that. Over and over again. For months on end.

Oddly enough, right after he got his ass canned, he went to work with some shiatty distributor that he bought all of the stuff for the job from. Considering he wanted a job that would pay six-figures for doing what he was doing--as if--going to sales and commission work seemed almost silly.
 
2010-09-03 08:27:33 PM
georgehwbush: Jument - It's not all perfect here in Washington.

I was with my daughter at the DMV. At first I was making jokes about the signs on the wall saying it was a felony to threaten the employees.

Four (4) hours later, I was almost a felon. Jail would have been more pleasant. Laziest group of misfits and leeches I've ever dealt with.


Sounds crazy. As I said I haven't needed to go in years. I imagine eventually they'll tell me I have to update my photo or something and then we'll see how it goes. I imagine I will try to find some podunk little office on the eastside at the most off-peak hour I can manage.
 
2010-09-03 08:30:39 PM
unyon: I wouldn't be so quick to judge NG. There are *lots* of projects I've been involved in where we did great work but I still hope that people don't think it was my idea. Failure in geek-to-english translation? Hell ya. That shiat happens all the time.

Sorry dude. I have no sympathy. You don't have to take every customer that comes your way. Sometimes people are irrational and demand more than they can afford. Those are the people that will destroy your business and bleed you dry.
 
2010-09-03 08:52:59 PM
Cubansaltyballs: Flab: slayer199: Don't they have a DR plan?

Probably not for 485 servers out of 4800. I'm sure the mainframes and a dozen or so servers are covered by a DR plan. And I'm also pretty sure that the DR plan in question forgets a pile of very essential things that make it unusable in a real life scenario.

[CSB time]
One of my customers' DR plan relied on reprogramming the PBX to forward all calls to the DR location. During the 2003 power outage that affected Ontario and the NE US, they quickly found out that this only works if the PBX still has power. It also helps, when you decide to outsource your call center management, to ensure that the company you outsource to has generators!
[/CSB]

The article mentioned a SAN failure.

For that many servers to be fubar at the same time it would have to be a SAN fabric failure. You're typically not going to connect 485 servers to ONE SAN... unless it is one helluva SAN. If it is one bigass SAN, it would have that redundancy built-in.

Since this is NG we're talking about, they probably connected to 485 servers to an iSCSI SAN built on top of FreeNAS, using Extreme Networks with Spanning-tree instead of EAPS rings as the backbone.

How did I come to that conclusion? Simple... I thought of the worst way possible you could connect 485 servers to a storage platform, and determined that was the "Northrup Grumman Preferred Architecture".


Sounds like the servers were virtualized to me and it was the SAN for the VMWare or whatever was hosed.
 
2010-09-03 08:59:41 PM
ChadManMn: The first problem is calling it DR. I haven't heard that term in at least 10 years.

DR implies a holocaust scenario which is impossible to fully plan for.

Business Continuance implies a bump in the road, with another lane available.


On top of the one mentioned earlier, I've seen DR Business continuity plans that failed to take into account transportation to the DR facility. They had always scheduled rehearsals, where they would tell the evening shift to show up at the facility, so they never had problems. They never anticipated that moving 300 people from the main office to the DR site at 11:15am wasn't really easy when the bus only goes by every 45 minutes, instead of every 10 minutes, like it does at rush hour, and that calling taxis for 300 people is not easily arranged.

I've seen DR plans that relied on a shared DR facility, and the one time they needed it, the shared facility was already in use by someone else.

I've seen DR plans flawlessly executed only to have the users refuse to work because the DR facility only had 15" monitors, instead of 17" ones.

I've seen network designs that had redundant everything, up to circuits going to two different COs, except they both left the building in the same underground pipe. Guess where the excavator started to dig?

I've seen attack ships on fire off the shoulder of Orion

And the list goes on.
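
The "redundant circuits in the same pipe" case is the kind of thing a simple inventory check can catch. A minimal sketch, assuming you have a list of the physical components each path traverses (all names below are hypothetical):

```python
from itertools import combinations

# Hypothetical inventory: each "redundant" circuit lists every physical
# component it passes through. All names are invented for illustration.
circuits = {
    "circuit_a": {"router_1", "conduit_north", "central_office_1"},
    "circuit_b": {"router_2", "conduit_north", "central_office_2"},
}


def shared_components(paths):
    """Map each pair of paths to the components they have in common.

    Pairs with nothing in common are omitted, so an empty result means
    no shared single point of failure was found in the inventory.
    """
    overlaps = {}
    for (name_a, parts_a), (name_b, parts_b) in combinations(sorted(paths.items()), 2):
        common = parts_a & parts_b
        if common:
            overlaps[(name_a, name_b)] = common
    return overlaps


if __name__ == "__main__":
    for pair, common in shared_components(circuits).items():
        print(pair, "share", sorted(common))
```

The check is only as good as the inventory: if nobody recorded the conduit, the two circuits still look independent on paper.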
 
2010-09-03 09:00:01 PM
Killer Cars: DrewCurtisJr: Sounds like they need to deploy in the cloud and virtualization and other jargon.

And they should backup their files using Mozy


2.bp.blogspot.com

Shouldn't she be holding a Windows machine to go with that tee-shirt?
 
2010-09-03 09:20:12 PM
Zasteva: Killer Cars: DrewCurtisJr: Sounds like they need to deploy in the cloud and virtualization and other jargon.

And they should backup their files using Mozy



Shouldn't she be holding a Windows machine to go with that tee-shirt?


It's retarded, but it makes more sense than Back the /dev/sda1 UP!!

Or whatever fruity iDildo symbol Apple uses.
 
2010-09-03 09:50:28 PM
unyon: I wouldn't be so quick to judge NG. There are *lots* of projects I've been involved in where we did great work but I still hope that people don't think it was my idea. Failure in geek-to-english translation? Hell ya. That shiat happens all the time.

I've been in the industry for 14 years and I've been a contractor and a direct hire employee. I've worked for an IT Services company (outsourced IT) and typically the problem is that the outsourcing company sells basic services (money loser) and sells add-ons for profit. The bottom line is that NG is MORE at fault here for not being up-front and saying, "No, we can't do it that way because..." and laying out the risks.

Virginia is not blameless in this mess. They didn't write the contract properly or budget properly. Either way it's a total fark up. Every IT Services company I worked for had serious penalties in their contract for any outage more than 4 hours along with a 99.99% uptime.

e5wsf: 500 servers on a SAN is not really a lot considering the use of VMware. A large city agency can hit that number. I'm a SAN Architect/Engineer so I see a lot of it.

Rarely do contractors call the shots for gov agencies. They are typically there for specialty skills to augment the state workforce. As a consultant, our job frequently requires us to take the hit to save the job of a Director/Manager. It's part of the gig.


I've been a VMware Admin for 5 years now. I've worked with EMC, NetApp, and HP-EVA storage (UGH). The only time we've had a total SAN storage failure was during a change control window when a microcode update on a NetApp array failed. Simple solution was to fall back and the storage came back up. We lost one VM because none of the VMs went down gracefully. That VM was back up in 2 hours (built from template and restored from backup). Everything was back online at the close of the Window.

My current environment has 1200 VMs (mostly Windows) on EMC storage. Of course, we have redundant fibre paths, redundant HBAs per host, and redundant storage devices. And yes, we have a DR plan since if our crap goes down for any extended period of time, we're out-of-business (I think the longest outage we can tolerate by contract with our clients is 24 hours with 99.99% uptime overall). SRM has really sped up the process...last time we tested, we were back up in an hour.


Benjimin_Dover: Sounds like the servers were virtualized to me and it was the SAN for the VMWare or whatever was hosed.

Probably, but no excuse for that length of downtime.
 
2010-09-03 10:11:23 PM
slayer199: 24 hours with 99.99% uptime overall)

Monthly or yearly? For each VM or for the whole raised floor? 99.99% monthly means about 4 min 19 sec of downtime (in a 30-day month). Can your VMs restart that fast?
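
For reference, the downtime budget implied by an uptime percentage is simple arithmetic. A minimal sketch, assuming a 30-day month and a 365-day year (real SLAs define the measurement window in the contract, so treat these period lengths as assumptions):

```python
from datetime import timedelta

# Assumed period lengths; a 30-day month and 365-day year are simplifications.
PERIODS = {
    "month": timedelta(days=30),
    "year": timedelta(days=365),
}


def downtime_budget(uptime_percent, period):
    """Return the maximum downtime allowed in `period` at the given uptime percentage."""
    if not 0 <= uptime_percent <= 100:
        raise ValueError("uptime_percent must be between 0 and 100")
    allowed_fraction = (100 - uptime_percent) / 100
    return timedelta(seconds=period.total_seconds() * allowed_fraction)


def format_budget(budget):
    """Render a timedelta as whole minutes and seconds, rounded to the nearest second."""
    minutes, seconds = divmod(round(budget.total_seconds()), 60)
    return f"{minutes} min {seconds} sec"


if __name__ == "__main__":
    for name, period in PERIODS.items():
        print(f"99.99% per {name}: {format_budget(downtime_budget(99.99, period))}")
```

Under those assumptions, 99.99% works out to 4 min 19 sec per month or 52 min 34 sec per year, which is why the measurement window matters so much when reading an SLA.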
 
2010-09-03 10:24:31 PM
Cubansaltyballs: For that many servers to be fubar at the same time it would have to be a SAN fabric failure. You're typically not going to connect 485 servers to ONE SAN... unless it is one helluva SAN. If it is one bigass SAN, it would have that redundancy built-in.

It's peanuts, really. Don't get confused with where the LUNs are coming from over the fabric. You could have presented LUNs from several arrays on the fabric all combined into a volume group or filesystem like vmfs3 where hundreds of vm's are stored. Also, it only takes a couple terabytes to store that many vm's if they are mainly the front end and middle tier machines.

All it takes is for the datastore to be corrupted (seen it before, having a tard employee run an automated script on a SAN-attached server which formatted the first LUN it saw, which happened to be part of a VMware datastore and not the boot disk.)

Some sort of corruption could have wiped a large portion of their servers running off the whole datastore.
 
2010-09-03 11:03:14 PM
Lose a cache card, corrupt a bunch of luns in a frame, spin tape for a week to get back in business.

/go EMC!
 
2010-09-03 11:07:40 PM
farkinawsome Quote 2010-09-03 05:12:03 PM
ah, lets hire a "new" gov't contractor to find out what the "old" contractor messed up.

/ this is how congress works.

>>>>>

First Congress will call a hearing, then create a committee, and release a report. Then after a year they will hire NG back to fix their mess. At twice the cost.
 
2010-09-03 11:14:58 PM
slayer199:
I have to admit, when I think of Northrop-Grumman I don't think of an IT Services Provider. I think of jets. Still, a good IT policy is to eliminate single-points of failure...and I'm just as blown away by the SAN outage as I am that it lasted AN ENTIRE WEEK? How the hell do you have a SAN failure that lasts an entire week?


Being awarded a government IT contract has nothing to do with a company's expertise in IT. It's all about the company's expertise in greasing the right palms. Since defense contracts have the biggest budgets, defense contractors have the most expertise here.

The antediluvian software that routes emails you send to your congress-critter was written by Lockheed Martin. Really.
 
2010-09-03 11:19:59 PM
ParaHandy: Being awarded a government IT contract has nothing to do with a company's expertise in IT. It's all about the company's expertise in greasing the right palms.

Damn skippy. All it took was hookers and blow to get a 20 million dollar contract with the city of Detroit for their sewage treatment plant.

/ wish I could say I was joking.
 
2010-09-03 11:28:51 PM
LT L Quote 2010-09-03 07:06:55 PM
wambu: This is courtesy of Northrop Grumman. Imagine what the US defense dept.'s IT is like.

Don't dream it, be it:

I am both a user (one weekend a month and two weeks a year) and a contractor, and even I think it is a mess

>>>

Are you the Capt. Captain Scott Weller the article talks about? How badass are you that you get to be called captain twice?
 
2010-09-03 11:54:52 PM
How was it more expensive to do business before computers came along?
 
2010-09-03 11:57:20 PM
timujin: ...I've seen the same level of incompetence from Lockheed Martin's IT consultants.

I worked at a government contractor until recently (I think it's been long enough to tell this story now). We had a contract that had been rode hard over the years -- maybe a handful of us actually solved problems instead of causing worse ones. Came the day we lost that contract renewal to Lockheed Martin, and we started having to deal with some real winners.

This only got worse when Lockheed started hiring out of our company, because LMCO lowballed all the offers to start and even managed to fark over some people who had offers from a sub to LMCO, so the only people who accepted were those who really, really didn't want to move (one guy), couldn't take the risk of not having a job at all (one guy), or were too farking incompetent to get a real job elsewhere (pretty much everybody else).

Anybody with any technical clue and a sense of how much they were worth left for other companies entirely and never looked back. So I can easily imagine that every time LMCO hires a new round of staff, the average quality drops.
 
2010-09-04 12:06:23 AM
callmeox: Lose a cache card, corrupt a bunch of luns in a frame, spin tape for a week to get back in business.

/go EMC!


USP V.
Extreme reliability.
 
2010-09-04 12:15:53 AM
Oh that Mark Warner and his silly ideas about outsourcing! That guy is some piece of work. Even your average everyday farker knows better than him when it comes to outsourcing IT.
 
2010-09-04 12:23:20 AM
missiv: are they sure, it wasn't one Win 95 machine winning an election?

I hereby award you one bonus Internet.

/anybody else watched a network admin lose her shiat after Samba cheated and won the browser election for the umpteenth time?
 
2010-09-04 12:35:01 AM
ChadManMn: The first problem is calling it DR. I haven't heard that term in at least 10 years.

DR implies a holocaust scenario which is impossible to fully plan for.

Business Continuance implies a bump in the road, with another lane available.


I assure you that DR as a term is in current usage in all kinds of places, and does in fact typically refer to a facility- or site-destroying disaster. I'm talking about plans that literally do anticipate things like "Terrorists attacked our main datacenter and managed to blow all the power feeds, spiked the generator and knocked a hole in the wall so our delicate servers are exposed to the -20 blizzard with winds gusting to 70 MPH outside."

More to the point, who gives a fark how old the term is? It's the same activity it was ten years ago, why invent new terminology?
 
2010-09-04 12:44:22 AM
Flab: slayer199: 24 hours with 99.99% uptime overall)

Monthly or yearly? For each VM or for the whole raised floor? 99.99% monthly means 8 min 42 sec of downtime. Can your VMs restart that fast?


Assuming he's not way over capacity in his infrastructure and they're not taking unusually long times to boot because of a problem with the OS installation, they should take a great deal less than 1 minute to come up.
 
2010-09-04 12:44:31 AM
Hey, motherf*cker, we're Northrop Grumman, so you just STFU and keep sending us money!

If I had a nickel for every dollar that was spent on a once-quality name brand before the business world was populated with thieving, lazy, clueless jerkoffs, I could buy Sweden.

/servers down
//please send more linguistically gymnastic corporate malarkey.
 
2010-09-04 12:49:46 AM
Flab: slayer199: 24 hours with 99.99% uptime overall)

Monthly or yearly? For each VM or for the whole raised floor? 99.99% monthly means 8 min 42 sec of downtime. Can your VMs restart that fast?


Quarterly...and we have a change window between 1-6am every night. VMs restart VERY quickly.
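
Since the answer depends so much on the measurement window, here is a small Python sketch of the arithmetic, under stated assumptions: the 30-day month, 91-day quarter, and 365-day year are my round numbers, and `downtime_budget` and `restarts_within_budget` are hypothetical helper names, not anything from an SLA tool. By this arithmetic 99.99% over a 30-day month allows about 4 min 19 sec of downtime, so the 8 min 42 sec figure quoted above looks closer to a 99.98% monthly target.

```python
from datetime import timedelta


def downtime_budget(availability_pct: float, period: timedelta) -> timedelta:
    """Maximum downtime allowed within `period` at the given availability."""
    if not 0.0 <= availability_pct <= 100.0:
        raise ValueError("availability must be a percentage between 0 and 100")
    # The unavailable fraction of the period is the budget.
    return period * ((100.0 - availability_pct) / 100.0)


def restarts_within_budget(budget: timedelta, restart_time: timedelta) -> int:
    """How many full restarts of length `restart_time` fit in `budget`."""
    if restart_time <= timedelta(0):
        raise ValueError("restart_time must be positive")
    return budget // restart_time


if __name__ == "__main__":
    # Assumed window lengths; a real SLA defines its own.
    windows = {
        "monthly": timedelta(days=30),
        "quarterly": timedelta(days=91),
        "yearly": timedelta(days=365),
    }
    for name, period in windows.items():
        budget = downtime_budget(99.99, period)
        fits = restarts_within_budget(budget, timedelta(minutes=1))
        print(f"{name}: {budget} allowed, room for {fits} one-minute restarts")
```

The same 99.99% target is far more forgiving measured quarterly than monthly, and planned work inside an agreed change window is often excluded from the count altogether, though that depends on how the SLA is written.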
 
Displayed 50 of 172 comments
