Ultra reliable computer


Lundmark
2008-12-22, 05:48
I have become interested in making a computer that is resistant to the normal causes of service failure and downtime.
The most common sources of downtime:
Software: A process runs away, consuming all processing power or system memory, or the computer becomes unresponsive for some other reason.
Hardware: Dust clogs heatsinks, causing the components to heat up until they fail. RAM goes bad. Hard drive bearings wear out or the hard drive controller board fries. CPU fan bearings wear out. Capacitors burst or leak. Solder joints fracture.
Power: Power going into the computer is cut or surges. Power supply components fail.
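The software case is the one you can partly handle in code. Here is a minimal sketch. It assumes a Linux host and uses Python's standard `resource` and `subprocess` modules. The critical process runs under hard CPU-time and address-space limits. A process that outlives a wall-clock timeout is treated as unresponsive, and the process is restarted when it exits abnormally. `run_limited` and `supervise` are hypothetical helper names for illustration, not part of any library.

```python
import resource
import subprocess
import sys
import time


def _apply_limits(cpu_seconds, mem_bytes):
    """Return a callable that sets rlimits in the child just before exec."""
    def apply():
        # CPU-time cap: the kernel signals the child once it has used this much CPU.
        resource.setrlimit(resource.RLIMIT_CPU, (cpu_seconds, cpu_seconds))
        # Address-space cap: allocations past this fail instead of eating all RAM.
        resource.setrlimit(resource.RLIMIT_AS, (mem_bytes, mem_bytes))
    return apply


def run_limited(cmd, cpu_seconds=60, mem_bytes=1024 ** 3, wall_timeout=None):
    """Run cmd under CPU/memory limits.

    Returns the exit code (negative if killed by a signal), or None if the
    process exceeded wall_timeout and was killed as unresponsive.
    """
    proc = subprocess.Popen(cmd, preexec_fn=_apply_limits(cpu_seconds, mem_bytes))
    try:
        return proc.wait(timeout=wall_timeout)
    except subprocess.TimeoutExpired:
        proc.kill()
        proc.wait()
        return None


def supervise(cmd, max_restarts=3, backoff_s=1.0, **limits):
    """Restart cmd until it exits cleanly; return the attempt index, or -1 if it never did."""
    for attempt in range(max_restarts + 1):
        if run_limited(cmd, **limits) == 0:
            return attempt
        time.sleep(backoff_s)
    return -1


if __name__ == "__main__":
    # A runaway busy loop is stopped by the 1-second CPU limit.
    rc = run_limited([sys.executable, "-c", "while True: pass"], cpu_seconds=1)
    print("runaway child exit code:", rc)
```

On a real box the init system would usually do the restarting. This sketch only shows the mechanism.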

Here is what I propose:
A hermetically sealed computer case made of solid 1/4" steel for the motherboard base and 1 mm steel everywhere else. The motherboard will lie horizontally to remove the stress that the heatsink's weight puts on the board.
If the computer hardware doesn't have a problem with it, the case will be filled with an inert gas to prevent corrosion. There will be no fans inside the case. Air still needs to circulate inside the case, so an ion fan would move air to the heat exchanger between the inside of the case and the outside, provided the risk of electrostatic damage to the components can be eliminated.
A Peltier element would be ideal for removing heat from the inside of the case, but I think those are prone to failure. If heat is an issue, an element can be added to the outside of the case.
I think a Pico-ITX or Nano-ITX motherboard would be best for this. These boards have very low power consumption, which means they will run longer on backup power and produce less heat. They also weigh less, so they will be less susceptible to vibration-related stress.
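The backup-power point is simple arithmetic: runtime is usable battery energy divided by load. This is a rough sketch. The battery size, the loads, the 85% conversion efficiency and the 80% usable depth of discharge are placeholder assumptions, not measurements of any particular board or UPS.

```python
def runtime_hours(battery_wh, load_w, inverter_efficiency=0.85, usable_fraction=0.8):
    """Estimate how long a battery-backed supply can carry a constant load.

    battery_wh: nominal battery capacity in watt-hours
    load_w: average load of the computer in watts
    inverter_efficiency: fraction of battery energy that reaches the load (assumed)
    usable_fraction: share of nominal capacity you are willing to discharge (assumed)
    """
    if load_w <= 0:
        raise ValueError("load must be positive")
    return battery_wh * usable_fraction * inverter_efficiency / load_w


# Hypothetical comparison on the same 200 Wh battery.
for name, watts in [("low-power board", 15.0), ("desktop", 60.0)]:
    print(f"{name}: {runtime_hours(200.0, watts):.1f} h")
```

Runtime scales as 1/load, so cutting the draw to a quarter roughly quadruples how long the box stays up on the same battery.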
I can't decide what type of physical storage would be best for this type of computer. I have an old Quantum Fireball from '97 that still works fine despite a few years of abuse early in its life from a Windows 98 page file. Magnetic hard drives are proven to survive the crap conditions that 14-year-old girls with file-sharing programs put them through, but their lifespan can't be judged by how the drive was used. Google did a study on hard drive failure rates and, IIRC, it didn't seem to matter much whether the drives were thrashing back and forth or sitting idle. If a drive survived its first six months of operation, it would run for several years or more.
So hard drives are sturdy devices, but they also are not. Hard drive failures are probably the third most common point of failure in my experience. The thing has moving parts, and moving parts fail eventually.
My next idea was to use a solid state drive made from a good quality USB flash drive, but I don't know how reliable a flash drive would be when used like this. The technology is still not proven to be reliable. They deteriorate according to how many times you write data to them, for damn's sake. Is it possible to have a RAID array of USB drives? Does running an OS mean the disk will see many write cycles?
I also heard it is possible to mount an image of the OS on a RAM disk and boot from that. I don't know how it was done, but that sounds much more reliable than solid state drives or a regular old platter disk.
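You can rough out both flash questions with numbers. On Linux, `/proc/diskstats` reports cumulative sectors written per device, always in 512-byte units. Sampling it twice, a day apart, shows how much an OS actually writes. That figure can then go into a crude wear-out estimate that assumes ideal wear leveling. The endurance and write-amplification values below are hypothetical placeholders. Cheap USB sticks may do little wear leveling, and then frequently rewritten blocks can wear out far sooner than this estimate suggests.

```python
SECTOR_BYTES = 512  # /proc/diskstats counts 512-byte sectors regardless of device


def bytes_written(diskstats_line):
    """Return cumulative bytes written from one /proc/diskstats line."""
    fields = diskstats_line.split()
    # Layout: major, minor, name, reads, reads merged, sectors read, ms reading,
    #         writes completed, writes merged, sectors written, ...
    return int(fields[9]) * SECTOR_BYTES


def flash_lifetime_years(capacity_gb, pe_cycles, daily_writes_gb, write_amplification=2.0):
    """Crude wear-out estimate assuming ideal wear leveling across the whole device.

    Host data the flash can absorb ~= capacity * program/erase cycles / write amplification.
    """
    total_host_writes_gb = capacity_gb * pe_cycles / write_amplification
    return total_host_writes_gb / (daily_writes_gb * 365.0)


# Hypothetical samples of the same device taken 24 hours apart.
before = "   8       0 sdb 1000 20 2000 300 500 10 4096 700 0 800 1000"
after = "   8       0 sdb 1200 25 2400 350 900 15 2101248 900 0 900 1200"
daily_gb = (bytes_written(after) - bytes_written(before)) / 1e9
print(f"written per day: {daily_gb:.3f} GB")
print(f"estimated life: {flash_lifetime_years(4, 10000, daily_gb):.0f} years")
```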
Now, what the fuck software to run? I googled "longest uptime" and came to the Netcraft site, which shows FreeBSD in the top spot. Another site showed OpenVMS (which is closed source) as having the highest uptime. Then I found a post on a message board by a guy saying BSD dominates the Netcraft list because the people running it don't have a reason to reboot, unlike Linux users, who upgrade and patch to suit their needs. By that argument, there is nothing inherently stable about BSD. Another thing I read today is that most operating systems, including Windows variants, would not crash if they were running on good hardware. Any thoughts on what the best operating system to use would be?


Related story:
http://www.theregister.co.uk/2001/04/12/missing_novell_server_discovered_after/

hazmat
2008-12-22, 15:38
Interesting idea. Those new Atom-based micro/pico ATX boards could be a good starting point. You could probably get away with passive cooling, too. If you want a 1/4" thick mobo backplate, you might as well turn it into a big heatsink; there's thermally conductive moldable paste stuff that would work for this too. If your CPU heatsink were somehow integrated into the case, this would also help the passive cooling effort. Just my thoughts.

Lundmark
2008-12-26, 05:26
A few questions:
Does anyone know the lifespan of thermal grease or pads? I think the best thing to do in this situation would be to lap the heatsinks and leave any thermal transfer compound off. I don't want a crust forming on the CPU like I have seen with my Arctic Silver after a few years.
There are debates raging on the internet saying that the Atom boards out now consume as much power as an underclocked desktop computer, even though the Atom is far less capable. This information is a stick in my spokes for now. Any info on underclocked desktops vs. Atom or Nano platforms in terms of power consumption?
I have a few ideas in my head on where a computer that can run for 10 years straight would be useful, but I would like to get some other viewpoints. Does anyone know where a very reliable PC would be well suited?
Quote (hazmat): "If your CPU heatsink was somehow integrated into the case this would also help the passive cooling effort."
I thought about that too, and it seems like a good idea. I don't think it would be very hard to have a piece of aluminum milled to sit right on the critical parts. I have seen the fools (http://hackedgadgets.com/2008/10/01/computer-with-no-cooling-fans/) who can't stand having even one whisper-quiet fan. Passive heatsinks have to be huge, so the heatsink might as well be part of the case.
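A quick way to size a passive heatsink is the series thermal-resistance model T_j = T_a + P * (theta_jc + theta_cs + theta_sa). The sketch below solves it for the largest heatsink-to-ambient resistance theta_sa that keeps the die under a target temperature. All numbers are hypothetical. In a sealed case, T_a is the air or wall temperature inside the box, not room temperature. theta_cs is where the grease-versus-bare-lapped-contact choice shows up, since a bare joint typically has higher resistance than one with compound.

```python
def required_sink_resistance(t_junction_max_c, t_ambient_c, power_w, theta_jc, theta_cs):
    """Largest heatsink-to-ambient resistance theta_sa (deg C per W) that keeps the die
    at or below t_junction_max_c, from T_j = T_a + P * (theta_jc + theta_cs + theta_sa).
    """
    if power_w <= 0:
        raise ValueError("power must be positive")
    theta_sa = (t_junction_max_c - t_ambient_c) / power_w - theta_jc - theta_cs
    if theta_sa <= 0:
        raise ValueError("no heatsink can meet this target; cut power or internal temperature")
    return theta_sa


# Hypothetical 10 W part, 90 C die limit, 45 C air inside the sealed case.
for label, theta_cs in [("with compound", 0.3), ("bare lapped joint", 1.0)]:
    theta_sa = required_sink_resistance(90.0, 45.0, 10.0, theta_jc=1.5, theta_cs=theta_cs)
    print(f"{label}: heatsink must be <= {theta_sa:.2f} C/W")
```

A lower required theta_sa means a bigger heatsink. That is why a worse interface joint or a hotter sealed interior pushes you toward making the case itself the heatsink.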

cooldarkknick
2008-12-26, 08:38
This sounds like a fun project!

You could run a few heat pipes from the critical parts to the case, then put a small fan near the case.

hazmat
2008-12-27, 01:12
I've also heard about the power issues with the new Atom stuff, and I think some of it is because a lot of the current boards aren't even using the chipset Intel designed for Atom. Some manufacturers are using a desktop/laptop chipset (945, maybe?) that is compatible with Atom but is cheaper or better on features or something. nVidia is working on a platform for Atom (Ion) that promises a better balance than the chipsets currently being used with Atom.

Going with Intel's current slowest 45nm part, undervolted and underclocked, would definitely still compete with an Atom, which makes this a tough decision. Looking into things like the die sizes and manufacturing process of the chipsets may also help in choosing the coolest part.
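Undervolting pays off more than underclocking alone because CMOS switching power scales roughly with C * V^2 * f, so voltage counts twice. Here is a first-order sketch with made-up numbers, not the specs of any real Intel part.

```python
def scaled_dynamic_power(p_nominal_w, volts, nominal_volts, freq_mhz, nominal_freq_mhz):
    """First-order CMOS dynamic-power scaling, P proportional to C * V**2 * f.

    Ignores static (leakage) power, which does not scale with clock frequency,
    so real savings are usually smaller than this estimate.
    """
    return p_nominal_w * (volts / nominal_volts) ** 2 * (freq_mhz / nominal_freq_mhz)


# Hypothetical desktop part: 40 W of dynamic power at 1.25 V / 2600 MHz,
# undervolted to 1.00 V and underclocked to 1300 MHz.
print(f"{scaled_dynamic_power(40.0, 1.00, 1.25, 1300, 2600):.1f} W")
```

In this example, halving the clock alone halves dynamic power. Dropping the voltage by 20% on top of that cuts it by another 36%.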

I don't know what I'd do with this, though. If it had a couple of SSDs in RAID-1, I'd use it for critical storage. You could also use it as your dead man's switch: set up a cron job to send out an email if you don't send your "I'm still alive" command every so often.
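Here is one minimal way to build the dead man's switch. The "I'm still alive" command touches a heartbeat file, and a cron job checks the file's age and emails an alert if it has gone stale. The file path, the one-week threshold, the addresses and the local SMTP relay are all hypothetical. The mail calls are Python's standard `smtplib` and `email.message`.

```python
import os
import smtplib
import sys
import time
from email.message import EmailMessage

HEARTBEAT = "/var/lib/deadman/heartbeat"  # hypothetical location
MAX_SILENCE_S = 7 * 24 * 3600             # assumed: alert after a week of silence


def check_in(path=HEARTBEAT):
    """The "I'm still alive" command: create or touch the heartbeat file."""
    directory = os.path.dirname(path)
    if directory:
        os.makedirs(directory, exist_ok=True)
    with open(path, "a"):
        pass
    os.utime(path, None)  # set mtime to now


def is_stale(path=HEARTBEAT, max_silence_s=MAX_SILENCE_S, now=None):
    """True if the heartbeat file is missing or older than max_silence_s seconds."""
    now = time.time() if now is None else now
    try:
        return now - os.path.getmtime(path) > max_silence_s
    except FileNotFoundError:
        return True


def build_alert(sender, recipient, body):
    msg = EmailMessage()
    msg["Subject"] = "Dead man's switch triggered"
    msg["From"] = sender
    msg["To"] = recipient
    msg.set_content(body)
    return msg


def send_alert(msg, smtp_host="localhost"):
    # Assumes a reachable SMTP relay; this script does not set one up.
    with smtplib.SMTP(smtp_host) as smtp:
        smtp.send_message(msg)


if __name__ == "__main__":
    # Hypothetical crontab entry for the check:  0 * * * * /usr/bin/python3 deadman.py check
    # Run "deadman.py alive" yourself every so often.
    command = sys.argv[1] if len(sys.argv) > 1 else ""
    if command == "alive":
        check_in()
    elif command == "check" and is_stale():
        send_alert(build_alert("box@example.com", "friend@example.com",
                               "No check-in received; time to act."))
```

As written, the check re-sends the alert on every cron run once the heartbeat goes stale. A real setup would also record that an alert went out.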

EDIT: If you could get all this to run off a laptop power brick or the like, you could eliminate the need for a standard PSU with a fan.

Zip118
2008-12-27, 02:02
What you need, my friend, is a RAD750 (http://www.baesystems.com/ProductsServices/bae_prod_s2_rad750.html).

Realistically though, developing a system to survive every possible contingency over, say, 10 years is not a trivial thing to do. Most of the reliability figures you see for industrial hardware are based primarily on strict revision control. Even with total quality management methods such as Six Sigma, which may look good on paper, you are applying these metrics to the complete system, so the probability that some component fails is still quite high over a long period of time.
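The point about the complete system can be made concrete. With no redundancy, every part has to survive, so the system's survival probability is the product of the parts' survival probabilities. This sketch assumes independent failures, which is optimistic for parts sharing the same heat and power. The 20-part, 99%-each figures are hypothetical.

```python
import math


def series_reliability(component_reliabilities):
    """Probability that a system with no redundancy survives: every part must survive.

    Assumes independent failures.
    """
    return math.prod(component_reliabilities)


# Hypothetical: 20 parts, each with a 99% chance of surviving the whole period.
print(f"{series_reliability([0.99] * 20):.3f}")  # about 0.818
```

Even with every part at 99%, the whole box has roughly a one-in-five chance of losing something over the period.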

This isn't something you can just work out on paper; it requires significant testing, which leads into a field called reliability engineering. A system as complex as your computer can't be analyzed with something as basic as a fault tree; it requires a high level of redundancy. The flight control systems and actuators on a typical commercial airliner are generally redundant three or four times over, because that's what our Weibull models tell us is necessary.
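Here is a small sketch of the kind of calculation behind that. The Weibull survival function R(t) = exp(-(t / eta)^beta) models a unit's chance of lasting to time t. A shape beta < 1 models infant mortality, which fits the "survive the first six months" pattern from the drive study mentioned earlier, and beta > 1 models wear-out. Independent parallel redundancy then gives 1 - (1 - R)^n. The eta and beta values are hypothetical, not fitted to any real drive. Real redundancy also suffers common-mode failures, such as a shared power supply, which this formula ignores.

```python
import math


def weibull_reliability(t, eta, beta):
    """Weibull survival function R(t) = exp(-(t / eta) ** beta).

    eta: scale (characteristic life, same units as t); beta: shape
    (beta < 1 infant mortality, beta = 1 constant failure rate, beta > 1 wear-out).
    """
    return math.exp(-((t / eta) ** beta))


def parallel_reliability(r, n):
    """n identical, independent redundant units; the function survives if any one does."""
    return 1.0 - (1.0 - r) ** n


# Hypothetical unit: eta = 8 years, beta = 1.5, evaluated over a 10-year mission.
r_single = weibull_reliability(10.0, eta=8.0, beta=1.5)
for n in (1, 2, 3, 4):
    print(n, round(parallel_reliability(r_single, n), 3))
```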

Point is that developing such a system would be a very difficult process and much more is required than what you would initially expect. Just trying to give you some appreciation for this.