The Fix Is In: Hubble’s Troubles Appear Over For Now | Hackaday

2022-05-21 14:45:25 By : Ms. Cassie He

Good news this morning from low Earth orbit, where the Hubble Space Telescope is back online after a long and worrisome month of inactivity following a glitch with the observatory’s payload computer.

We recently covered the Hubble payload computer in some depth; at the time, NASA was still very much in the diagnosis phase of the recovery, and had yet to determine a root cause. But the investigation was pointing to one of two possible culprits: the Command Unit/Science Data Formatter (CU/SDF), the module that interfaces the various science instruments, or the Power Control Unit (PCU), which provides regulated power for everything in the payload computer, more verbosely known as the SI C&DH, or Science Instrument Command and Data Handling unit.

In the two weeks since that report, NASA made slow but steady progress, methodically testing every aspect of the SI C&DH. It wasn’t until just two days ago, on July 14, that NASA made a solid determination on the root cause: the Power Control Unit, or more specifically, the power supply protection circuit on the PCU’s 5-volt rail. The circuit is designed to monitor the rail for undervoltage or overvoltage conditions, and to order the SI C&DH to shut down if the voltage is out of spec. It’s not entirely clear whether the PCU is actually putting out something other than 5 volts, or if the protection circuit has perhaps degraded since the entire SI C&DH was replaced in the last servicing mission in 2009. But either way, the fix is the same: switch to the backup PCU, a step that was carefully planned out and executed on July 15.
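To make the two failure hypotheses concrete, here is a minimal Python sketch of a rail monitor of the kind described above. It is not NASA's design: the `RailLimits` class, the function names, and the 4.75 V and 5.25 V trip points are all assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RailLimits:
    """Trip points for a nominal 5 V rail (values are hypothetical)."""
    low: float = 4.75   # assumed undervoltage threshold
    high: float = 5.25  # assumed overvoltage threshold


def rail_status(volts: float, limits: RailLimits = RailLimits()) -> str:
    """Classify a rail reading against the given trip points."""
    if volts < limits.low:
        return "undervoltage"
    if volts > limits.high:
        return "overvoltage"
    return "ok"


def should_shut_down(volts: float, limits: RailLimits = RailLimits()) -> bool:
    """The protection circuit orders a shutdown on any out-of-spec reading."""
    return rail_status(volts, limits) != "ok"


# Hypothesis 1: the supply itself has drifted out of spec.
print(should_shut_down(4.60))  # True

# Hypothesis 2: the supply is fine, but the monitor's thresholds have
# drifted, so a healthy 5.0 V reading still trips the shutdown.
drifted = RailLimits(low=5.05, high=5.25)
print(should_shut_down(5.00, drifted))  # True
```

Both cases look identical from the outside, with the SI C&DH being told to shut down, which is why swapping in the backup PCU addresses either one.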

To their credit, the agency took pains that everyone involved would be free from any sense of pressure to rush a fix — the 30-year-old spacecraft was stable, its instruments were all safely shut down, and so the imperative was to fix the problem without causing any collateral damage, or taking a step that couldn’t be undone. And further kudos go to NASA for transparency — the web page detailing their efforts to save Hubble reads almost like a build log on one of our projects.

There’s still quite a bit of work to be done to get Hubble back into business — the science instruments have to be woken up and checked out, for instance — but if all goes well, we should see science data start flowing back from the space telescope soon. It’s a relief that NASA was able to pull this fix off, but the fact that Hubble is down to its last backup is a reminder Hubble’s days are numbered, and that the best way to honor the feats of engineering derring-do that saved Hubble this time and many times before is to keep doing great science for as long as possible.

i used to find it a little nerve-wracking to try to fix the server because if i screwed it up, i had to get on a subway to the colo downtown…2 hours out of my day, at least, just because i hit enter before thinking things through. these days, i’m not in a position of such authority. if i screw up my home server i just have to go down to the basement and hope the 20 year old CRT down there still works so i can diagnose the problem. trying not to break the thing is kind of like a game of “floor is hot lava”, where failure has no real consequences but you play nonetheless. still feel like a moron when i kill the network interface before i've thought of how i was gonna tell it to come back up!

fun to imagine being in these guys’ shoes!

Also, lots of different computers, most of which have redundant modules, and which can poke into each other’s memory without needing to have the other computer actually be running. The NSSC-1 which caused the problem has four memory modules (all non-volatile core memory of 4kw each; the NSSC-1 is an 18-bit beast) which can be switched between on demand by one of the _other_ computers. So you can upload a new firmware image, write it to memory, shut the NSSC-1 down remotely, swap modules, start it up again, all in flight and not requiring the NSSC-1 to be actually operational.

I got tired of the risk, and set up a pair of servers, with a spare Ethernet port from each server connected to the BMC of the other server. Just for fun, I left the OS reinstall disk in each of their drives. It would be really hard to break badly enough to lock myself out, now.

Good luck after messing up your switch/router configuration

Well, direct connections between the servers, and each server has a line from the data center, so I would have to either crater both machines at the same time, or mess with the BMC IP settings. Not impossible, but avoidable. I got real tired of driving 150 miles to troubleshoot, so tried to make it hard-to-break.

We have a network at our vacation home, so it’s a 4 hour road trip to fix something if I fat-finger it.

Makes router firmware upgrades just that little bit more exciting.

The Telescope That Could: way to go, NASA!!

Out of curiosity, how deep are the backups on the various pieces of key equipment? Has that strategy changed from 30 years ago till now?

Yeah, I was surprised at the mention of *triple* redundancy of some components (the memory modules, IIRC)

NASA made a big deal about *no* redundancy (“single string redundancy”) in the recent Mars rovers Curiosity & Perseverance. The idea being: You remove the complexity and weight of redundant components and handover/fallback methods, and instead focus on fault *tolerance*, and allocate the weight and engineering and testing resources to designing the system to fail better and less often in the first place.

Akin’s Law of Spacecraft design #2: “To design a spacecraft right takes an infinite amount of effort. This is why it’s a good idea to design them to operate when some things are wrong.”

My favorite space story is the satellite that had a damaged battery system and went dark. Later, something went just wrong enough to bridge the connection to its solar panels, so it started transmitting again but only when the sun is on its panels.

I’m not surprised at the triple redundancy. It’s just logical, common sense. As a rule of thumb, the majority is “right”. I.e., it’s unlikely that 2 of 3 devices fail. Normally, it’s merely one of them. This makes 3 the minimum number needed for redundancy.

Alas, exceptions confirm the rule. There was this story about an underground tram, in which said redundancy failed. Two of three circuits failed and the doors opened in the tunnels/closed at the stations. 😂
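The two-out-of-three rule of thumb from the comment above can be sketched as a simple majority voter. This is a generic illustration in Python, not anything taken from Hubble or the tram; the function name and the door-state strings are made up for the example.

```python
from collections import Counter


def majority_vote(readings):
    """Return the value reported by a strict majority of the channels.

    Raises ValueError when no value has more than half of the votes.
    """
    value, count = Counter(readings).most_common(1)[0]
    if count * 2 > len(readings):
        return value
    raise ValueError("no majority among redundant channels")


# One faulty channel out of three is outvoted by the two healthy ones.
print(majority_vote(["closed", "closed", "open"]))  # closed

# Two channels failing the same way outvote the one healthy channel,
# which is the tram scenario: the voter confidently picks the wrong answer.
print(majority_vote(["open", "open", "closed"]))  # open
```

The voter only protects against a single fault; once two channels agree on a wrong value, the scheme has no way to tell.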

Sounds like a case where the work to build in redundancy should have been traded away to make the system more robust and fault-tolerant in the first place.

Hubble has used backup hardware before, but during the servicing missions they would replace the damaged part and restore primary systems. With no planned servicing missions, the backup is all they have. Once that goes, it’s the end of the mission unless a commercial operator can come up with a practical servicing mission. As far as orbital work platforms go, nothing flying right now can hold a candle to the Shuttle, but that doesn’t mean it’s not possible.

Personally my money would be on SpaceX using a modified Crew Dragon, but a few years ago Sierra Nevada said they could do it with Dream Chaser: https://www.spaceflightinsider.com/missions/space-observatories/trump-space-advisors-considering-hubble-servicing-mission/

If it totally dies, it becomes junk? If that happens, Musk and Bezos should race to replace the faulty PSU (and update the other electronics) and claim it for themselves.

They could take the world’s most expensive selfie together! I assume that’s like the billionaire’s version of saying GG.

This is great news to hear (we could all use some of that these days). Wonder if a robotic servicing mission would work. Even bring it down to a lower orbit for a manned servicing mission. It is a national treasure that should be kept going.

I am sure the techs were waving their magic wands chanting “Hubble Hubble toil and trouble…”

If anyone’s interested in this kind of computer, I did an epic live-coding video where I write an assembler and emulator for the OBP (which begat the AOP, which begat the NSSC-1 which the Hubble uses). The blog post and writeup is here: http://cowlark.com/2021-07-03-obp-simulator

I don’t have specs for the NSSC-1, but it’s apparently very similar to the OBP, which has an 18-bit data bus, a 16-bit address bus with up to 64kw of addressable memory, all implemented with core memory and NOR gates, running at ~250kHz. The instruction set is surprisingly modern for something which was developed in 1968 (this was then streamlined a bit for the AOP); it would work fine in a modern embedded microcontroller. And the weirdest feature is… the OBP assembler uses _natural language_.
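For a sense of scale, the 64kw figure follows directly from the address width. A quick Python check, assuming 18-bit words and 16-bit addresses as on the OBP (the constant names are just for this example):

```python
WORD_BITS = 18      # assumed width of one memory word
ADDRESS_BITS = 16   # assumed width of an address

addressable_words = 2 ** ADDRESS_BITS
kilowords = addressable_words // 1024
total_bits = addressable_words * WORD_BITS
equivalent_kib = total_bits // 8 // 1024  # expressed in 8-bit bytes

print(addressable_words)  # 65536
print(kilowords)          # 64
print(equivalent_kib)     # 144
```

So a fully populated machine of this design would hold the equivalent of 144 KiB, all of it in core memory.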

Find a tech named Mike Nelson to do it… “Mike fixed the Hubble, Mike fixed the Hubble!”

How does he eat & breathe though?

It’s amazing what kind of fixes can be phoned in.

Why isn’t the switch over automatic? Waiting for command is just an extra step that could fail.

What if the automatic switchover failed and then it got stuck in a loop trying to switch over? Less manual control = less repairability.
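One common middle ground between the two positions above is an automatic switchover that gives up after a fixed number of tries and then waits for a human. The Python sketch below is purely illustrative and says nothing about how Hubble actually behaves; `bounded_failover` and the `switch_to_backup` callable are hypothetical names.

```python
def bounded_failover(switch_to_backup, max_attempts=3):
    """Try an automatic switchover a limited number of times.

    `switch_to_backup` is a hypothetical callable that returns True on
    success. After `max_attempts` failures the function stops retrying
    and hands control back to the operators instead of looping forever.
    """
    for _ in range(max_attempts):
        if switch_to_backup():
            return "switched to backup"
    return "awaiting ground command"


# A switchover that never succeeds is tried three times, then abandoned.
print(bounded_failover(lambda: False))  # awaiting ground command

# A healthy switchover succeeds on the first attempt.
print(bounded_failover(lambda: True))   # switched to backup
```

The cap is what prevents the stuck-in-a-loop scenario, at the cost of still needing a command path from the ground when the automatic attempts run out.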

Good stuff to see. One cannot help but rejoice at the trend of successful missions. I am sure the techs have done a pretty good job.
