The overconfidence of nuclear engineers

Rumors that the Fort Calhoun nuclear power station is subject to a media blackout appear to be overblown, given that the NRC is blogging the situation.

Apparently floodwaters at the plant were at 1006 feet ASL yesterday, a fair margin below the plant’s 1014-foot design standard. That margin might have been a lot less if the NRC hadn’t cited the plant for design violations last year: flaws that, by its estimate, would have led to certain core damage at a flood stage of 1010 feet.

Still, engineers say things like this:

“We have much more safety measures in place than we actually need right now,” Jones continued. “Even if the water level did rise to 1014 feet above mean sea level, the plant is designed to handle that much water and beyond. We have additional steps we can take if we need them, but we don’t think we will. We feel we’re in good shape.” – suite101

The “and beyond” sounds like pure embellishment. The design flood elevation for the plant is 1014 feet. I’ve read some NRC documents on the plant, and there’s no other indication that higher design standards were used. Presumably there are safety margins in systems, but those are designed to offset unanticipated failures, e.g. from design deviations like those discovered by the NRC. Surely the risk of unanticipated problems would rise dramatically above the maximum anticipated flood level of 1014 feet.

Overconfidence is a major contributor to accidents in complex systems. How about a little humility?

Currently the Missouri River forecast is pretty flat, so hopefully we won’t test the limits of the plant design.

The real constraint on nuclear power: war

A future where everything goes right for nuclear power, with advancing technology driving down costs, making reactors a safe and ubiquitous energy source, and providing a magic bullet for climate change, might bring other surprises.

For example, technology might also make supersonic cruise missiles cheap and ubiquitous.

[Image: BrahMos supersonic cruise missile]

The Fukushima operators appear to be hanging in there. But imagine how they’d be coping if someone fired a missile at them once in a while.

Fortunately, reactors today are mostly in places where peace and rule of law prevail.

[Image: world map]

But peace and good governance aren’t exactly the norm in places where emissions are rising rapidly, or the poor need energy.

[Image: governance indicators]

Building lots of nuclear power plants is ultimately a commitment to peace, or at least acceptance of rather dreadful consequences of war (not necessarily war with nuclear weapons, but war with conventional weapons turning nuclear reactors into big dirty bombs).

One would hope that abundant, clean energy would reduce the motivation to blow things up, but how much are we willing to gamble on that?

Nuclear systems thinking roundup

Mengers & Sirelli call for systems thinking in the nuclear industry in IEEE Xplore:

Need for Change Towards Systems Thinking in the U.S. Nuclear Industry

Until recently, nuclear has been largely considered as an established power source with no need for new developments in its generation and the management of its power plants. However, this idea is rapidly changing due to reasons discussed in this study. Many U.S. nuclear power plants are receiving life extensions decades beyond their originally planned lives, which requires the consideration of new risks and uncertainties. This research first investigates those potential risks and sheds light on how nuclear utilities perceive and plan for these risks. After that, it examines the need for systems thinking for extended operation of nuclear reactors in the U.S. Finally, it concludes that U.S. nuclear power plants are good examples of systems in need of change from a traditional managerial view to a systems approach.

In this talk from the MIT SDM conference, NRC commissioner George Apostolakis is already there:

Systems Issues in Nuclear Reactor Safety

This presentation will address the important role system modeling has played in meeting the Nuclear Regulatory Commission’s expectation that the risks from nuclear power plants should not be a significant addition to other societal risks. Nuclear power plants are designed to be fundamentally safe due to diverse and redundant barriers to prevent radiation exposure to the public and the environment. A summary of the evolution of probabilistic risk assessment of commercial nuclear power systems will be presented. The summary will begin with the landmark Reactor Safety Study performed in 1975 and continue up to the risk-informed Reactor Oversight Process. Topics will include risk-informed decision making, risk assessment limitations, the philosophy of defense-in-depth, importance measures, regulatory approaches to handling procedural and human errors, and the influence of safety culture as the next level of nuclear power safety performance improvement.

The presentation is interesting, in that it’s about 20% engineering and 80% human factors. Figuring out how people interact with a really complicated control system is a big challenge.

This thesis looks like an example of what Apostolakis is talking about:

Perfect plant operation with high safety and economic performance is based on both good physical design and successful organization. However, in comparison with the affection that has been paid to technology research, the effort that has been exerted to enhance NPP management and organization, namely human performance, seems pale and insufficient. There is a need to identify and assess aspects of human performance that are predictive of plant safety and performance and to develop models and measures of these performance aspects that can be used for operation policy evaluation, problem diagnosis, and risk-informed regulation. The challenge of this research is that: an NPP is a system that is comprised of human and physics subsystems. Every human department includes different functional workers, supervisors, and managers; while every physical component can be in normal status, failure status, or a being-repaired status. Thus, an NPP’s situation can be expressed as a time-dependent function of the interactions among a large number of system elements. The interactions between these components are often non-linear and coupled, sometime there are direct or indirect, negative or positive feedbacks, and hence a small interference input either can be suppressed or can be amplified and may result in a severe accident finally. This research expanded ORSIM (Nuclear Power Plant Operations and Risk Simulator) model, which is a quantitative computer model built by system dynamics methodology, on human reliability aspect and used it to predict the dynamic behavior of NPP human performance, analyze the contribution of a single operation activity to the plant performance under different circumstances, diagnose and prevent fault triggers from the operational point of view, and identify good experience and policies in the operation of NPPs.

The cool thing about this, from my perspective, is that it’s a blend of plant control with classic SD maintenance project management. It looks at the plant as a bunch of backlogs to be managed, and defines instability as a circumstance in which the rate of creation of new work exceeds the capacity to perform tasks. This is made operational through explicit work and personnel stocks, right down to the matter of who’s in charge of the control room. Advisor Michael Golay has written previously about SD in the nuclear industry.
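The backlog framing is easy to make concrete. Here’s a trivial sketch (numbers invented) of the instability condition described above – work arriving faster than the organization’s capacity to perform it:

```python
# Minimal stock-and-flow sketch of the backlog view of plant operations.
# "Instability" = new work arrives faster than tasks can be completed,
# so the backlog stock grows without bound.

def run(arrival, capacity, weeks=52, backlog=10.0):
    for _ in range(weeks):
        completion = min(capacity, backlog)  # can't complete work that doesn't exist
        backlog += arrival - completion      # stock integrates inflow minus outflow
    return backlog

print("arrival < capacity:", run(arrival=8, capacity=10))   # settles at a finite backlog
print("arrival > capacity:", run(arrival=12, capacity=10))  # grows ~2 tasks/week, forever
```

A real model like ORSIM adds many such stocks – work awaiting parts, planning, approval, and people to do it all – but the basic failure mode is the same.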

Others in the SD community have looked at some of the “outer loops” operating around the plant, using group model building. Not surprisingly, this yields multiple perspectives and some counterintuitive insights – for example:

Regulatory oversight was initially and logically believed by the group to be independent of the organization and its activities. It was therefore identified as a policy variable.

However in constructing the very first model at the workshop it became apparent that for the event and system under investigation the degree of oversight was influenced by the number of event reports (notifications to the regulator of abnormal occurrences or substandard conditions) the organization was producing. …

The top loop demonstrates the reinforcing effect of a good safety culture, as it encourages compliance, decreases the normalisation of unauthorised changes, therefore increasing vigilance for any outlining unauthorised deviations from approved actions and behaviours, strengthening the safety culture. Or if the opposite is the case an erosion of the safety culture results in unauthorised changes becoming accepted as the norm, this normalisation disguises the inherent danger in deviating from the approved process. Vigilance to these unauthorised deviations and the associated potential risks decreases, reinforcing the decline of the safety culture by reducing the means by which it is thought to increase. This is however balanced by the paradoxical notion set up by the feedback loop involving oversight. As safety improves, the number of reportable events, and therefore reported events can decrease. The paradoxical behaviour is induced if the regulator perceives this lack of event reports as an indication that the system is safe, and reduces the degree of oversight it provides.

Tsuchiya et al. reinforce the idea that change management can be part of the problem as well as part of the solution.

Markus Salge provides a nice retrospective on the Chernobyl accident, best summarized in pictures:

[Figure: key feedback structure of a graphite-moderated reactor like Chernobyl]

[Figure: “Flirting with Disaster” dynamics]

Others are looking at the nuclear fuel cycle and the role of nuclear power in energy systems.

How to be confused about nuclear safety

There’s been a long running debate about nuclear safety, which boils down to, what’s the probability of significant radiation exposure? That in turn has much to do with the probability of core meltdowns and other consequential events that could release radioactive material.

I asked my kids about an analogy to the problem: determining whether a die is fair. They concluded that it ought to be possible to simply roll the die enough times to observe whether the outcomes were fair. Then I asked them how that would work for rare events – a thousand-sided die, for example. No one wanted to roll the die that many times, but they quickly hit on the alternative: use a computer. But then, they wondered, how do you know if the computer model is any good?

Those are basically the choices for nuclear safety estimation: observe real plants (slow, expensive), or use models of plants.
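To see why observing real plants is slow, consider how many “rolls” it takes to pin down a rare probability. A minimal sketch (the die and the trial counts are just illustrative):

```python
import numpy as np

rng = np.random.default_rng(seed=1)
p_true = 1 / 1000  # the rare face of a fair thousand-sided die

for n_rolls in [1_000, 10_000, 100_000, 1_000_000]:
    hits = rng.random(n_rolls) < p_true            # simulate the rolls
    se = np.sqrt(p_true * (1 - p_true) / n_rolls)  # binomial standard error
    print(f"{n_rolls:>9} rolls: estimate {hits.mean():.5f}, "
          f"relative std. error ~{se / p_true:.0%}")
```

With 1,000 rolls the relative error is about 100% – you could easily see zero events and conclude the die is perfectly safe. Getting to ±10% takes on the order of 100,000 rolls, which is the reactor-year problem in a nutshell.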

If you go the model route, you introduce an additional layer of uncertainty, because you have to validate the model, which in itself is difficult. It’s easy to misjudge reactor safety by doing any of six things:

  • Ignore the dynamics of the problem. For example, use a statistical model that doesn’t capture feedback. Presumably there have been a number of reinforcing feedbacks operating at the Fukushima site, causing spillovers from one system to another, or one plant to another:
    • Collateral damage (catastrophic failure of part A damages part B)
    • Contamination (radiation spewed from one reactor makes it unsafe to work on others)
    • Exhaustion of common resources (operators, boron)
  • Ignore the covariance matrix. This can arise in part from ignoring the dynamics above. But there are other possibilities as well: common design elements, or colocation of reactors, that render failure events non-independent (see the sketch after this list).
  • Model an idealized design, not a real plant: ignore components that don’t perform to spec, nonlinearities in responses to extreme conditions, and operator error.
  • Draw a narrow boundary around the problem. Over the last week, many commentators have noted that reactor containment structures are very robust, and explicitly designed to prevent a major radiation release from a worst-case core meltdown. However, that ignores spent fuel stored outside of containment, which is apparently a big part of the Fukushima hazard now.
  • Ignore the passage of time. This can both help and hurt: newer reactor designs should benefit from learning about problems with older ones; newer designs might introduce new problems; life extension of old reactors introduces its own set of engineering issues (like neutron embrittlement of materials).
  • Ignore the unknown unknowns (easy to say, hard to avoid).
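To make the covariance point concrete, here’s a toy Monte Carlo comparison (all probabilities invented) of two redundant safety trains, with and without a common-cause event – say, a flood that takes out both diesel generators at once:

```python
import numpy as np

rng = np.random.default_rng(seed=2)
n = 1_000_000       # simulated demand events
p_train = 1e-2      # per-demand failure probability of one safety train
p_common = 1e-4     # probability of a shared event disabling both trains

# Independence assumption: each train fails on its own
both_fail_indep = (rng.random(n) < p_train) & (rng.random(n) < p_train)

# Common-cause version: independent failures OR a shared event
both_fail_cc = both_fail_indep | (rng.random(n) < p_common)

print(f"independent model:  {both_fail_indep.mean():.1e}")  # ~1e-4
print(f"with common cause:  {both_fail_cc.mean():.1e}")     # ~2e-4
```

Here a shared event a hundred times rarer than a single-train failure still doubles the system failure rate, and the gap widens with more redundancy, because p_train**k shrinks fast while p_common doesn’t.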

I haven’t read much of the safety literature, so I can’t say to what extent the above issues apply to existing risk analyses based on statistical models or detailed plant simulation codes. However, I do see a bit of a disconnect between actual performance and risk numbers that are often bandied about from such studies: the canonical risk of 1 meltdown per 10,000 reactor years, and other even smaller probabilities on the order of 1 per 100,000 or 1,000,000 reactor years.

I built myself a little model to assess the data, using WNA data to estimate reactor-years of operation and a wiki list of accidents. One could argue at length which accidents should be included. Only light water reactors? Only modern designs? I tend to favor a liberal policy for including accidents. As soon as you start coming up with excuses to exclude things, you’re headed toward an idealized world view, where operators are always faithful, plants are always shiny and new, or at least retired on schedule, etc. Still, I was a bit conservative: I counted 7 partial or total meltdown accidents in commercial or at least quasi-commercial reactors, including Santa Susana, Fermi, TMI, Chernobyl, and Fukushima (I think I missed Chapelcross).

Then I looked at maximum likelihood estimates of meltdown frequency over various intervals. Using all the data, assuming Poisson arrivals of meltdowns, you get .6 failures per thousand reactor-years (95% confidence interval .3 to 1). That’s up from .4 [.1,.8] before Fukushima. Even if you exclude the early incidents and Fukushima, you’re looking at .2 [.04,.6] meltdowns per thousand reactor years – twice the 1-per-10,000 target. For the different subsets of the data, the estimates translate to an expected meltdown frequency of about once to thrice per decade, assuming continuing operations of about 450 reactors. That seems pretty bad.
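For the curious, those estimates are easy to reproduce. This sketch uses the standard exact (chi-square) confidence interval for a Poisson rate; the event count and reactor-year exposure are round numbers consistent with the discussion above, not a careful tabulation:

```python
from scipy.stats import chi2

def poisson_rate_ci(events, exposure, alpha=0.05):
    """MLE and exact (Garwood) confidence interval for a Poisson rate."""
    mle = events / exposure
    lo = chi2.ppf(alpha / 2, 2 * events) / (2 * exposure) if events else 0.0
    hi = chi2.ppf(1 - alpha / 2, 2 * (events + 1)) / (2 * exposure)
    return mle, lo, hi

# ~7 meltdowns in roughly 12,000 reactor-years of experience (illustrative)
mle, lo, hi = poisson_rate_ci(7, 12_000)
print(f"{mle*1000:.2f} [{lo*1000:.2f}, {hi*1000:.2f}] per 1000 reactor-years")

# expected meltdowns per decade with ~450 reactors operating
print(f"expected per decade: ~{mle * 450 * 10:.1f}")
```

Swapping in different event counts and exposure windows reproduces the subset estimates quoted above.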

In other words, the actual experience of rolling the dice seems to be yielding a riskier outcome than risk models suggest. One could argue that most of the failing reactors were old, built long ago, or poorly designed. Maybe so, but will we ever have a fleet of young reactors, designed and operated by demigods? That’s not likely, but surely things will get somewhat better with the march of technology. So, the question is, how much better? Areva’s 10x improvement seems inadequate if it’s measured against the performance of existing plants, at least if we plan to grow the plant fleet by much more than a factor of 10 to replace fossil fuels. There are newer designs around, but they depart from the evolutionary path of light water reactors, which means that “past performance is no indication of future returns” applies – will greater passive safety outweigh the effects of jumping to a new, less mature safety learning curve?

It seems to me that we need models of plant safety that square with the actual operational history of plants, to reconcile projected risk with real-world risk experience. If engineers promote analysis that appears unjustifiably optimistic, the public will do what it always does: discount the results of formal models, in favor of mental models that may be informed by superstition and visions of mushroom clouds.

Nuclear safety follies

I find panic-fueled iodine marketing and disingenuous comparisons of Fukushima to Chernobyl deplorable.

But those are balanced by pronouncements like this:

Telephone briefing from Sir John Beddington, the UK’s chief scientific adviser, and Hilary Walker, deputy director for emergency preparedness at the Department of Health:

“Unequivocally, Tokyo will not be affected by the radiation fallout of explosions that have occurred or may occur at the Fukushima nuclear power stations.”

Surely the prospect of a large scale radiation release reaching Tokyo is very low, but it’s not approximately zero, which is what “unequivocally not” implies.

On my list of the seven deadly sins of complex systems management, number four is,

Certainty. Planning for it leads to fragile strategies. If you can’t imagine a way you could be wrong, you’re probably a fanatic.

Nuclear engineers disagree, but some seem to have a near-fanatic faith in plant safety. Normal Accidents documents some bizarrely cheerful post-accident reflections on safety, and I found another when reading up over the last few days.


Will complex designs win the nuclear race?

Areva pursues “defense in depth” for reactor safety:

Areva SA (CEI) Chief Executive Officer Anne Lauvergeon said explosions at a Japanese atomic power site in the wake of an earthquake last week underscore her strategy to offer more complex reactors that promise superior safety.

“Low-cost reactors aren’t the future,” Lauvergeon said on France 2 television station yesterday. “There was a big controversy for one year in France about the fact that our reactors were too safe.”

Lauvergeon has been under pressure to hold onto her job amid delays at a nuclear plant under construction in Finland. The company and French utility Electricite de France SA, both controlled by the state, lost a contract in 2009 worth about $20 billion to build four nuclear stations in the United Arab Emirates, prompting EDF CEO Henri Proglio to publicly question the merits of Areva’s more complex and expensive reactor design.

Areva’s new EPR reactors, being built in France, Finland and China, boast four independent safety sub-systems that are supposed to reduce core accidents by a factor of 10 compared with previous reactors, according to the company.

The design has a double concrete shell to withstand missiles or a commercial plane crash, systems designed to prevent hydrogen accumulation that may cause radioactive release, and a core catcher in the containment building in the case of a meltdown. To withstand severe earthquakes, the entire nuclear island stands on a single six-meter (19.6 feet) thick reinforced concrete base, according to Paris-based Areva.

via Bloomberg

I don’t doubt that the Areva design is far better than the reactors now in trouble in Japan. But I wonder if this is really the way forward. Big, expensive hardware that uses multiple redundant safety systems to offset the fundamentally marginal stability of the reaction might indeed work safely, but it doesn’t seem very deployable on the kind of scale needed for either GHG emissions mitigation or humanitarian electrification of the developing world. The financing comes in overly large bites, huge piles of concrete increase energy and emission payback periods, and it would take ages to ramp up construction and training enough to make a dent in the global challenge.

I suspect that the future – if there is one – lies with simpler designs that come in smaller portions and trade some performance for inherent stability and antiproliferation features. I can’t say whether their technology can actually deliver on the promises, but at least TerraPower – for example – has the right attitude:

“A cheaper reactor design that can burn waste and doesn’t run into fuel limitations would be a big thing,” Mr. Gates says.

However, even simple/small-is-beautiful may come rather late in the game from a climate standpoint:

While Intellectual Ventures has caught the attention of academics, the commercial industry – hoping to stimulate interest in an energy source that doesn’t contribute to global warming – is focused on selling its first reactors in the U.S. in 30 years. The designs it’s proposing, however, are essentially updates on the models operating today. Intellectual Ventures thinks that the traveling-wave design will have more appeal a bit further down the road, when a nuclear renaissance is fully under way and fuel supplies look tight. – Technology Review

Not surprisingly, the evolution of the TerraPower design relies on models:

Myhrvold: When you put a software guy on an energy project he turns it into a software project. One of the reasons we’re innovating around nuclear is that we put a huge amount of energy into computer modeling. We do very extensive computer modeling and have better computer modeling of reactor internals than anyone in the world. No one can touch us on software for designing the reactor. Nuclear is really expensive to do experiments on, so when you have good software it’s way more efficient and a shorter design cycle.

Computing is something that is very important for nuclear. The first fast reactors, which TerraPower is, were basically designed in the slide rule era. It was stunning to us that the guys back then did what they did. We have these incredibly accurate simulations of isotopes and these guys were all doing it with slide rules. My cell phone has more computing power than the computers that were used to design the world’s nuclear plants.

It’ll be interesting to see whether current events kindle interest in new designs, or throw the baby out with the bathwater (is it a regular baby, or a baby Godzilla?). From a policy standpoint, the trick is to create a level playing field for competition among nuclear and non-nuclear technologies, where government participation in the fuel cycle has been overwhelming and risks are thoroughly socialized.

Fortunately, the core ended up on the floor

I’ve been sniffing around for more information on the dynamics of boiling water reactors, particularly in extreme conditions. Here’s what I can glean (caveat: I’m not a nuclear engineer).

It turns out that there’s quite a bit of literature on reduced-form models of reactor operations. Most of this, though, is focused on operational issues that arise from nonlinear dynamics, on a time scale of less than a second or so. (Update: I’ve posted an example of such a model here.)

[Figure: block diagram of BWR core dynamics]

Source: Instability in BWR NPPs – F. Maggini 2004

Those are important – it was exactly those kinds of fast dynamics that led to disaster when operators took the Chernobyl plant into unsafe territory. (Fortunately, the Chernobyl design is not widespread.)

However, I don’t think those are the issues that are now of interest. The Japanese reactors are now far from their normal operating point, and the dynamics of interest have time scales of hours, not seconds. Here’s a map of the territory:

[Figure: map of core power vs. coolant flow, with shutdown trajectories]

Source: Instability in BWR NPPs – F. Maggini 2004; colored annotations by me.

The horizontal axis is coolant flow through the core, and the vertical axis is core power – i.e. the rate of heat generation. The green dot shows normal full-power operation. The upper left part of the diagram, above the diagonal, is the danger zone, where high power output and low coolant flow create the danger of a meltdown – like driving your car over a mountain pass, with nothing in the radiator.

It’s important to realize that there are constraints on how you move around this diagram. You can quickly turn off the nuclear chain reaction in a reactor, by inserting the control rods, but it takes a while for the power output to come down, because there’s a lot of residual heat from nuclear decay products.
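To see why “a while” means days rather than minutes, here’s a sketch of the decay-heat curve using the classic Way–Wigner approximation (textbook constants; real analyses use the ANS decay-heat standard):

```python
def decay_heat_fraction(t, t_op=1.0e8):
    """Way-Wigner approximation: decay power as a fraction of full power.
    t = seconds since shutdown; t_op = seconds of prior operation (~3 years here)."""
    return 0.066 * (t ** -0.2 - (t + t_op) ** -0.2)

for label, t in [("1 minute", 60), ("1 hour", 3600),
                 ("1 day", 86400), ("1 week", 604800)]:
    print(f"{label:>8}: {decay_heat_fraction(t):.1%} of full power")
```

For a reactor producing a couple thousand megawatts of thermal power at full output, even 1% is tens of megawatts of heat that has to go somewhere.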

On the other hand, you can turn off the coolant flow pretty fast – turn off the electricity to the pumps, and the flow will stop as soon as the momentum of the fluid is dissipated. If you were crazy enough to turn off the cooling without turning down the power (yellow line), you’d have an immediate catastrophe on your hands.

In an orderly shutdown, you turn off the chain reaction, then wait patiently for the power to come down, while maintaining coolant flow. That’s initially what happened at the Fukushima reactors (blue line). Seismic sensors shut down the reactors, and an orderly cool-down process began.

After an hour, things went wrong when the tsunami swamped backup generators. Then the reactor followed the orange line to a state with near-zero coolant flow (whatever convection provides) and nontrivial power output from the decay products. At that point, things start heating up. The process takes a while, because there’s a lot of thermal mass in the reactor, so if cooling is quickly restored, no harm done.

If cooling isn’t restored, a number of positive feedbacks (nasty vicious cycles) can set in. Boiling in the reactor vessel necessitates venting (releasing small amounts of mostly short-lived radioactive materials); if venting fails, the reactor vessel can fail from overpressure. Boiling reduces the water level in the reactor and makes heat transfer less efficient; fuel rods that boil dry heat up much faster. As fuel rods overheat, their zirconium cladding reacts with water to make hydrogen – which can explode when vented into the reactor building, as we apparently saw at reactors 1 & 3. That can cause collateral damage to systems or people, making it harder to restore cooling.

Things get worse as heat continues to accumulate. Melting fuel rods dump debris in the reactor, obstructing coolant flow, again making it harder to restore cooling. Ultimately, melted fuel could concentrate in the bottom of the reactor vessel, away from the control rods, making power output go back up (following the red line). At that point, it’s likely that the fuel is going to end up in a puddle on the floor of the containment building. Presumably, at that point negative feedback reasserts dominance, as fuel is dispersed over a large area, and can cool passively. I haven’t seen any convincing descriptions of this endgame, but nuclear engineers seem to think it benign – at least compared to Chernobyl. At Chernobyl, there was one less balancing feedback loop (ineffective containment) and an additional reinforcing feedback: graphite in the reactor, which caught fire.

So, the ultimate story here is a race against time. The bad news is that if the core is dry and melting, time is not on your side as you progress faster and faster up the red line. The good news is that, as long as that hasn’t happened yet, time is on the side of the operators – the longer they can hold things together with duct tape and seawater, the less decay heat they have to contend with. A toy model of that race follows below. Unfortunately, it sounds like we’re not out of the woods yet.
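Here’s the toy rendering of the race (a sketch with invented parameters, not plant data): core temperature integrates decay heat minus cooling, and the peak it reaches depends on how long cooling is down:

```python
def peak_temperature(cooling_restored_hr, dt=60.0, hours=96):
    """Euler integration of a one-stock core temperature model (illustrative)."""
    T, C = 300.0, 1e9            # deg C; effective thermal mass in J/K (invented)
    peak = T
    for step in range(int(hours * 3600 / dt)):
        t = step * dt
        decay_heat = 2e7 * (t + 100) ** -0.2                 # W, declining after scram
        h = 4e4 if t > cooling_restored_hr * 3600 else 2e2   # W/K: pumps on vs. convection only
        T += (decay_heat - h * (T - 30)) / C * dt
        peak = max(peak, T)
    return peak

for hrs in [1, 8, 24, 48]:
    print(f"cooling restored after {hrs:>2} h -> peak ~{peak_temperature(hrs):4.0f} deg C")
```

The numbers are meaningless, but the shape is the point: the peak rises with every hour of delay, yet at a decreasing rate, because the decay-heat source term is itself falling. The sketch also omits all the vicious cycles above – in reality, crossing a cladding-damage threshold changes the system, and the curve stops being this forgiving.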

Boiling Water Reactor Dynamics

Replicated from “Hybrid Simulation of Boiling Water Reactor Dynamics Using A University Research Reactor” by James A. Turso, Robert M. Edwards, Jose March-Leuba, Nuclear Technology vol. 110, Apr. 1995.

This is a simple 5th-order representation of the operation of a boiling water reactor around its normal operating point, which is subject to interesting limit cycle dynamics.

The original article documents the model well, with the exception of the bifurcation parameter K and a nonlinear term, for which I’ve identified plausible values by experiment.

TursoNuke1.mdl
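For readers without Vensim, here’s a minimal Python sketch of the same family of model: point kinetics with one delayed-neutron group, first-order fuel temperature, and second-order void reactivity feedback, in the style of March-Leuba’s reduced-order BWR model (the lineage behind the Turso et al. formulation). Parameter magnitudes follow that literature but should be treated as illustrative; as noted above, the gain K is the bifurcation parameter to experiment with:

```python
import numpy as np
from scipy.integrate import solve_ivp

beta, Lam, lam = 0.0056, 4.0e-5, 0.08  # delayed fraction, generation time, precursor decay
a1, a2 = 25.04, 0.23                   # fuel temperature gain and dissipation
a3, a4 = 2.25, 6.82                    # second-order void reactivity dynamics
D = -2.52e-5                           # Doppler (fuel temperature) feedback
K = -3.7e-3                            # void feedback gain: the bifurcation parameter

def bwr(t, y):
    n, c, T, rho_v, drho_v = y  # excess neutrons, precursors, fuel temp, void reactivity + rate
    rho = rho_v + D * T         # net reactivity feedback
    dn = (rho * (1 + n) - beta * n) / Lam + lam * c
    dc = beta / Lam * n - lam * c
    dT = a1 * n - a2 * T
    ddrho_v = K * T - a3 * drho_v - a4 * rho_v
    return [dn, dc, dT, drho_v, ddrho_v]

sol = solve_ivp(bwr, (0, 300), [0.1, 0.0, 0.0, 0.0, 0.0], max_step=0.01)
print("amplitude over last stretch:", np.ptp(sol.y[0][-5000:]))
```

Depending on K, a small perturbation either decays back to the operating point or settles into the limit cycle mentioned above; plotting sol.y[0] against sol.t shows which regime you’re in.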

Nuclear accident dynamics

There’s been a lot of wild speculation about the nuclear situation in Japan. Reporters were quick to start a “countdown to meltdown” based on only the sketchiest information about problems at plants, and then were quick to wonder if our troubles were over because the destruction of the reactor building at Fukushima I-1 didn’t breach the reactor vessel, based on equally sketchy information. Now the cycle repeats for reactor 3. Here’s my take on the fundamentals of the situation.

Boiling water reactors (BWRs), like those at Fukushima, are not inherently stable in all states. For a system analogy, think of a pendulum. It’s stable when it’s hanging, as in a grandfather clock. If you disturb it, it will oscillate for a while, but eventually return to hanging quietly. On the other hand, an inverted pendulum, where the arm stands above the pivot, like a broom balanced on your palm, is unstable – a small disturbance that starts it tipping is reinforced by gravity, and it quickly falls over.

Still, it is possible to balance a broom on your palm for a long time, if you’re diligent about it. The system of an inverted broomstick plus a careful person controlling it is stable, at least over a reasonable range of disturbances. Similarly, a BWR is at times dependent on a functional control system to maintain stability. Damage the control system (or tickle the broom-balancer), and the system may spiral out of control.
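The analogy is easy to make quantitative: the hanging and inverted pendulum differ only in the sign of one term in the linearized equations, which moves an eigenvalue into the right half plane, and feedback can move it back. A sketch with unit constants:

```python
import numpy as np

def eigenvalues(sign, kp=0.0, kd=0.0):
    """Linearized pendulum theta'' = sign*(g/L)*theta, plus PD control torque.
    sign = -1 hanging, +1 inverted; g/L set to 1 for illustration."""
    A = np.array([[0.0, 1.0],
                  [sign * 1.0 - kp, -kd]])
    return np.linalg.eigvals(A)

print("hanging:              ", eigenvalues(-1))             # purely imaginary: oscillates
print("inverted, no control: ", eigenvalues(+1))             # one positive root: falls over
print("inverted, controlled: ", eigenvalues(+1, kp=2, kd=1)) # negative real parts: stable
```

Knock out the controller (set kp and kd back to zero) and the unstable root reappears – which is roughly the position of a plant whose power and instrumentation have been knocked out.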

An inverted broom is, of course, an imperfect analogy for a nuclear power plant. A broom can be described by just a few variables – its angular and translational position and momentum. Those are all readily observable within a tenth of a second or so. A BWR, on the other hand, has hundreds of relevant state variables – pressure and temperature at various points, the open or closed states of valves, etc. Presumably some have a lot of inertia, implying long delays in changing them. Many states are not directly observable – they have to be inferred from measurements at other points in the system. Unfortunately, those measurements are sometimes unreliable, leaving operators wondering whether the water in area A is rising because valve B failed to close, or if it’s just a faulty sensor.

No one can manage a 10th or 100th order differential equation with uncertain measurements in their head – yet that is essentially the task facing the Fukushima operators now. Their epic challenge is compounded by a number of reinforcing feedbacks.

  • First, there’s collateral damage, which creates a vicious cycle: part A breaks down, causing part B to overheat, causing part C to blow up, which ignites adjacent (but unrelated) part D, and so on. The destruction of the reactor building around reactor 1 has to be the ultimate example of this. It’s hard to imagine that much of the control system remains functional after such a violent event – and that makes escalation of problems all the more likely.
  • Second, there are people in the loop. Managing a BWR in routine conditions is essentially boring. Long periods of boredom, punctuated by brief periods of panic, do not create conditions for good management decisions. Mistakes cause irreversible damage, worsening the circumstances under which further decisions must be made – another vicious cycle.
  • Third, there’s contamination. If things get bad enough, you can’t even safely approach the system to measure or fix it.

It appears that the main fallback for the out-of-control reactors is to exploit the most basic balancing feedback loop: pump a lot of water in to carry off heat, while you figure out what to do next. I hope it works.

Meanwhile, on the outside, some observers seem inexplicably optimistic – they cheerfully conclude that, because the reactor vessel itself remains intact (hopefully), the system works due to its redundant safety measures. Commentators on past accidents have said much the same thing. The problem was that, when the dust settled, the situation often proved much worse than thought at the time, and safety systems sometimes contributed as much to problems as they solved – not a huge surprise in a very complex system.

We seem to be learning the wrong lessons from such events:

The presidential commission investigating the Three Mile Island accident learned that the problems rested with people, not technology. http://www.technologyreview.com/article/23907/

This strikes me as absurd. No technology exists in a vacuum; it must be appropriate to the people who operate it. A technology that requires perfect controllers for safe operation is a problem, because there’s no such thing.

If there’s a future for nuclear, I think it’ll have to lie with designs that incorporate many more passive safety features – the reactor system, absent control inputs, has to look a lot more like a hanging pendulum than a balanced broom, so that when the unlikely happens, it reacts benignly.