Models and copyrights

Or, Friends don’t let friends work for hire.

Image Copyright 2004 Lawrence Liang, Piet Zwart Institute, licensed under a Creative Commons License

Photographers and other media workers hate work for hire, because it’s often a bad economic tradeoff, giving up future income potential for work that’s underpaid in the first place. But at least when you give up rights to a photo, that’s the end of it. You can take future photos without worrying about past ones.

For models and software, that’s not the case, and therefore work for hire makes modelers a danger to themselves and to future clients. The problem is that models draw on a constrained space of possible formulations of a concept, and tend to incorporate a lot of prior art. Most of the author’s prior art is probably, in turn, things learned from other modelers. But when a modeler reuses a bit of structure – say, a particular representation of a supply chain or a consumer choice decision – under a work for hire agreement, title to those equations becomes clouded, because the work-for-hire client owns the new work, and it’s hard to distinguish new from old.

The next time you reuse components that have been used for work-for-hire, the previous client can sue for infringement, threatening both you and future clients. It doesn’t matter if the claim is legitimate; the lawsuit could be debilitating, even if you could ultimately win. Clients are often much bigger, with deeper legal pockets, than freelance modelers. You also can’t rely on a friendly working relationship, because bad things can happen in spite of good intentions: a hostile party might acquire copyright through a bankruptcy, for example.

The only viable approach, in the long run, is to retain copyright to your own stuff, and grant clients all the license they need to use, reproduce, produce derivatives, or whatever. You can relicense a snippet of code as often as you want, so no client is ever threatened by another client’s rights or your past agreements.

Things are a little tougher when you want to collaborate with multiple parties. One apparent option, joint ownership of copyright to the model, is conceptually nice but actually not such a hot idea. First, there’s legal doctrine to the effect that individual owners have a responsibility not to devalue joint property, which is a problem if one owner subsequently wants to license or give away the model. Second, in some countries, joint owners have special responsibilities, so it’s hard to write a joint ownership contract that works worldwide.

Again, a viable approach is cross-licensing, where creators retain ownership of their own contributions, and license contributions to their partners. That’s essentially the approach we’ve taken within the C-ROADS team.

One thing to avoid at all costs is agreements that require equation-level tracking of ownership. It’s fairly easy to identify individual contributions to software code, because people tend to work in containers, contributing classes, functions or libraries that are naturally modular. Models, by contrast, tend to be fairly flat and tightly interconnected, so contributions can be widely scattered and difficult to attribute.

Part of the reason this is such a big problem is that we now have too much copyright protection, and it lasts way too long. That makes it hard for copyright agreements to recognize that we see far because we stand on the shoulders of giants, and it distorts the balance of incentives intended by the framers of the Constitution.

In the academic world, model copyright issues have historically been ignored for the most part. That’s good, because copyright is a hindrance to progress (as long as there are other incentives to create knowledge). That’s also bad, because it means that there are a lot of models out there that have not been placed in the public domain, but which are treated as if they were. If people start asserting their copyrights to those, things could get messy in the future.

A solution to all of this could be open source or free software. Copyleft licenses like the GPL and permissive licenses like Apache facilitate collaboration and reuse of models, which would enable the field to move faster as a whole through open extension of prior work. C-ROADS, C-LEARN, and their component models are going out under an open license, and I hope to do more such experiments in the future.

Update: I’ve posted some examples.

Workshop on Modularity and Integration of Climate Models

The MIT Center for Collective Intelligence is organizing a workshop at this year’s Conference on Computational Sustainability entitled “Modularity and Integration of Climate Models.” Check out the Agenda.

Traditionally, computational models designed to simulate climate change and its associated impacts (climate science models, integrated assessment models, and climate economics models) have been developed as standalone entities. This limits possibilities for collaboration between independent researchers focused on sub-problems, and is a barrier to more rapid advances in climate modeling science because work is not distributed effectively across the community. The architecture of these models also precludes running a model with modular sub-components located on different physical hardware across a network.

In this workshop, we hope to examine the possibility of widespread development of climate model components that may be developed independently and coupled together at runtime in a “plug and play” fashion. Work on more modular climate models and modeling frameworks has begun (e.g., Kim et al., 2006), and substantial progress has been made in creating open data standards for climate science models, but many challenges remain.
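
To make “plug and play” concrete, here’s a minimal sketch of what a runtime coupling interface might look like. This is my illustration, not an existing framework; the component names, methods, and toy physics are all assumptions:

    # A minimal sketch of runtime "plug and play" coupling (hypothetical API):
    # components declare named inputs and outputs, and a simple coupler
    # routes fields between them each time step.
    import math
    from abc import ABC, abstractmethod

    class Component(ABC):
        inputs, outputs = [], []

        @abstractmethod
        def step(self, t, dt, fields):
            """Advance one step; return a dict of updated output fields."""

    class Carbon(Component):
        inputs, outputs = ["emissions"], ["atmospheric_co2"]
        def __init__(self):
            self.co2 = 280.0  # ppm, preindustrial
        def step(self, t, dt, fields):
            # Toy carbon cycle: about half of emissions remain airborne;
            # 7.8 GtCO2 is roughly 1 ppm.
            self.co2 += 0.5 * fields["emissions"] * dt / 7.8
            return {"atmospheric_co2": self.co2}

    class Climate(Component):
        inputs, outputs = ["atmospheric_co2"], ["temperature"]
        def step(self, t, dt, fields):
            # Toy response: 3 C per CO2 doubling, no thermal lag.
            return {"temperature": 3.0 * math.log2(fields["atmospheric_co2"] / 280.0)}

    def run(components, drivers, t0=2000.0, t1=2100.0, dt=1.0):
        fields, t = {}, t0
        while t < t1:
            fields.update(drivers(t))   # exogenous inputs
            for c in components:        # fixed execution order, for simplicity
                fields.update(c.step(t, dt, fields))
            t += dt
        return fields

    print(run([Carbon(), Climate()], drivers=lambda t: {"emissions": 40.0}))

The hard questions start exactly where this toy stops: negotiating time steps and grids, running components on different hardware across a network, and agreeing on the semantics of the exchanged fields.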

A goal of this workshop is to characterize issues like these more precisely, and to brainstorm about approaches to addressing them. Another desirable outcome of this workshop is the creation of an informal working group that is interested in promoting more modular climate model development.

The model that ate Europe

arXiv covers modeling on an epic scale in Europe’s Plan to Simulate the Entire Earth: a billion dollar plan to build a huge infrastructure for global multiagent models. The core is a massive exaflop “Living Earth Simulator” – essentially the socioeconomic version of the Earth Simulator.

FuturICT

I admire the audacity of this proposal, and there are many good ideas captured in one place:

  • The goal is to take on emergent phenomena like financial crises (getting away from the paradigm of incremental optimization of stable systems).
  • It embraces uncertainty and robustness through scenario analysis and Monte Carlo simulation.
  • It mixes modeling with data mining and visualization.
  • The general emphasis is on networks and multiagent simulations.

I have no doubt that such a project would yield many interesting spinoffs. However, I suspect that the core goal of creating a realistic global model will be an epic failure, for three reasons. Continue reading “The model that ate Europe”

Computer models running the EU? Eruptions, models, and clueless reporting

The EU airspace shutdown provides yet another example of ignorance of the role of models in policy:

Computer Models Ruining EU?

Flawed computer models may have exaggerated the effects of an Icelandic volcano eruption that has grounded tens of thousands of flights, stranded hundreds of thousands of passengers and cost businesses hundreds of millions of euros. The computer models that guided decisions to impose a no-fly zone across most of Europe in recent days are based on incomplete science and limited data, according to European officials. As a result, they may have over-stated the risks to the public, needlessly grounding flights and damaging businesses. “It is a black box in certain areas,” Matthias Ruete, the EU’s director-general for mobility and transport, said on Monday, noting that many of the assumptions in the computer models were not backed by scientific evidence. European authorities were not sure about scientific questions, such as what concentration of ash was hazardous for jet engines, or at what rate ash fell from the sky, Mr. Ruete said. “It’s one of the elements where, as far as I know, we’re not quite clear about it,” he admitted. He also noted that early results of the 40-odd test flights conducted over the weekend by European airlines, such as KLM and Air France, suggested that the risk was less than the computer models had indicated. – Financial Times

Other venues picked up similar stories:

Also under scrutiny last night was the role played by an eight-man team at the Volcanic Ash Advisory Centre at Britain’s Meteorological Office. The European Commission said the unit started the chain of events that led to the unprecedented airspace shutdown based on a computer model rather than actual scientific data. – National Post

These reports miss a number of crucial points:

  • The decision to shut down the airspace was political, not scientific. Surely the Met Office team had input, but not the final word, and model results were only one input to the decision.
  • The distinction between computer models and “actual scientific data” is false. All measurements involve some kind of implicit model, required to interpret the result. The 40 test flights are meaningless without some statistical interpretation of sample size and so forth (see the sketch after this list).
  • It’s not uncommon for models to demonstrate that data are wrong or misinterpreted.
  • The fact that every relationship or parameter in a model can’t be backed up with a particular measurement does not mean that the model is unscientific.
    • Numerical measurements are not the only valid source of data; there are also laws of physics, and a subject matter expert’s guess is likely to be better than a politician’s.
    • Calibration of the aggregate result of a model provides indirect measurement of uncertain components.
    • Feedback structure may render some parameters insensitive and therefore unimportant.
  • Good decisions sometimes lead to bad outcomes.
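
To illustrate the sample-size point, here’s a back-of-envelope calculation (mine, not the FT’s), assuming all 40 test flights landed with no engine damage:

    # Rule-of-three sketch: what do 40 clean test flights really tell us?
    # With n trials and zero failures, an approximate 95% upper confidence
    # bound on the per-trial failure probability is 3/n.
    n = 40
    print(f"Approximate 95% upper bound: {3 / n:.1%}")          # 7.5%
    # Exact version: the p that solves (1 - p)^n = 0.05.
    print(f"Exact 95% upper bound: {1 - 0.05 ** (1 / n):.1%}")  # about 7.2%

In other words, 40 uneventful flights are consistent with a per-flight damage risk as high as roughly 7%, which would be wildly unacceptable. The flights suggest the risk was lower than the models indicated; they come nowhere near proving it.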

The reporters, and maybe also the director-general (covering his you-know-what), have neatly shifted blame, turning a problem in decision making under uncertainty into an anti-science witch hunt. What alternative to models do they suggest? Intuition? Prayer? Models are just a way of integrating knowledge in a formal, testable, shareable way. Sure, there are bad models, but unlike other bad ideas, it’s at least easy to identify their problems.

Thanks to Jack Dirmann of Green Technology for the tip.

Fit to data, good or evil?

The following is another extended excerpt from Jim Thompson and Jim Hines’ work on financial guarantee programs. The motivation was a client request for comparison of modeling results to data. The report pushes back a little, explaining some important limitations of model-data comparisons (though it ultimately also fulfills the request). I have a slightly different perspective, which I’ll try to indicate with some comments, but on the whole I find this to be an insightful and provocative essay.

First and foremost, we do not want to give credence to the erroneous belief that good models match historical time series and bad models don’t. Second, we do not want to over-emphasize the importance of modeling to the process we have undertaken, nor to imply that modeling is an end-product.

In this report we indicate why a good match between simulated and historical time series is not always important or interesting, and how it can be misleading. Note that we are talking about comparing model output and historical time series. We do not address the separate issue of the use of data in creating a computer model. In fact, we made heavy use of data in constructing our model and interpreting the output — including firsthand experience, interviews, written descriptions, and time series.

This is a key point. Models that don’t report fit to data are often accused of not using any. In fact, fit to numerical data is only one of a number of tests of model quality that can be performed, and alone it’s rather weak. In a consulting engagement, I once ran across a marketing science model that yielded a spectacular fit of sales volume to data, given advertising, price, holidays, and other inputs – an R^2 of .95 or so. It turned out to be a linear regression with a “seasonality” parameter for every week of the year. Because there were only 3 years of data, those 52 parameters were largely responsible for the good fit (R^2 fell below .7 when they were omitted), and the model failed all kinds of reality checks.
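
The effect is easy to reproduce with synthetic data. Here’s a sketch (not the client’s model) of how weekly dummies soak up noise and inflate R^2 on three years of weekly data, even when there’s no seasonal signal at all:

    import numpy as np

    rng = np.random.default_rng(0)
    n_years, weeks = 3, 52
    n = n_years * weeks                        # 156 weekly observations
    week = np.tile(np.arange(weeks), n_years)  # week-of-year index

    # Synthetic sales: an advertising effect plus noise, NO true seasonality.
    advertising = rng.normal(10, 2, n)
    sales = 5.0 + 0.8 * advertising + rng.normal(0, 2, n)

    def r_squared(X, y):
        beta, *_ = np.linalg.lstsq(X, y, rcond=None)
        return 1 - (y - X @ beta).var() / y.var()

    X_base = np.column_stack([np.ones(n), advertising])
    dummies = (week[:, None] == np.arange(weeks)).astype(float)
    X_full = np.column_stack([X_base, dummies[:, 1:]])  # drop one vs. intercept

    print(f"R^2, advertising only:       {r_squared(X_base, sales):.2f}")
    print(f"R^2, plus 51 weekly dummies: {r_squared(X_full, sales):.2f}")

The second R^2 comes out substantially higher even though the dummies are fitting pure noise; each seasonal parameter averages just three data points. A better fit is not a better model.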

Continue reading “Fit to data, good or evil?”

More climate models you can run

Following up on my earlier post, a few more on the menu:

SiMCaP – A simple tool for exploring emissions pathways, climate sensitivity, etc.

PRIMAP 2C Check Tool – A dirt-simple spreadsheet, exploiting the fact that cumulative emissions are a pretty good predictor of temperature outcomes along plausible emissions trajectories (a back-of-envelope version of this relationship appears below).

EdGCM – A full 3D model, for those who feel the need to get physical.

Last but not least, C-LEARN runs on the web. Desktop C-ROADS software is in the development pipeline.
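
For a sense of just how simple the cumulative-emissions shortcut behind the PRIMAP tool can be, here’s a back-of-envelope sketch of my own (the TCRE value and cumulative total are rough, mid-range guesses):

    # Warming as a roughly linear function of cumulative CO2 emissions.
    # TCRE (transient climate response to cumulative emissions) is very
    # roughly 0.3-0.6 C per 1000 GtCO2; 0.45 is a mid-range guess.
    TCRE = 0.45 / 1000.0   # degrees C per GtCO2
    historical = 2000.0    # rough cumulative GtCO2 emitted to date
    for future in (500, 1000, 2000):
        total = historical + future
        print(f"+{future:4d} GtCO2 -> ~{TCRE * total:.1f} C above preindustrial")

That near-linearity is why a temperature target like 2°C translates so directly into a finite emissions budget.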

Next Generation Climate Policy Models

Today I’m presenting a talk at an ECF workshop, Towards the next generation of climate policy models. The workshop’s in Berlin, but I’m staying in Montana, so my carbon footprint is minimal for this one (just wait until next month …). My slides are here: Towards Next Generation Climate Policy Models.

I created a set of links to supporting materials on del.icio.us.

Update: Workshop materials are now on a web site here.

Endogenous Energy Technology

I just created an annotated list of links on learning/experience curves, deliberate R&D, and other forms of endogenous energy technology, including a few models and empirical estimates. See del.icio.us/tomfid for details. Comments with more references will be greatly appreciated!

Ethics, Equity & Models

I’m at the 2008 Balaton Group meeting, where a unique confluence of modeling talent, philosophy, history, activist know-how, compassion and thirst for sustainability makes it hard to go 5 minutes without having a Big Idea.

Our premeeting tackled Ethics, Values, and the Next Generation of Energy and Climate Modeling. I presented a primer on discounting and welfare in integrated assessment modeling, based on a document I wrote for last year’s meeting, translating some of the issues raised by the Stern Review and critiques into plainer language. Along the way, I kept a running list of assumptions in models and modeling processes that have ethical/equity implications.
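
For reference, most of the discounting argument turns on one line, the Ramsey rule; here’s a sketch with stylized parameter values from the published exchange (approximate):

    # Ramsey rule: discount rate r = rho + eta * g, where rho is pure time
    # preference, eta the elasticity of marginal utility of consumption,
    # and g the growth rate of per-capita consumption.
    def ramsey(rho, eta, g):
        return rho + eta * g

    print(f"Stern-style:    r = {ramsey(0.001, 1.0, 0.013):.1%}/yr")  # ~1.4%
    print(f"Nordhaus-style: r = {ramsey(0.015, 2.0, 0.020):.1%}/yr")  # ~5.5%

A few percentage points of discount rate compound to orders of magnitude over a century, which is why this seemingly technical choice is really an ethical one.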

There are three broad insights:

  1. Technical choices in models have ethical implications. For example, choices about the representation of technology and resource constraints determine whether a model explores a parameter space where “growing to help the poor” is a good idea or not.
  2. Modelers’ prescriptive and descriptive uses of discounting and other explicit choices with ethical implications are often not clearly distinguished.
  3. Decision makers have no clue how the items above influence model outcomes, and do not in any case operate at that level of description.

My list of ethical issues is long and somewhat overlapping. Perhaps that is partly because I compiled it with no clear definition of ‘ethics’ in mind. But I think it’s also because there are inevitably large gray areas in practice, accentuated by the fact that the issue doesn’t receive much formal attention. Here goes: Continue reading “Ethics, Equity & Models”