AI & Copyright

The US Copyright Office has issued its latest opinion on AI and copyright:

https://natlawreview.com/article/copyright-offices-latest-guidance-ai-and-copyrightability

The U.S. Copyright Office’s January 2025 report on AI and copyrightability reaffirms the longstanding principle that copyright protection is reserved for works of human authorship. Outputs created entirely by generative artificial intelligence (AI), with no human creative input, are not eligible for copyright protection. The Office offers a framework for assessing human authorship for works involving AI, outlining three scenarios: (1) using AI as an assistive tool rather than a replacement for human creativity, (2) incorporating human-created elements into AI-generated output, and (3) creatively arranging or modifying AI-generated elements.

The Office’s approach to the use of models seems fairly reasonable to me.

I’m not so enthusiastic about the de facto policy on ingestion of copyrighted material for training models, which library groups argue is fair use under existing precedent:

https://www.arl.org/blog/training-generative-ai-models-on-copyrighted-works-is-fair-use/

On the question of whether ingesting copyrighted works to train LLMs is fair use, LCA points to the history of courts applying the US Copyright Act to AI. For instance, under the precedent established in Authors Guild v. HathiTrust and upheld in Authors Guild v. Google, the US Court of Appeals for the Second Circuit held that mass digitization of a large volume of in-copyright books in order to distill and reveal new information about the books was a fair use. While these cases did not concern generative AI, they did involve machine learning. The courts now hearing the pending challenges to ingestion for training generative AI models are perfectly capable of applying these precedents to the cases before them.

I get that there are benefits to inclusive training data for LLMs:

Why are scholars and librarians so invested in protecting the precedent that training AI LLMs on copyright-protected works is a transformative fair use? Rachael G. Samberg, Timothy Vollmer, and Samantha Teremi (of UC Berkeley Library) recently wrote that maintaining the continued treatment of training AI models as fair use is “essential to protecting research,” including non-generative, nonprofit educational research methodologies like text and data mining (TDM). …

What bothers me is that allegedly “generative” AI is only accidentally so. I think a better term in many cases might be “regurgitative.” An LLM is really just a big function with a zillion parameters, trained to minimize prediction error on sentence tokens. It may learn some underlying, even unobserved, patterns in the training corpus, but for any unique feature it may essentially be compressing information rather than transforming it. That’s still useful – after all, there are only so many ways to write a Python script to suck tab-delimited text into a dataframe – but it doesn’t seem like such a model deserves much IP protection.
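To make that concrete, here’s a toy sketch of the training objective (illustrative only; a bigram count table stands in for the zillion-parameter network): fit next-token probabilities to a corpus, score the cross-entropy prediction error, then “generate” by sampling the learned statistics back out.

```python
import numpy as np

# Toy illustration of the LLM training objective: minimize next-token
# prediction error (cross-entropy) on a corpus. A bigram count table
# stands in for the zillion-parameter network; nothing here resembles
# a real LLM's architecture.
corpus = "there are only so many ways to write a script " * 20
tokens = list(corpus)
vocab = sorted(set(tokens))
ix = {c: i for i, c in enumerate(vocab)}

# "Train": count token-to-token transitions, then normalize to probabilities.
counts = np.ones((len(vocab), len(vocab)))  # add-one smoothing
for a, b in zip(tokens, tokens[1:]):
    counts[ix[a], ix[b]] += 1
probs = counts / counts.sum(axis=1, keepdims=True)

# The objective: average negative log-likelihood of the next token.
nll = -np.mean([np.log(probs[ix[a], ix[b]]) for a, b in zip(tokens, tokens[1:])])
print(f"cross-entropy: {nll:.3f} nats/token")

# "Generate" by sampling the learned statistics back out.
rng = np.random.default_rng(1)
state = ix["t"]
out = ["t"]
for _ in range(60):
    state = rng.choice(len(vocab), p=probs[state])
    out.append(vocab[state])
print("".join(out))
```

The sampled output is recognizably the corpus, recycled: compression, not transformation.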

Perhaps the solution is laissez-faire – DeepSeek “steals” the corpus the AI corps “transformed” from everyone else, commencing a race to the bottom in which the key tech winds up being cheap and hard to monopolize. That doesn’t seem like a very satisfying policy outcome, though.

AI for modeling – what (not) to do

Ali Akhavan and Mohammad Jalali have a nice new article in the System Dynamics Review (SDR) on the use of AI (LLMs) to complement simulation modeling.

Generative AI and simulation modeling: how should you (not) use large language models like ChatGPT

Ali Akhavan, Mohammad S. Jalali

Abstract

Generative Artificial Intelligence (AI) tools, such as Large Language Models (LLMs) and chatbots like ChatGPT, hold promise for advancing simulation modeling. Despite their growing prominence and associated debates, there remains a gap in comprehending the potential of generative AI in this field and a lack of guidelines for its effective deployment. This article endeavors to bridge these gaps. We discuss the applications of ChatGPT through an example of modeling COVID-19’s impact on economic growth in the United States. However, our guidelines are generic and can be applied to a broader range of generative AI tools. Our work presents a systematic approach for integrating generative AI across the simulation research continuum, from problem articulation to insight derivation and documentation, independent of the specific simulation modeling method. We emphasize that while these tools offer enhancements in refining ideas and expediting processes, they should complement rather than replace critical thinking inherent to research.

It’s loaded with useful examples of prompts and responses.
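To give a flavor of what such an interaction looks like in code, here’s a generic sketch (mine, not the authors’; the model name and prompt are placeholders) using the OpenAI Python client for the problem-articulation step of the COVID-19 example:

```python
from openai import OpenAI

# Generic illustration of AI-assisted problem articulation; not the paper's
# code. Assumes OPENAI_API_KEY is set in the environment; the model name
# and prompt below are placeholders.
client = OpenAI()

prompt = (
    "I am building a simulation model of COVID-19's impact on US economic "
    "growth. Propose a problem statement, key stakeholders, and candidate "
    "model boundaries (what to include and exclude), with brief rationale."
)

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": prompt}],
)

# Per the paper's framing, treat the output as a draft to critique,
# not an answer to accept.
print(response.choices[0].message.content)
```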

I haven’t really digested this yet, but I’m looking forward to writing about it. In the meantime, I’m very interested to hear your take in the comments.

AI Chatbots on Causality

Having recently encountered some major causality train wrecks, I got curious about LLM “understanding” of causality. If AI chatbots are trained on the web corpus, and the web doesn’t “get” causality, there’s no reason to think that AI will make sense either.

TL;DR: ChatGPT and Bing utterly fail this test, for reasons that are evident in Google Bard’s surprisingly smart answer.

ChatGPT: FAIL

Bing: FAIL

Google Bard: PASS

Google gets strong marks for mentioning a bunch of reasons to expect that we might not find a correlation, even though x is known to cause y. I’d probably only give it a B+, because it neglected integration and feedback, but it’s a good answer that properly raises lots of doubts about simplistic views of causality.
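To see why integration and feedback matter, here’s a minimal sketch (mine, not from any chatbot transcript) of two systems where the cause demonstrably drives the effect, yet the correlation more or less vanishes:

```python
import numpy as np

rng = np.random.default_rng(42)

# Case 1: integration. y accumulates x (a stock integrating its inflow),
# so x -> y directly, yet the sample correlation is typically small.
x = rng.normal(size=2000)   # white-noise driver
y = np.cumsum(x)            # pure causation via accumulation
print(f"corr(x, y) = {np.corrcoef(x, y)[0, 1]:+.3f}")

# Case 2: feedback. A thermostat u regulates temperature T against a
# disturbance d. u moves T every step, but good control pins T near the
# setpoint, leaving it nearly uncorrelated with the control effort.
T = 20.0
Ts, us = [], []
for _ in range(2000):
    d = rng.normal()            # outside weather disturbance
    u = -10.0 * (T - 20.0)      # proportional control toward 20 degrees
    T = T + 0.1 * (u + d)       # u causally drives T
    Ts.append(T)
    us.append(u)
print(f"corr(u, T) = {np.corrcoef(us, Ts)[0, 1]:+.3f}")
```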

Assessing the predictability of nonlinear dynamics

An interesting exploration of the limits of data-driven predictions in nonlinear dynamic problems:

Assessing the predictability of nonlinear dynamics under smooth parameter changes
Simone Cenci, Lucas P. Medeiros, George Sugihara and Serguei Saavedra
https://doi.org/10.1098/rsif.2019.0627

Short-term forecasts of nonlinear dynamics are important for risk-assessment studies and to inform sustainable decision-making for physical, biological and financial problems, among others. Generally, the accuracy of short-term forecasts depends upon two main factors: the capacity of learning algorithms to generalize well on unseen data and the intrinsic predictability of the dynamics. While generalization skills of learning algorithms can be assessed with well-established methods, estimating the predictability of the underlying nonlinear generating process from empirical time series remains a big challenge. Here, we show that, in changing environments, the predictability of nonlinear dynamics can be associated with the time-varying stability of the system with respect to smooth changes in model parameters, i.e. its local structural stability. Using synthetic data, we demonstrate that forecasts from locally structurally unstable states in smoothly changing environments can produce significantly large prediction errors, and we provide a systematic methodology to identify these states from data. Finally, we illustrate the practical applicability of our results using an empirical dataset. Overall, this study provides a framework to associate an uncertainty level with short-term forecasts made in smoothly changing environments.
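For a rough feel for the core idea, here’s a toy sketch (mine, not the paper’s method): one-step analog forecasts of a logistic map whose parameter drifts smoothly; prediction errors typically grow as the drift carries the system toward the chaotic regime.

```python
import numpy as np

# Toy illustration (not the paper's method): one-step nearest-neighbor
# (analog) forecasts of a logistic map while its parameter r drifts
# smoothly toward the chaotic regime.
n = 4000
r = np.linspace(3.4, 3.9, n)   # smooth parameter change
x = np.empty(n)
x[0] = 0.5
for t in range(n - 1):
    x[t + 1] = r[t] * x[t] * (1 - x[t])

# Forecast x[t+1] as the successor of the nearest historical analog of x[t].
window = 500
errors = []
for t in range(window, n - 1):
    past = x[t - window:t]
    nn = np.argmin(np.abs(past - x[t]))   # closest state in recent history
    pred = x[t - window + nn + 1]         # what followed that analog
    errors.append(abs(pred - x[t + 1]))

errors = np.array(errors)
print(f"mean error, early (small r):   {errors[:1200].mean():.4f}")
print(f"mean error, late (r near 3.9): {errors[-1200:].mean():.4f}")
```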

AI babble passes the Turing test

Here’s a nice example of how AI is killing us now. I won’t dignify this with a link, but I found it posted by a LinkedIn user.

I’d call this an example of artificial stupidity, not AI. The article starts off sounding plausible, but quickly degenerates into complete nonsense that’s either automatically generated or translated, with catastrophic results. But it was good enough to make it past someone’s cognitive filters.

For years, corporations have targeted on World Health Organization to indicate ads to and once to indicate the ads. AI permits marketers to, instead, specialize in what messages to indicate the audience, therefore, brands will produce powerful ads specific to the target market. With programmatic accounting for 67% of all international show ads in 2017, AI is required quite ever to make sure the inflated volume of ads doesn’t have an effect on the standard of ads.

One style of AI that’s showing important promise during this space is tongue process (NLP). informatics could be a psychological feature machine learning technology which will realize trends in behavior and traffic an equivalent method an individual’s brain will. mistreatment informatics during this method can match ads with people supported context, compared to only keywords within the past, thus considerably increasing click rates and conversions.