Rendering Data

Rendering is a process that converts waste animal tissue into stable, value-added materials. Rendering can refer to any processing of animal byproducts into more useful materials, or more narrowly to the rendering of whole animal fatty tissue into purified fats like lard or tallow. …

The majority of tissue processed comes from slaughterhouses, but also includes restaurant grease and butcher shop trimmings. This material can include the fatty tissue, bones, and offal, as well as entire carcasses of animals condemned at slaughterhouses, and those that have died on farms (deadstock), in transit, etc. …

Converting data from public sources and business archives into quality material, suitable for use in modeling, is often equally arduous if not quite so disgusting. Just substitute “back office systems” for “slaughterhouses,” “basement paper archives” for “fatty tissue, bones, and offal,” etc. Electronic systems create vast quantities of the stuff, but much of it could be aptly described as “deadstock.” The problem is that people can easily tolerate some dreck in material that they’re just browsing for information, so systems don’t go to great lengths to eliminate it. Models are pickier – a few extra zeroes somewhere can really affect calibration, for example.

There’s an interesting new weapon in the war against arcane formats, inconsistent field coding, and other data flaws: Google Refine. It’s apparently a Freebase spinoff. I’ve only used it for one task so far: grouping ad hoc text in a column, recognizing that “Ventana Systems,” “Ventana Systems, Inc.,” and “Ventanna Systems” all mean the same thing. It has useful filtering and clustering tools that largely automate such painful manual tasks. It can also do something else I’ve often hungered for in the past: transform an indented list into a table format. I suspect there’s much more depth that I haven’t even seen. Best of all, it’s fairly easy to get started.
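For a sense of what these two chores involve when done by hand, here is a rough Python sketch. It is not Google Refine's actual algorithm, just one plausible approach using only the standard library. The suffix list, the 0.9 similarity threshold, the two-space indent, and the function names are all assumptions made for illustration.

```python
import re
from difflib import SequenceMatcher

# Hypothetical list of corporate suffixes to ignore; extend to suit your data.
SUFFIXES = {"inc", "corp", "llc", "ltd", "co"}

def fingerprint(name):
    """Lowercase, strip punctuation, drop suffix tokens, sort the unique tokens."""
    tokens = re.sub(r"[^\w\s]", " ", name.lower()).split()
    return " ".join(sorted(set(t for t in tokens if t not in SUFFIXES)))

def cluster(names, threshold=0.9):
    """Greedily group names whose fingerprints are nearly identical.

    The threshold is an assumed value; tune it against real data.
    """
    clusters = []  # list of (representative fingerprint, [original names])
    for name in names:
        key = fingerprint(name)
        for rep, members in clusters:
            if SequenceMatcher(None, rep, key).ratio() >= threshold:
                members.append(name)
                break
        else:
            # No existing cluster was close enough, so start a new one.
            clusters.append((key, [name]))
    return [members for _, members in clusters]

def indented_to_rows(lines, indent_width=2):
    """Turn a space-indented outline into one row per leaf item,
    with the leaf's ancestors as the leading columns."""
    items = [((len(l) - len(l.lstrip(" "))) // indent_width, l.strip())
             for l in lines if l.strip()]
    rows, path = [], []
    for i, (depth, text) in enumerate(items):
        path = path[:depth] + [text]
        # An item is a leaf if the next item is not nested more deeply.
        is_leaf = i + 1 == len(items) or items[i + 1][0] <= depth
        if is_leaf:
            rows.append(tuple(path))
    return rows

names = ["Ventana Systems", "Ventana Systems, Inc.", "Ventanna Systems", "Acme Corp"]
groups = cluster(names)

outline = ["Customers", "  Ventana Systems", "  Acme Corp", "Vendors", "  Widgets Ltd"]
table = indented_to_rows(outline)
```

Here `groups` puts the three Ventana spellings in one cluster and leaves the made-up "Acme Corp" on its own, while `table` holds one (parent, child) row per leaf of the outline. A greedy pass like this depends on input order and will misfire on short or very similar names, which is a large part of why an interactive tool that lets you review each proposed merge is so appealing.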

It won’t cure all sins of corporate data management, like throwing away everything more than a few years old, but I’ll definitely reach for it next time I have a plateful of messy data to clean up for a model. Check it out.

No animals were harmed in the writing of this post.
