Response to Review of “Notes” by Peter McCluskey


An acquaintance of mine, Peter McCluskey, was nice enough to read my book Notes on a New Philosophy of Empirical Science and write a review of it on his blog Bayesian Investor. The review is basically positive. Several of the critical comments in the review indicate real shortcomings of the book, which I hope to correct in the final version.

McCluskey understands the big-picture concepts presented by the book. This is a good start, because it is often hard for me to convey these concepts to people who are not well versed in information theory and statistics, even though the book attempts to address a general audience. For example, he seems to understand without much effort that there is a near equivalence between the step “make a concrete prediction” in the traditional scientific method, and the step “build a data compressor” in the Compression Rate Method (CRM). That is a big leap for most people.

I do want to clarify some points about the goal of the book. McCluskey summarizes:

Machine Learning (ML) is potentially science, and this book focuses on how ML will be improved by viewing its problems through the lens of CRM. Burfoot complains about the toolkit mentality of traditional ML research, arguing that the CRM approach will turn ML into an empirical science.

It’s important to emphasize that I don’t want to “fix” Machine Learning. Instead, I want to create a field that is adjacent to ML, and stands in the same relation to ML as mathematics stands to physics. To emphasize this, I’ll write it in analogical form:

mathematics:physics   ::   machine learning:comperical science

At this point I will mention that in 2011, when I was writing the book, all of these speculations were purely theoretical. At that time a critic could justifiably have attacked me for doing armchair philosophy and making presumptuous claims without strong evidence. But now, in 2017, comperical science is no longer a merely theoretical construct: I have been applying the philosophy to the field of NLP and have made substantial progress.

McCluskey’s first critical point is that, though protection against fraud and manual overfitting is a genuine benefit of the CRM approach, I have “exaggerated” it. This is probably a reasonable criticism; I have a bad tendency to employ absolutist vocabulary (e.g. “invincibly rigorous”) when describing the CRM philosophy.

However, I stand by the claim that the CRM provides very strong protection against the most common types of both honest and dishonest scientific mistakes, especially those related to statistical errors. Furthermore, this protection is urgently needed by several fields of science, such as nutrition, medicine, psychology, and economics. Consider the following commentary by leading experts in a variety of fields:

Andrew Gelman:

[M]any published papers are clearly in error, which can often be seen just by internal examination of the claims and which becomes even clearer following unsuccessful replication…

When seemingly solid findings in social psychology turn out not to replicate, we’re no longer surprised….

Paul Romer:

For more than three decades, macroeconomics has gone backwards. The treatment of identification now is no more credible than in the early 1970s…

A parallel with string theory from physics hints at a general failure mode of science that is triggered when respect for highly regarded leaders evolves into a deference to authority that displaces objective fact from its position as the ultimate determinant of scientific truth.

John Ioannidis:

There is increasing concern that most current published research findings are false. The probability that a research claim is true may depend on study power and bias, the number of other studies on the same question, and, importantly, the ratio of true to no relationships among the relationships probed in each scientific field….

Simulations show that for most study designs and settings, it is more likely for a research claim to be false than true. Moreover, for many current scientific fields, claimed research findings may often be simply accurate measures of the prevailing bias.

I invite the reader to compare the degree of objectivity and rigor that must be present in comperical science to the issues and problems indicated by the above comments. In most cases, any kind of bug or miscalculation in a data compressor will be instantly revealed when the decompressed file fails to match the original. Furthermore, exact input/output matching is not enough: in order for a result to be valuable, the encoded file must also be small in comparison to previous results. File size is easily, directly and unambiguously verifiable by basic computer commands.
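The verification loop described above can be sketched in a few lines of Python. This is a minimal illustration, not the book’s actual tooling: `zlib` stands in for a research compressor, and `verify_crm_result` is a hypothetical helper name.

```python
import zlib

def verify_crm_result(original: bytes, compress, decompress) -> int:
    """Compress, then verify that the round trip is byte-exact.

    Returns the encoded size, the CRM's figure of merit. Any bug or
    miscalculation in the compressor is revealed immediately when the
    decompressed output fails to match the original.
    """
    encoded = compress(original)
    if decompress(encoded) != original:
        raise ValueError("lossless round trip failed: bug in the compressor")
    return len(encoded)

# zlib as a stand-in for a research compressor on repetitive data:
data = b"the cat sat on the mat " * 1000
size = verify_crm_result(data, zlib.compress, zlib.decompress)

# Exact matching is necessary but not sufficient; a valuable result
# must also produce a file smaller than previous results.
assert size < len(data)
```

The two checks are exactly the two conditions named above: byte-exact reconstruction, and an unambiguous file-size comparison.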

McCluskey complains that the book does not discuss the distinction between lossless and lossy compression. This is a fair complaint, and it is one that I have heard from other people. In my mind the rationale is clear, but the book should include more direct statements about why the CRM requires lossless compression.

The rationale is that lossless compression permits strong, objective comparisons between competing candidate theories of a phenomenon. This, in turn, permits researchers to conduct a systematic search through the space of theories, eventually homing in on a high-quality theory. Lossy compression is an interesting problem, but it does not allow strong comparisons between theories, because different theories might choose to discard different details of the original data. Without the ability to make strong comparisons, the theory-search gets stuck in the conceptual mud.
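The kind of strong comparison that losslessness enables can be shown with a toy example. Here two zlib compression levels play the role of two competing “theories” of the same data; the corpus and levels are invented for illustration.

```python
import zlib

data = b"an example corpus with repeated structure " * 5000

# Two competing "theories" of the data: a weak model and a stronger one.
size_weak = len(zlib.compress(data, 1))
size_strong = len(zlib.compress(data, 9))

# Both are lossless, so they encode exactly the same information,
# and the size comparison between them is direct and unambiguous.
assert zlib.decompress(zlib.compress(data, 9)) == data
assert size_strong <= size_weak
```

With lossy compression this comparison breaks down: a smaller file might simply mean more details were discarded, not that the underlying theory is better.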

McCluskey is an investor, and so he naturally wondered about applying the CRM idea to stock market data. He concluded that the CRM was not really applicable to this kind of data set. I agree, and I should have emphasized this more in the book. In general, it is very important to exercise a degree of taste and judgment when selecting the type of data set to be targeted for CRM research. If you pick a bad data set, the resulting research will fail to produce any interesting results. A good data set for CRM work should have a few properties:

  1. The distribution that produces the data should be essentially stationary. Otherwise, practitioners must take care to emphasize the boundaries of applicability of the conclusions of the research. For example, if a CRM data set is produced by taking images of cancerous growths in men, then the resulting knowledge should not be used to diagnose cancerous growths in women.
  2. It should be related to a phenomenon of intrinsic interest. One can imagine a line of CRM research that attempts to compress a large database of cat pictures. Such research would produce a highly detailed computational description of the visual appearance of felines – an achievement of somewhat dubious intellectual value.
  3. It should have rich, organic structure. A database of random noise is a poor choice, because noise cannot be compressed. A database of numbers produced by a computer’s pseudo-random number generator is also a poor choice; it can be very highly compressed, but only by reverse-engineering the PRNG.
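The third property is easy to demonstrate empirically. The sketch below compares the compressibility of random noise against data with rich repetitive structure, again using `zlib` as a stand-in compressor; the sizes and thresholds are illustrative assumptions.

```python
import os
import zlib

# Random noise: essentially incompressible (zlib adds slight overhead,
# so the "compressed" file is no smaller than the input).
noise = os.urandom(100_000)
noise_ratio = len(zlib.compress(noise)) / len(noise)

# Organic structure: heavily repetitive data compresses dramatically.
text = b"a good data set should have rich, organic structure " * 2000
text_ratio = len(zlib.compress(text)) / len(text)

assert noise_ratio > 0.99   # noise: no real savings possible
assert text_ratio < 0.05    # structured data: large savings
```

A PRNG stream would behave like the noise case for any practical compressor, even though in principle it could be reduced to the generator’s seed.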

Stock market data runs into problems with properties #1 and #3 above. First, it is fundamentally time-dependent and not stationary. As McCluskey notes, a database of stock market data from the years 1995-2015 will be strongly influenced by the bubble. In many ways, the bubble was a unique economic event, and so knowledge of the conditions that it produced will not be helpful in predicting future trends.

Second, changes in stock prices are intrinsically hard to predict, because the present price should already reflect almost all the information the market has available to evaluate the stock. This is called the Efficient Market Hypothesis (EMH). Indeed, one interesting reformulation of the EMH is that the stream of stock price changes is random and thus incompressible.

One final point McCluskey raises is about the question “what is intelligence?” and how comperical science relates to natural and artificial intelligence. The book only hints at the answer. In the final thought experiment, the fictional protagonist compiles an enormously large and varied database of images, text, audio, and scientific measurements. Then he begins a CRM inquiry targeting this database, using a suite of extremely abstract, general, and meta-theoretical techniques. I believe a suite of techniques such as this, if sufficiently powerful, would be equivalent to intelligence. However, I do not believe this to be within the reach of modern AI research.

One of the key problems with developing general, abstract, and meta-theoretical techniques is that they are hard to evaluate. It is hard enough to formulate technical problems well, and formulating technical meta-problems is much harder. A slight glitch in the problem formulation can make the challenge impossible on one side, or trivial on the other. One of the goals of comperical philosophy is to provide a framework within which researchers can scale up the power of their abstractions and general-purpose techniques, while always staying on firm methodological ground. Consider the following “road map” of development in comperical linguistics:

  • First, develop a good specific theory of English text.
  • Next, develop a good specific theory of Chinese text. Then Russian text. Then Hindi. Then French. Presumably, each step will be easier than the last, as researchers learn new tricks, techniques, and concepts.
  • Finally, develop a unified theory of natural language text: a general purpose algorithm that, when invoked on a large corpus of text in a given language, will automatically produce a good specific theory of the language.

You can imagine an analogous process for image understanding: first develop a good specific theory of cars, then for buildings, then for plants, and so on. Then develop a unified theory and learning algorithm that automatically builds specific theories, given a good data set.

The final leap is to unify the general theories for the various domains. Instead of one general algorithm for language learning, another for image learning, and a third for motor skills learning, construct a truly general purpose algorithm that can be applied to any type of data set.

Thanks again to Peter McCluskey for his review, and to you, the reader, for taking the time to read my response.
