The journey to greatness starts with a simple
step. Just turn the envelope and write on its back.
The professor of machine learning asked the student, “Luke, let’s begin by learning some simple ML (machine learning) techniques. What’s the probability of throwing a 5 with two dice?”
Luke grinned. “OK professor, trick question. Hmmm… let me guess… it’s gonna be, like, 2 over 36?” he said with a smirk.
“OK smartass. How did you calculate that?”
went Professor Stan.
“Professor, I’m a genius. Of course you knew that, right? It’s the number of ways you can get a five divided by the total number of combinations of numbers 1 through 6…” He trailed off as he realized how wrong he had been. A wave of ash swept over his face as he whitened.
“It’s alright, Luke; everybody makes the same mistake. I used to forget there were two dice and say 1 over 36,” Stan grinned.
The class erupted in laughter, though not everyone got the joke.
Stan looked
at the class and smiled his satisfied smile. He knew not everyone got it and
wasn’t going to let go that easily today.
“So, class, we’re going to get to the bottom of this, OK? I think we all know probability is the chance of getting your favorable outcomes out of the total number of possible outcomes. I usually remember this through ‘FOTO’, that is, the ratio of the number of Favorable Outcomes over Total possible Outcomes. In this case, the full set of outcomes is as follows:
{1, 1}, {1, 2}, {1, 3}, {1, 4}, {1, 5}, {1, 6},
{2, 1}, {2, 2}, {2, 3}, {2, 4}, {2, 5}, {2, 6},
{3, 1}, {3, 2}, {3, 3}, {3, 4}, {3, 5}, {3, 6},
{4, 1}, {4, 2}, {4, 3}, {4, 4}, {4, 5}, {4, 6},
{5, 1}, {5, 2}, {5, 3}, {5, 4}, {5, 5}, {5, 6},
{6, 1}, {6, 2}, {6, 3}, {6, 4}, {6, 5}, {6, 6}.
Each die takes the numbers 1 through 6 independently of the other, so there are 6 × 6 = 36 outcomes. {1, 4} and {4, 1} count as two different throws because they come from two separate dice, and anything beyond this list would just be a repeat.
Therefore TO = 36.
Now what is the number of favorable outcomes (FO)? What are all the ways we can throw a 5? FO = {1, 4}, {2, 3}, {3, 2}, {4, 1} → 4.
Therefore the probability of throwing a total of 5 → FO/TO → 4/36 → a 1 in 9 chance.”
“Now class, did that make sense?” He saw a majority of heads nodding, while some still had their eyes open like deer. He was going to let that go and made eye contact with one of the deer. “Sometimes you just need to sit alone and write it all out. I guarantee you’ll get it,” he said, and smiled his wide comforting smile. The deer looked up and closed its eyes in agreement.
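For anyone who would rather let a machine do the writing out, here is a minimal Python sketch of Stan's FOTO count. It is only an illustration, not part of the class: the function name `probability_of_total` and its parameters are made up for this example. It lists every ordered throw of two dice, keeps the ones that add up to the target, and divides.

```python
from fractions import Fraction
from itertools import product


def probability_of_total(target, sides=6, dice=2):
    """FOTO by brute force: Favorable Outcomes over Total possible Outcomes."""
    # Every ordered throw, e.g. (1, 4) and (4, 1) are listed separately.
    outcomes = list(product(range(1, sides + 1), repeat=dice))
    # Keep only the throws whose faces add up to the target.
    favorable = [roll for roll in outcomes if sum(roll) == target]
    return favorable, len(outcomes), Fraction(len(favorable), len(outcomes))


favorable, total, prob = probability_of_total(5)
print("FO:", favorable)
print("TO:", total)
print("FO/TO:", prob)
```

Changing the target (say, to 7) or the number of sides is a quick way to run a DIY sanity check on other dice questions.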
Stan turned
around and looked at the board, “OK now, what if I wanted my machine to figure this out on its own?”
He turned back with a quick swish, his eyebrows perked up with a slight frown
of curiosity.
“While some of it is rocket science, it’s actually not that hard. I’ve got one word for you; actually, make it two – pattern recognition. Machines learn by recognizing patterns. And who makes them learn patterns?” He was now pointing at Heart with his sharp index finger. Heart looked up, her dark brown hair flowing around her face, wave after wave sloshing against the brownish-yellow skin of her high cheeks.
“You mean us? Humans?” Her eyes were suddenly large concentric balls of white with glittering black. Heart Soledad was Chief Marketing Officer at Milky Comp, a global leader
with the slogan “offering solutions for
creation” – primarily scientific and mathematical software for industries
like semiconductors, automotive, food and beverage, packaging, aerospace,
agriculture, real estate and others. She was forty-five and had graduated from
Harvard ten years ago, majoring in public administration in international
development with grounding in basic economics. Her favorite though was
behavioral influences in economics and she was a powerhouse executor of this
thinking.
Professor Stan gave Heart a jubilant look. “Yes! Thanks, Heart. My heartfelt thanks,” he smiled, and continued. Heart smiled back, her bright teeth a beautiful contrast to her Afro-Cuban complexion; she liked Stan and felt a slight attraction, but quickly shrugged off the feeling.
“So, ladies and gentlemen, we can make computers learn, but WE need to make them recognize patterns. Let’s see how this works with the dice.” He was now looking firmly at Hex. Hex nodded. Henry Excelsior, a.k.a. Hex, was Chief Strategy Officer at Milky Comp, a Chicago graduate in economics and finance, with a mechanical engineering major. He was a hardcore finance strategist but had an unbounded curiosity about the world, which had thrust him into this role when he had just turned fifty earlier in the year.
“Hex, I want
you to be my guinea pig. What would you do if you wanted to make a machine
learn how to calculate these probabilities?”
“I knew you’d pick on me, Dr. Stan,” Hex grinned; Stanford and Chicago were healthy rivals in economics and finance.
“I would make the computer generate two random whole numbers between 1 and 6, and repeat that about 100 times. Then I’d take the number of times the total came out to 5 and divide it by the number of throws generated.” He stopped for a breath. “This ratio should then be close to the fraction 1/9, or approximately 11%, like you calculated.”
“Bingo! Nice job, Hex. Class, do you see the logic behind this: getting the machine to generate the relative frequency with which a total of 5 occurs? The same logic can now be used to generate various patterns and calculate various probabilities, right?” Stan’s eyebrows were raised again.
“So ultimately it’s all about teaching the machine to recognize patterns in the data that we’re feeding it. Questions, bouquets, brickbats?” he lisped and smiled. Several heads were nodding, and a few people just stared off into space, expressionless.
“Prof. Stan, can you show us exactly how this would work, like you did when you laid out all the possible combinations of dice throws?” This was Luke, chiming in with feigned curiosity.
“Aha!” went
Stan. “I suggest you use a random number generator in Excel and try it out for
yourself! Sometimes it’s better to roll up your sleeves and get into the
weeds.” He wasn’t going to spoon-feed every bit of this thing.
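For readers without Excel handy, Hex's recipe can be sketched in a few lines of Python instead. This is one possible version under stated assumptions, not Stan's own: the function name `estimate_probability`, the `seed` parameter and the larger trial counts are choices made for this illustration. It throws two simulated dice many times and reports the relative frequency of a total of 5.

```python
import random


def estimate_probability(target=5, trials=100, seed=None):
    """Relative frequency of `target` as the total of two simulated dice."""
    rng = random.Random(seed)  # a seed makes the run repeatable
    hits = 0
    for _ in range(trials):
        total = rng.randint(1, 6) + rng.randint(1, 6)  # one throw of two dice
        if total == target:
            hits += 1
    return hits / trials


# Hex suggested about 100 throws; more throws should settle closer to 1/9.
for n in (100, 10_000, 200_000):
    print(n, "throws:", estimate_probability(trials=n, seed=42))
print("exact value:", 1 / 9)
```

With only 100 throws the estimate can land noticeably above or below 11%, which is a small DIY lesson in itself: the pattern the machine "learns" is only as good as the amount of data it is fed.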
“Now that’s machine learning, folks. You’ve just learnt the first lesson: a machine learns differently. And you can’t dismiss this simple fact. Pattern recognition is not learning. Human learning is an inherent, imbibed trait that has a multi-dimensional aspect to it. It’s not a linear, look-at-a-pattern-and-you-know-it type of thing. Learning happens with training and observing and listening and repeating. And failing. And correcting. And training. And on and on. It’s a continuous process with ups and downs in space and time.”
Stan stopped and looked at the silent class gaping at him in an eerie quiet. He then smiled back at Luke and said, “Thanks for being the best setup in an ML class!” Everyone burst into giggles and a few audible laughs, with some exclaiming “Come on!”
Stan plodded on. “With the advent of calculators, laptops and the quintessential smartphone, we’ve lost the ability to do ‘farmer math’, as my manager likes to call it. And farmers themselves have lost this ability, especially the ease of churning a few numbers mentally and doing a DIMS[1] test. We’ll go through this journey by telling a few stories. Our first story is something more real than a silly dice-throwing problem for Luke. It’s about crop yield, say in a corn field.”
The backstory and summary: Once a
farmer wanted to be able to quickly calculate his crop yield, given that his
people had just given him some data, like how many acres they covered that day,
the number of bushels of corn they were able to harvest, the total number of
acres that were waiting to be harvested, and some area that had been cordoned
off for another crop. With this information the farmer figured out what his annual yield that year could be, and then thought through the market conditions that would prevail later in the year. With all this data he quickly figured out his profitability with corn for the year, and then repeated the calculation for his other crops. Finally he worked it out for his entire portfolio of crops, with some constraints on costs and selling prices and an allowance for variations in these, and so was able to see what his annual performance was going to look like. This made him realize he might need to hire a couple of extra hands to support the back half of the year.
And all this was
done in about 5 minutes, on the back of
an envelope (sometimes it just rests in the cranium and never gets to the
envelope).
The basic point of such an exercise is to get that initial feel for something, dig in, and have the wherewithal to calculate basic numbers. This in itself is a big differentiator between an average person who does what he’s told and a person who stands out and is “bold” enough to say, “Does It Make Sense? Does this pass my DIMS test? I’m gonna find out for myself.”
The DIMS test has a basic VÉTUDE[2] framework embedded in it – this is something to teach kids to get to a goal quickly by doing quick calculations. This is NOT only for kids, it’s for everyone!
V – Visualize (on paper or in the mind –
whichever you’re comfortable with)
É – Extract all info – write it down so you can see it!
T – Think thru approaches
U – Understand (or get to an Understanding)
D – Double check or the quintessential
sanity recheck, and
E – Expound (explain) with a flourish!
So what math did the farmer do? The first thing he did was to visualize the problem in his mind.
V – Imagine his field was a rectangle with an area of, say, 1000 acres[3]. Sketching something like this would take a minute, and go a long way to understanding what we’re calculating.
É – Let’s say they harvested about 7% of his field that day – that’s 70 acres. If the yield is 98 bushels per acre[4], then his total harvest from that day would be 98 bushels/acre * 70 acres = 6,860 bushels.
Now there’s 93% of his 1000-acre field left to harvest. Assuming 70% of his field is corn, that’s 700 acres of corn in all, or ten times the 70 acres (all corn, we’ll assume) harvested that day. Using the above info, his total corn yield that year would be 6,860 * 10 = 68,600 bushels (assuming, of course, that the remaining 63% yields corn at the same rate as the first 7% harvested!).
T – Thinking through what he could do with all this yield brings us to the crux of the calculation: show me the money![5] Assume the price of corn (determined, say, by a mercantile exchange like the CME[6]) is $3.75 per bushel. Then the farmer’s sales revenue (“income”) for that year from corn alone would be 68,600 bushels/yr * $3.75/bushel = $257,250/yr (from 700 acres of harvest). Now let’s say that paying his farm hands and farm boys, maintaining the field, running the tractors, combines, backhoes and other equipment, storage (barn) and so on total ~$175,000. So his profit from corn would be $257,250 - $175,000 = ~$82K for the year; this is the money that “he makes”. A similar calc for his other 300 acres (with, say, soybean) might get him another $25K of profit (assumed) from a total sale of ~$100K. Then his total earnings (profits) for the year would be ~$107K on total sales of ~$357K. Not a bad chunk of change, but let’s not forget the intellectual and physical capacities that are expended from childhood to adulthood to create and build all this – respect!
U – The understanding arising out of all this could be several-fold, viz. what it takes to maintain and run a farming operation for corn and soy, and what the profit margins in such a business could be (which is, incidentally, $107K/$357K = ~30%). While the margin looks good on paper, there’s this thing called volatility: the exchange prices of corn can swing pretty wildly, to put it mildly. A quick Google on this shows corn prices shooting up from $2/bushel to over $8/bushel from ~2007 due to various factors. When prices go up it’s great for margins and life in general is good. When prices crash due to a collapse in demand, margins can suffer a lot and we might be in a loss-making operation very quickly, especially as our fixed costs are in general, well, fixed! Take land maintenance, tilling etc., plus the equipment that’s used (planters, combine harvesters, skid steers and what-have-you). So it’s critical to build this fundamental understanding of the situation not just at one point in time, but over time and space. It’s understanding at this deeper level that creates the richness for further innovation and for thinking of other ways to skin the cat.
D – These are basic double-checks of calculations to make sure we’re using the right data, triangulation of information by using 2 or 3 different sources etc. Once he had the double checking done, the farmer felt confident and he could plan out his crop portfolio for next year.
E – Expound or explain the calculations and the overall farmer math to someone who can be the “listener” (e.g. your manager or a peer or anyone who reports to you). This is the best way for us to understand a concept deeply: explain it to someone until it is crystal clear, answer their questions and clarify. The best way to learn is to help others learn. It worked for Richard Feynman and it will for you as well!
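The farmer did all of this in his head, but as a double check (the D in VÉTUDE) the same arithmetic can be sketched in Python. Every number below is one of the story's assumed inputs, not real farm data, and the names (`farm_summary`, `break_even_price` and so on) are invented for this illustration. As a simplification, the soybean figures are held fixed while the corn price is varied.

```python
# Every figure below is an assumption taken from the story, not real farm data.
FIELD_ACRES = 1000
HARVESTED_TODAY_PCT = 7   # share of the field harvested that day
CORN_PCT = 70             # share of the field planted with corn
YIELD_BU_PER_ACRE = 98    # bushels per acre, assumed uniform across the field
CORN_COSTS = 175_000      # labor, maintenance, equipment, storage (dollars)
SOY_REVENUE = 100_000     # assumed sales from the other 300 acres
SOY_PROFIT = 25_000       # assumed profit from the other 300 acres

acres_today = FIELD_ACRES * HARVESTED_TODAY_PCT // 100
bushels_today = acres_today * YIELD_BU_PER_ACRE
corn_acres = FIELD_ACRES * CORN_PCT // 100
corn_bushels = corn_acres * YIELD_BU_PER_ACRE


def farm_summary(corn_price):
    """Return (corn profit, total profit, profit margin) at a given corn price."""
    corn_revenue = corn_bushels * corn_price
    corn_profit = corn_revenue - CORN_COSTS
    total_profit = corn_profit + SOY_PROFIT
    margin = total_profit / (corn_revenue + SOY_REVENUE)
    return corn_profit, total_profit, margin


# Corn price at which the corn operation makes neither profit nor loss.
break_even_price = CORN_COSTS / corn_bushels

print(f"harvested today: {bushels_today:,} bu; full corn crop: {corn_bushels:,} bu")
print(f"corn breaks even at ${break_even_price:.2f} per bushel")
for price in (2.00, 3.75, 8.00):
    corn_profit, total_profit, margin = farm_summary(price)
    print(f"${price:.2f}/bu: corn profit ${corn_profit:,.0f}, "
          f"total profit ${total_profit:,.0f}, margin {margin:.0%}")
```

Running the loop at $2, $3.75 and $8 per bushel puts the volatility point into numbers: with costs held fixed, the same field swings between a loss on corn and a very comfortable year.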
“That, ladies and gentlemen, is how a BOTE (back-of-the-envelope calculation) is done,” went Prof. Stan. “Basic farmer math with a deep understanding of the problem at hand built into the process. ML won’t do this for you, so this creates the foundation to feed into an ML algo.” Silence. Then the class erupted into a standing ovation.
Hex and Heart looked at each other and thought of their own problem at hand. The DIMS test, the VÉTUDE framework and farmer math all had great relevance for their days ahead.
[1] DIMS: Does It Make Sense?
[2] VÉTUDE: VÉTUDE is a play on the French word étude, meaning to study. V – Visualize, É – Extract, T – Think, U – Understand & Calculate, D – Double check, and E – Expound (explain)
[3] One acre is 43,560 sq. ft. (~4,047 square meters, 4,840 square yards); the ~ or tilde means “approximately” or “about”
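[4] A bushel of shelled corn weighs 56 lbs by the standard U.S. convention (a bushel MAY mean different weights for different crops!). The ~100 bushels per acre used here is a round figure for illustration; actual corn yields vary widely by region and year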
[5] Yes, the allusion to Cuba Gooding Jr. in Jerry Maguire is implicit :)
[6] CME: Chicago Mercantile Exchange