All posts by kalx

And now for something completely different

Working on huge data at Bloomberg these days. After a long day of trying to figure out how they do that I like to think about math.

My attempt at working up to “A monad is just a monoid in the category of endofunctors.” It gets messy when using `SelectMany` in LINQ. Google Bart De Smet. Or maybe you can now just ask Cortana. Technology moves fast these days.

There is no Royal Road. It is still fun to use the grey matter sitting on top of a primate brain in the meantime.

There is no royal road.

I confess I binged this instead of googled it:

I misremembered it as something Gauss said, not Euclid. Feeble human brains still need to be reminded of things learned over 2000 years ago. At least mine does. There is nothing I can add to Dijkstra’s observations other than an anecdote.

Last Thursday I had dinner at the Columbia Faculty House. My friend is taking a break from a course he has been teaching and I offered to fill in. And found out there is a 1000ft elevation change between Morningside Ave and Morningside Dr. Next time I’ll take the 1 train instead of the B line unless I want the exercise.

One of the professors at dinner paid me the nicest complement, after drawing it out a bit. The conversation went something like this:

Me: Hi, I’m Keith Lewis.
Him: <Russian accent>I have heard your name before.</Russian accent>
Me: <Getting Nervous>Oh?</Still Nervous>
Him: <RA>People have told me things about you.</RA>
Me: <CMA?>I’m sure some of those things are probably true.</CMA?>
Him: They have said they have taken your course before and found it very difficult.
Me: <Keeping my mouth shut/>
Him: Then they realized they could do the homework. They also told me they found what you taught them was useful for their job.

Bingo! It is the same thing every semester. Students complain the material is too hard. I go over the previous homework at the beginning of each class and ask them what else they needed to be successful doing it. The answer is always the same: Nothing.

There are some things that can be learned by having your hand held and walking you down the garden path. (How do I get the Visual Studio debugger to start Excel and load an add-in?) But you can only learn the important and truly useful things by spending some time in the limbo of uncomfortable uncertainty. If your brain isn’t hurting, you aren’t learning anything new.

A dead, but not forgotten, man just taught me a new vocabulary word: pleonastic. I need to try and endeavor to avoid and eschew pleonasm.

Protecting your brand.

I like to ignore advice people give me and figure things out for myself. It does me no good. My brand is I don’t give a fuck about my brand. If you find the code I write useful and can point out things I’m missing that can help me improve my open source code you will have my full attention. The sad fact of open source is that people who actually spend the time to look carefully at your code are as rare as rocking horse shit.

Bill Gates might not have been far off when he claimed secretaries would be writing code using some version of BASIC. Larry Wall came up with perl, Guido von Rossum came up with python. I am being politically incorrect, it is a form of censorship that I abhor,  but these are all languages for people who can’t understand the complexity of modern computing.

That is not a bad thing, I just call it as I see it. The technology world moves fast these days. Companies have to hire people that can produce things according to their limited ability.

The C++ world seems to have noticed this and is making big strides these days on making “easy things easy,” according to Bjarne Stroustrup’s current mantra.

C++ is anything but easy. But somehow it seems to attract the smart people in this world.

Nothin’ but INet

If you have Excel 2013 you might know about the WEBSERVICE function. Give it a URL and it returns the content as a string. You can use FILTERXML on the result and a dose of XPath to extract what you are interested in if it happens to be XML.

You can’t make this shit up:

NOTE: The WEBSERVICE function is not available in Excel Online.

Ha! Excel Online is pretty limited. Here is my stupid pet trick trying to figure out how to get that to do something useful: ExcelUDF.

I put together xllinet which provides more functionality than these two functions and works in all versions of Excel. (At least back to Excel 97.) It implements some of the WinINet library. The function INET.URL calls InternetOpenUrl and reads the result into a string. It returns a handle to the string object instead of a string. Excel cuts you off at 32K characters, the handle gives you access to arbitrarily large strings. Even in old versions of Excel with 255 character limits.

You can peek at the result with STR.GET. That way you have a chance at figuring out what you got back. XML, JSON, CSV, or a mess. Try doing that on the result from If it is XML, you can pass that to FILTERXML. Charles Goldfarb notwithstanding, I’m happy to let XML aficionados put themselves between angle brackets and flog themselves with attributes all day long. The smart people these days seem to be returning things in JSON. Have a gander at for more JSON than you can shake a stick at.

There are a lot of parsers out there, but I rolled my own. JSON maps quite naturally to a two column recursive Excel OPER. Excel is basically 2-D, but it is possible to accommodate recursive data structures like JSON. JSON objects are key-value pairs, where the values can be a single value, an array of values, or another JSON object. Single values are either a string, number, Boolean, or null.

Both arrays and objects are OPER ranges. Use RANGE.GET on the values returned to spelunk the object. Any key that starts with an asterisk has as its value a handle to either and array or object. Keep using RANGE.GET to dig deeper.

There are two cell menu items to help you with that. ‘Adjust’ works with any array function. It figures out the size needed and plops it in. ‘Expand’ gives you a peek at the handle value. It places the range returned just to the right of the cell containing the handle and selects it so you can delete it once you’ve seen it, just like ‘Adjust’.


A quote by my mathematical hero John von Neumann

Anyone who attempts to generate random numbers by deterministic means is, of course, living in a state of sin.

Fat Tuesday is coming up and I got that out of the way: xllrandom.
It is an update using the latest xll8 library that is a faithful reflection of the latest C++ standard. It won’t win any UX prizes, but it makes it possible to play with that in Excel.
If you can’t be bothered to download and compile the code, use this.

Hooking up C++ to Excel

See xllrandom for the gory details.

Hooking up the latest C++ version of the <random> library helped me learn a few new tricks. No more variate_generator in C++, things just keep getting simpler and better.

To generate random variates using a distribution and an engine all you need to do is type dist(eng) where dist is a distribution and eng is an engine. That is what this does, but it was not trivial to leverage the latest C++ goodness to make that happen.

C++ is strongly typed and distributions and engines all have different types. Base classes that use the NVI idiom, polymorphism, and a clever substitute for std::integer_sequence I swiped from stackoverflow did the trick.

I work hard to be lazy and also had to solve the problem of distributions that take a variable number of parameters. I used the X macro trick for that, incorporating Andrei Alexandrescu’s suggestion.

RIP Dr. Dobbs. It is a new world now that is difficult to keep up with. But fun. What a great time to be alive and have the opportunity to learn about all the amazing advances that are happening now.

Pop Quiz

The following multiple choice problems have the possible answers:
a) the code will not compile
b) the answer is undefined
c) the code will execute successfully
d) none of the above

Question 1. int i; assert (i == 0);
Question 2. int i; assert (i = 1);
Question 3. int& i;
Question 4. int f(int& i = 2) { return i; }
Question 5. int g(const int& i = 3} { return i; }

C++ is C++

One of my clients made fun of me recently for saying Excel is Excel. Tautologies are true statements and mathematicians have a propensity for them. What I was failing to communicate effectively was that all software tools have their limitations and you need to learn their quirks if you want to get the most out of them.

Software is written by people. Not quite a tautology these days, but close. It is still a cottage industry that requires humans with a pile of grey matter on top of a what is essentially a monkey brain to tap away at keyboards.

I learned some lessons tonight while teaching a course that uses C++ as a tool for implementing algorithms. People tend to stumble over simple things and bring preconceived notions from languages they already know. They get frustrated when C++ doesn’t work like their favorite language.

Programming languages are not religions, they are tools to help you achieve your ends. Learn your tools, don’t be one.

There is no substitute for spending time learning your tools. If you know nothing about C++, Programming – Principles and Practice Using C++ is a good place to start. If you know some C++ and want to get up to speed with the latest goodness in C++2011, consult Elements of Modern C++ Style. The C++ glossary is a handy resource also straight from the horses mouth.

; — Semicolon
By far the most common mistake I see is people forgetting semicolons. They end statements in C++. C++ is space insensitive, so a newline (or tab, or two or more consecutive spaces) provide no information to the compiler. Semicolons tell the compiler to do what you tell it to do. x = x + 1; increments x by 1. Semicolon means ‘do it’.

As a mathematician, that is like fingernails scratching on a blackboard. Why not x <- x + 1;? Anyone can propose a change to C++ at And they do.

{ } — Curly Braces
Curly braces are used for several different things in C++. They can be used in several different ways so pay attention!
They are used with classes and structs to group data and functions, in which case the closing curly brace must be followed with a semicolon.
They are also used with functions for the body of code. These don’t need to end with a semicolon.
Another use is for blocks of expressions associated with control flow structures: if (e) { ...; } else { ...; } If the block contains only one statement the braces are optional, but go ahead and use them anyway. It will help prevent bugs if you or someone else decides to add another statement to the block later.
Namespaces are enclosed in curly braces too. No need for a semicolon afterwards.
Finally, they are also used for uniform initialization — a very handing thing to make yourself familiar with.

See also the previous blog post.

( ) — Parentheses
These are used when declaring or defining a function and when calling a function. The difference is that when declaring or defining a function you have to tell the compiler the types of the function arguments, whereas when you call the function you only need to pass the values. Also, don’t forget you can declare (functions and data) as many times as you want, but only define them once. See the one definition rule

[ ] — Square Brackets
Of course you know these are for indexing into arrays. Maybe you didn’t know the stupid C trick that 2["unix"] evaluates to the character 'i'. (The reason is that a[b] is converted to *(a + b) by the compiler.) They are also used to introduce lambda expressions since C++2011. When you type [](int i) { return 2*i; } the compiler generates a class with an overloaded operator(). E.g.,

class lambda_funny_name_you_cannot_know {
    int operator()(int i) { return 2*i; }

But what if you need to use some variables that are in scope at the time the function is created? Just list those inside the square brackets. By default, the values are copied into corresponding members of the generated class. It is possible to pass them by reference, but don’t do that. Non-local side effects make it more difficult to reason about code correctness.

*,& — pointers and references
Given a value like int i{42}; // <-- don't forget the semicolon! the ampersand operator, & gives you the address of the value i. This address is called a pointer. You can capture the pointer with int* pi = &i;. Given a pointer you can get the value back by dereferencing the pointer: assert (*pi == 42);.
A reference is just an alias: int& ri{i};. If you execute, e.g., ri = 24;, then i now has the value 24, not 42. If the reference is const, e.g., const int& cri{i} and you try to change the value through the reference, cri = 24; you will get a compiler error.

r-values and l-values
An l-value is a value that has a name associated with it and an r-value is a value without a name associated with it, e.g., a temporary value. If int i{1},j{2}; then i + j has the value 3, but no name, hence it is an r-value. The name comes from the fact l-values typically occur on the left hand side of assignments and r-values occur on the right hand side, int k; k = i + j;


Right curly brace. It is the most powerful feature of C++.
I’m only half kidding, like I always am. Even now. Garbage collected languages have a big dumb machine following you around trying to figure out when you are done using the memory you allocated. When you think your code should be executing, the hulking monster you have no control over in your footsteps might decide it is time to take over the CPU.
As Bjarne Stroupstrup said, the best language for garbage collection is a language that doesn’t need it. C++ gives you control over the resources your program uses. I’ve never really understood RAII: Resource Acquisition Is Initialization. To me it is just a way of making C++ classes behave like built in types. Constructors do the dirty work of allocating memory and putting 0’s and 1’s in place, destructors undo that, copy construction duplicates the bits, and copy assignment clears out existing bits before duplicating the right hand side of operator=().
There are two new ways to help deal with C++’s propensity to create temporary objects, aka r-values, but this post is about l-value lifetimes.
An l-value is something that has a name for the memory associated with it. Maybe I should give mad props to left curly brace too. They introduce the scope that objects live in. Write the class and give it a name and the constructor brings it into existence. The right curly brace ensures the destructor gets called on the object.

It is all under your control in C++ if you just spend some time learning how it works.

I keep forgetting…

…that not everyone knows what I know. Knowledge is hard won, but somehow after you know something it seems completely obvious. Even though it is not.

Some guy using my xll library kept pestering me to debug his code. It was clear to me he needed to invest more time learning C++. One dirty secret about open source software is that most people only complain about the problems they encounter. I am fine with that as long as they are pointing out shortcomings in my code. Not so much when they think I’m a free resource to fix their crappy code that has nothing to do with my library.

I’ve been lucky to have a couple of people that actually provided code I could use, but that seems to be as rare as rocking horse shit. I learned a lesson today from the guy that was calling me names for telling him he might not want to use my library.

C++ is hard. Really hard. But some fascinating things are going on in that world now. Some of the smartest people in the world have been making it easier to use. Google Bjarne Stroustrup and just listen to what he has to say. The world needs more people like him.

I’m teaching a course that uses C++ and it is still the case that people have trouble with just compiling and running a program. It is the price you pay for using a language that was designed to make it possible to get the maximum performance out of the latest silicon. It’s bits all the way down.

Some people say C is high level assembly language. They are right. That is what Dennis Ritchie and Brian Kernighan wrote it for. Hardware was changing in bigger directions in their day than now. It was a total PITA to write a routine to, e.g., sort an array of numbers and have to keep rewriting it in the latest set of low level assembly instructions. Say a prayer tonight for the compiler writers who insulate you from having to understand such low level details.

The first stage is turning the characters you type as a C program into assembly code. The compiler that some very clever people wrote for you turns that into an object file that the machine you are working on can use. Don’t confuse that with object oriented programming which came much later. It is just a file of bits that the hardware can ultimately execute.

These object files can be collected into a library and be used by other programs. Static libraries (.lib or .a) copy the bits into the program you write. Shared libraries (.dll or .so) share those bits with all programs executing on your computer.

The second stage is to link together object files (I’m lying a little, there are more stages) and provide a well known entry point to where the operating system can start doing exactly what you tell the computer to do.
To make something happen on your computer you have to execute a file. Sounds harsh, but these are names people called things before the age of political correctness.

The xll library it is a little bit different. Excel is already running. When you open an xll, which is just a dll, it looks for a function called xlAutoOpen and executes that. Set a breakpoint on it and step through all the tricky Excel SDK stuff you don’t have to worry about in order to call C/C++/Fortran.

I’ve updated to work with 64-bit Excel if you are following along with the latest. Go to Build/Configuration Manager… and set the platform to x64. Let me know if that is not working for you now, as long as you understand this is not an invitation to debug your code.