Nothin’ but INet

If you have Excel 2013 you might know about the WEBSERVICE function. Give it a URL and it returns the content as a string. You can use FILTERXML on the result and a dose of XPath to extract what you are interested in if it happens to be XML.

You can’t make this shit up:

NOTE: The WEBSERVICE function is not available in Excel Online.

Ha! Excel Online is pretty limited. Here is my stupid pet trick trying to figure out how to get that to do something useful: ExcelUDF.

I put together xllinet which provides more functionality than these two functions and works in all versions of Excel. (At least back to Excel 97.) It implements some of the WinINet library. The function INET.URL calls InternetOpenUrl and reads the result into a string. It returns a handle to the string object instead of a string. Excel cuts you off at 32K characters, the handle gives you access to arbitrarily large strings. Even in old versions of Excel with 255 character limits.

You can peek at the result with STR.GET. That way you have a chance at figuring out what you got back. XML, JSON, CSV, or a mess. Try doing that on the result from If it is XML, you can pass that to FILTERXML. Charles Goldfarb notwithstanding, I’m happy to let XML aficionados put themselves between angle brackets and flog themselves with attributes all day long. The smart people these days seem to be returning things in JSON. Have a gander at for more JSON than you can shake a stick at.

There are a lot of parsers out there, but I rolled my own. JSON maps quite naturally to a two column recursive Excel OPER. Excel is basically 2-D, but it is possible to accommodate recursive data structures like JSON. JSON objects are key-value pairs, where the values can be a single value, an array of values, or another JSON object. Single values are either a string, number, Boolean, or null.

Both arrays and objects are OPER ranges. Use RANGE.GET on the values returned to spelunk the object. Any key that starts with an asterisk has as its value a handle to either and array or object. Keep using RANGE.GET to dig deeper.

There are two cell menu items to help you with that. ‘Adjust’ works with any array function. It figures out the size needed and plops it in. ‘Expand’ gives you a peek at the handle value. It places the range returned just to the right of the cell containing the handle and selects it so you can delete it once you’ve seen it, just like ‘Adjust’.



A quote by my mathematical hero John von Neumann

Anyone who attempts to generate random numbers by deterministic means is, of course, living in a state of sin.

Fat Tuesday is coming up and I got that out of the way: xllrandom.
It is an update using the latest xll8 library that is a faithful reflection of the latest C++ standard. It won’t win any UX prizes, but it makes it possible to play with that in Excel.
If you can’t be bothered to download and compile the code, use this.

Hooking up C++ to Excel

See xllrandom for the gory details.

Hooking up the latest C++ version of the <random> library helped me learn a few new tricks. No more variate_generator in C++, things just keep getting simpler and better.

To generate random variates using a distribution and an engine all you need to do is type dist(eng) where dist is a distribution and eng is an engine. That is what this does, but it was not trivial to leverage the latest C++ goodness to make that happen.

C++ is strongly typed and distributions and engines all have different types. Base classes that use the NVI idiom, polymorphism, and a clever substitute for std::integer_sequence I swiped from stackoverflow did the trick.

I work hard to be lazy and also had to solve the problem of distributions that take a variable number of parameters. I used the X macro trick for that, incorporating Andrei Alexandrescu’s suggestion.

RIP Dr. Dobbs. It is a new world now that is difficult to keep up with. But fun. What a great time to be alive and have the opportunity to learn about all the amazing advances that are happening now.

Pop Quiz

The following multiple choice problems have the possible answers:
a) the code will not compile
b) the answer is undefined
c) the code will execute successfully
d) none of the above

Question 1. int i; assert (i == 0);
Question 2. int i; assert (i = 1);
Question 3. int& i;
Question 4. int f(int& i = 2) { return i; }
Question 5. int g(const int& i = 3} { return i; }

C++ is C++

One of my clients made fun of me recently for saying Excel is Excel. Tautologies are true statements and mathematicians have a propensity for them. What I was failing to communicate effectively was that all software tools have their limitations and you need to learn their quirks if you want to get the most out of them.

Software is written by people. Not quite a tautology these days, but close. It is still a cottage industry that requires humans with a pile of grey matter on top of a what is essentially a monkey brain to tap away at keyboards.

I learned some lessons tonight while teaching a course that uses C++ as a tool for implementing algorithms. People tend to stumble over simple things and bring preconceived notions from languages they already know. They get frustrated when C++ doesn’t work like their favorite language.

Programming languages are not religions, they are tools to help you achieve your ends. Learn your tools, don’t be one.

There is no substitute for spending time learning your tools. If you know nothing about C++, Programming – Principles and Practice Using C++ is a good place to start. If you know some C++ and want to get up to speed with the latest goodness in C++2011, consult Elements of Modern C++ Style. The C++ glossary is a handy resource also straight from the horses mouth.

; — Semicolon
By far the most common mistake I see is people forgetting semicolons. They end statements in C++. C++ is space insensitive, so a newline (or tab, or two or more consecutive spaces) provide no information to the compiler. Semicolons tell the compiler to do what you tell it to do. x = x + 1; increments x by 1. Semicolon means ‘do it’.

As a mathematician, that is like fingernails scratching on a blackboard. Why not x <- x + 1;? Anyone can propose a change to C++ at And they do.

{ } — Curly Braces
Curly braces are used for several different things in C++. They can be a bit clumsy and error prone since they are used in several different ways so pay attention!
They are used for classes and structs to group data and functions, in which case the closing curly brace must be followed with a semicolon.
They are also used with functions to define the body of code that they execute. These don’t need to end with a semicolon.
Another use is for blocks of expressions associated with control flow structures: if (e) { ...; } else { ...; } If the block contains only one statement the braces are optional, but go ahead and use them anyway. It will help prevent bugs if you or someone else decides to add another statement to the block later.
Namespaces are enclosed in curly braces too. No need for a semicolon afterwards.
Finally, they are also used for uniform initialization — a very handing thing to make yourself familiar with.

See also the previous blog post.

( ) — Parentheses
These are used when declaring or defining a function and when calling a function. The difference is that when declaring or defining a function you have to tell the compiler the types of the function arguments, whereas when you call the function you only need to pass the values. Also, don’t forget you can declare (functions and data) as many times as you want, but only define them once. The one definition rule

[ ] — Square Brackets
Of course you know these are for indexing into arrays. Maybe you didn’t know the stupid C trick that 2["unix"] evaluates to the character 'i'. (The reason is that a[b] is converted to *(a + b) by the compiler.) They are also used to introduce lambda expressions in C++2011. When you type [](int i) { return 2*i; } the compiler generates a class with an overloaded operator(). E.g.,

class lambda_funny_name_you_cannot_know {
    int operator()(int i) { return 2*i; }

But what if you need to use some variables that are in scope at the time the function is created? Just list those inside the square brackets. By default, the values are copied into corresponding members of the generated class. It is possible to pass them by reference, but don’t do that. Non-local side effects make it more difficult to reason about code correctness.

*,& — pointers and references
Given a value like int i{42}; // <-- don't forget the semicolon! the ampersand operator, & gives you the address of the value i. This address is called a pointer. You can capture the pointer with int* pi = &i;. Given a pointer you can get the value back by dereferencing the pointer: assert (*pi == 42);.
A reference is just an alias: int& ri{i};. If you execute, e.g., ri = 24;, then i now has the value 24, not 42. If the reference is const, e.g., const int& cri{i} and you try to change the value through the reference, cri = 24; you will get a compiler error.

r-values and l-values
An l-value is a value with a name associated with it and an r-value is a value without a name associated with it, e.g., a temporary value. If int i{1},j{2}; then i + j has the value 3, but no name, hence it is an r-value. The name comes from the fact l-values typically occur on the left hand side of assignments and r-values occur on the right hand side, int k; k = i + j;


Right curly brace. It is the most powerful feature of C++.
I’m only half kidding, like I always am. Even now. Garbage collected languages have a big dumb machine following you around trying to figure out when you are done using the memory you allocated. When you think your code should be executing, the hulking monster you have no control over in your footsteps might decide it is time to take over the CPU.
As Bjarne Stroupstrup said, the best language for garbage collection is a language that doesn’t need it. C++ gives you control over the resources your program uses. I’ve never really understood RAII: Resource Acquisition Is Initialization. To me it is just a way of making C++ classes behave like built in types. Constructors do the dirty work of allocating memory and putting 0’s and 1’s in place, destructors undo that, copy construction duplicates the bits, and copy assignment clears out existing bits before duplicating the right hand side of operator=().
There are two new ways to help deal with C++’s propensity to create temporary objects, aka r-values, but this post is about l-value lifetimes.
An l-value is something that has a name for the memory associated with it. Maybe I should give mad props to left curly brace too. They introduce the scope that objects live in. Write the class and give it a name and the constructor brings it into existence. The right curly brace ensures the destructor gets called on the object.

It is all under your control in C++ if you just spend some time learning how it works.

I keep forgetting…

…that not everyone knows what I know. Knowledge is hard won, but somehow after you know something it seems completely obvious. Even though it is not.

Some guy using my xll library kept pestering me to debug his code. It was clear to me he needed to invest more time learning C++. One dirty secret about open source software is that most people only complain about the problems they encounter. I am fine with that as long as they are pointing out shortcomings in my code. Not so much when they think I’m a free resource to fix their crappy code that has nothing to do with my library.

I’ve been lucky to have a couple of people that actually provided code I could use, but that seems to be as rare as rocking horse shit. I learned a lesson today from the guy that was calling me names for telling him he might not want to use my library.

C++ is hard. Really hard. But some fascinating things are going on in that world now. Some of the smartest people in the world have been making it easier to use. Google Bjarne Stroustrup and just listen to what he has to say. The world needs more people like him.

I’m teaching a course that uses C++ and it is still the case that people have trouble with just compiling and running a program. It is the price you pay for using a language that was designed to make it possible to get the maximum performance out of the latest silicon. It’s bits all the way down.

Some people say C is high level assembly language. They are right. That is what Dennis Ritchie and Brian Kernighan wrote it for. Hardware was changing in bigger directions in their day than now. It was a total PITA to write a routine to, e.g., sort an array of numbers and have to keep rewriting it in the latest set of low level assembly instructions. Say a prayer tonight for the compiler writers who insulate you from having to understand such low level details.

The first stage is turning the characters you type as a C program into assembly code. The compiler that some very clever people wrote for you turns that into an object file that the machine you are working on can use. Don’t confuse that with object oriented programming which came much later. It is just a file of bits that the hardware can ultimately execute.

These object files can be collected into a library and be used by other programs. Static libraries (.lib or .a) copy the bits into the program you write. Shared libraries (.dll or .so) share those bits with all programs executing on your computer.

The second stage is to link together object files (I’m lying a little, there are more stages) and provide a well known entry point to where the operating system can start doing exactly what you tell the computer to do.
To make something happen on your computer you have to execute a file. Sounds harsh, but these are names people called things before the age of political correctness.

The xll library it is a little bit different. Excel is already running. When you open an xll, which is just a dll, it looks for a function called xlAutoOpen and executes that. Set a breakpoint on it and step through all the tricky Excel SDK stuff you don’t have to worry about in order to call C/C++/Fortran.

I’ve updated to work with 64-bit Excel if you are following along with the latest. Go to Build/Configuration Manager… and set the platform to x64. Let me know if that is not working for you now, as long as you understand this is not an invitation to debug your code.

It’s not your default Will

The xll library uses C++, which has a mechanism for default arguments, but the default value for an add-in is not a C++ default. No, really. It’s not your (C++) default Will.

The fourth argument to Args::Arg allows default arguments to be specified, but only if you use some funny macros.
Excel has built-in functionality to paste in argument names when you type Ctrl-Shift-A after typing the name of a function. If you type the name of a function in a cell (with no arguments or parentheses) and hit Return you will see a number show up. It is the register id Excel assigns to the function. If the cursor is in that cell and you type Ctrl-Shift-B a macro I wrote will fix up the cell to call the function with the default arguments you specify. It pastes the default arguments in the cells below the function. More obscure than Ctrl-Shift-A but more useful if you are a power user that wants to quickly cook up a spreadsheet.

But wait, that’s not all! If you type Ctrl-Shift-C it will create named ranges based on the variable names and use those in the fixed up function. Don’t do this in column A since the names get pasted into the column to the left of the selected cell.
If you pull this trick twice in a sheet there is a problem: the named ranges get clobbered. The solution is to put a prefix to the left of the cell containing the register id. The named ranges that get created will use that prefix. Just be sure to use unique prefixes.


I was trying to move forward with the new installer for the xll8 library. Again. WiX works, but I’m trying to keep up with new technology. I had high hopes for NuGet for native code, but that project seems to be going nowhere. So much for trying to help out on that. No e-mail address, just a twitter feed. I started signing up for that. First they made me select 5 people twitter thought I might know that had active twitter feeds. Then 5 more people they thought I might like to follow. Took a pass on Justin Egger, but I was good with Jimmy Fallon, Louis CK, and Seth Meyers. Then they wanted me to cough up my address book. Buh, bye. But that didn’t stop them from sending robo-emails.

Category theory has been an ongoing hobby for me, much to the chagrin of my Operator Theory and Functional Analysis colleagues. They think it is generalized abstract nonsense. Pot, kettle.

Actually knowing what a monad is seems to be a hindrance to understanding how computer scientists use them. “Monads reify composition of functions.” At least that is my Dao De Jing definition – if you know everything there is to know about monads, that should make perfect sense to you.

I like examples, but not the examples I’ve been seeing in most monad expositions. I make a living solving practical problems and never came across anything that looked immediately useful to me.

Here is an example that is almost useful: suppose you have an old-school C library that returns error codes that you have to manually check and functions that return results using pointers. Since int is shorter to type than double, lets assume the functions in the library have signature int (*)(int, int*). Or in modern C++, std::function<int(int,int*)>, a function that returns an int error code, takes an integer argument, and stores the integer result where the pointer tells it to.

I’m thinking GSL. They have a lot of std::function<int(double,double*)> functions. I have been plugging some of this stuff into Excel: xllgsl.

Excel is purely functional language, except for macros used to produce side effects. So is Haskell. Brilliant stuff, but I wish they would stop telling people how easy it is. They make fun of C++ in the same breath they use to claim how well metatemplate programming maps to Haskell. Thank the smart C++ people that put many more man-hours into solving really important issues in modern computing. Like r-value references that obviate the need for meta-template programming and let you return big objects by value without a performance hit.

Off my soap box. Back to monads. They have limited usefulness. They are just a programming pattern that involves wrapping up an object and calling a function on the wrapped up object.

They make standard conventions slightly more convenient. E.g., instead of

int err;
int x, y, z;
x = 0;
err = f1(x, &y);
if (err != 0) {
   // deal with it
err = f2(y, &z);
if (err != 0) {
    // deal with it

you can wrap up the error code and argument

struct wrap {
    int err;
    int val;
    wrap(int x) : err(0), val(x)

and lift a GSL style function:

std::function<wrap(wrap)> lift(std::function<int(int,int*)> f) {
    return [f](wrap x) { 
        wrap y;
        if (x.err)
            y.err = x.err;
            y.err = f(x.val, &y.val);

        return y;

Now you can compose functions that return error codes:

    lift F1(f1);
    lift F2(f2);
    auto y = F2(F1(wrap(x));

Those big, bad monads don’t look so tough anymore.

The big boys call wrap unit and write unit: a -> M a. The signature for lift is lift: (a -> b) -> (M a -> M b). The Haskell guys are so smart that they call unit return and use bind: M a -> (a -> M b) -> M b instead of lift (a.k.a., fmap). The language is so easy you have to do something to make it complicated.