And now for something completely similar

Against my better judgement, I’ve been rewriting the Excel add-in library. After a long day dealing with C++03 limitations at my current gig, I have to confess I am really enjoying applying the latest C++14 goodness to my old code.

Don’t try this at home, kids: https://github.com/keithalewis/xll12

You will need Visual Studio 2015 and be willing to work on your modern C++ chops. Let me know what problems you run into and I’ll make my best effort to get you going. Don’t underestimate your ability to help improve this code. Clone it and send me a pull request.

That reminds me. Time for a git rant.

Advertisements

And now for something completely different

Working on huge data at Bloomberg these days. After a long day of trying to figure out how they do that I like to think about math.

https://github.com/keithalewis/papers/blob/master/ct.md

My attempt at working up to “A monad is just a monoid in the category of endofunctors.” It gets messy when using `SelectMany` in LINQ. Google Bart De Smet. Or maybe you can now just ask Cortana. Technology moves fast these days.

There is no Royal Road. It is still fun to use the grey matter sitting on top of a primate brain in the meantime.

There is no royal road.

I confess I binged this instead of googled it: http://www.cs.utexas.edu/users/EWD/transcriptions/EWD08xx/EWD841.html

I misremembered it as something Gauss said, not Euclid. Feeble human brains still need to be reminded of things learned over 2000 years ago. At least mine does. There is nothing I can add to Dijkstra’s observations other than an anecdote.

Last Thursday I had dinner at the Columbia Faculty House. My friend is taking a break from a course he has been teaching and I offered to fill in. And found out there is a 1000ft elevation change between Morningside Ave and Morningside Dr. Next time I’ll take the 1 train instead of the B line unless I want the exercise.

One of the professors at dinner paid me the nicest complement, after drawing it out a bit. The conversation went something like this:

Me: Hi, I’m Keith Lewis.
Him: <Russian accent>I have heard your name before.</Russian accent>
Me: <Getting Nervous>Oh?</Still Nervous>
Him: <RA>People have told me things about you.</RA>
Me: <CMA?>I’m sure some of those things are probably true.</CMA?>
Him: They have said they have taken your course before and found it very difficult.
Me: <Keeping my mouth shut/>
Him: Then they realized they could do the homework. They also told me they found what you taught them was useful for their job.

Bingo! It is the same thing every semester. Students complain the material is too hard. I go over the previous homework at the beginning of each class and ask them what else they needed to be successful doing it. The answer is always the same: Nothing.

There are some things that can be learned by having your hand held and walking you down the garden path. (How do I get the Visual Studio debugger to start Excel and load an add-in?) But you can only learn the important and truly useful things by spending some time in the limbo of uncomfortable uncertainty. If your brain isn’t hurting, you aren’t learning anything new.

A dead, but not forgotten, man just taught me a new vocabulary word: pleonastic. I need to try and endeavor to avoid and eschew pleonasm.

Protecting your brand.

I like to ignore advice people give me and figure things out for myself. It does me no good. My brand is I don’t give a fuck about my brand. If you find the code I write useful and can point out things I’m missing that can help me improve my open source code you will have my full attention. The sad fact of open source is that people who actually spend the time to look carefully at your code are as rare as rocking horse shit.

Bill Gates might not have been far off when he claimed secretaries would be writing code using some version of BASIC. Larry Wall came up with perl, Guido von Rossum came up with python. I am being politically incorrect, it is a form of censorship that I abhor,  but these are all languages for people who can’t understand the complexity of modern computing.

That is not a bad thing, I just call it as I see it. The technology world moves fast these days. Companies have to hire people that can produce things according to their limited ability.

The C++ world seems to have noticed this and is making big strides these days on making “easy things easy,” according to Bjarne Stroustrup’s current mantra.

C++ is anything but easy. But somehow it seems to attract the smart people in this world.

Nothin’ but INet

If you have Excel 2013 you might know about the WEBSERVICE function. Give it a URL and it returns the content as a string. You can use FILTERXML on the result and a dose of XPath to extract what you are interested in if it happens to be XML.

You can’t make this shit up:

NOTE: The WEBSERVICE function is not available in Excel Online.

Ha! Excel Online is pretty limited. Here is my stupid pet trick trying to figure out how to get that to do something useful: ExcelUDF.

I put together xllinet which provides more functionality than these two functions and works in all versions of Excel. (At least back to Excel 97.) It implements some of the WinINet library. The function INET.URL calls InternetOpenUrl and reads the result into a string. It returns a handle to the string object instead of a string. Excel cuts you off at 32K characters, the handle gives you access to arbitrarily large strings. Even in old versions of Excel with 255 character limits.

You can peek at the result with STR.GET. That way you have a chance at figuring out what you got back. XML, JSON, CSV, or a mess. Try doing that on the result from http://google.com. If it is XML, you can pass that to FILTERXML. Charles Goldfarb notwithstanding, I’m happy to let XML aficionados put themselves between angle brackets and flog themselves with attributes all day long. The smart people these days seem to be returning things in JSON. Have a gander at data.gov for more JSON than you can shake a stick at.

There are a lot of parsers out there, but I rolled my own. JSON maps quite naturally to a two column recursive Excel OPER. Excel is basically 2-D, but it is possible to accommodate recursive data structures like JSON. JSON objects are key-value pairs, where the values can be a single value, an array of values, or another JSON object. Single values are either a string, number, Boolean, or null.

Both arrays and objects are OPER ranges. Use RANGE.GET on the values returned to spelunk the object. Any key that starts with an asterisk has as its value a handle to either and array or object. Keep using RANGE.GET to dig deeper.

There are two cell menu items to help you with that. ‘Adjust’ works with any array function. It figures out the size needed and plops it in. ‘Expand’ gives you a peek at the handle value. It places the range returned just to the right of the cell containing the handle and selects it so you can delete it once you’ve seen it, just like ‘Adjust’.

<random>

A quote by my mathematical hero John von Neumann

Anyone who attempts to generate random numbers by deterministic means is, of course, living in a state of sin.

Fat Tuesday is coming up and I got that out of the way: xllrandom.
It is an update using the latest xll8 library that is a faithful reflection of the latest C++ standard. It won’t win any UX prizes, but it makes it possible to play with that in Excel.
If you can’t be bothered to download and compile the code, use this.

Hooking up C++ to Excel

See xllrandom for the gory details.

Hooking up the latest C++ version of the <random> library helped me learn a few new tricks. No more variate_generator in C++, things just keep getting simpler and better.

To generate random variates using a distribution and an engine all you need to do is type dist(eng) where dist is a distribution and eng is an engine. That is what this does, but it was not trivial to leverage the latest C++ goodness to make that happen.

C++ is strongly typed and distributions and engines all have different types. Base classes that use the NVI idiom, polymorphism, and a clever substitute for std::integer_sequence I swiped from stackoverflow did the trick.

I work hard to be lazy and also had to solve the problem of distributions that take a variable number of parameters. I used the X macro trick http://www.drdobbs.com/cpp/the-x-macro/228700289 for that, incorporating Andrei Alexandrescu’s suggestion.

RIP Dr. Dobbs. It is a new world now that is difficult to keep up with. But fun. What a great time to be alive and have the opportunity to learn about all the amazing advances that are happening now.

Pop Quiz

The following multiple choice problems have the possible answers:
a) the code will not compile
b) the answer is undefined
c) the code will execute successfully
d) none of the above

Question 1. int i; assert (i == 0);
Question 2. int i; assert (i = 1);
Question 3. int& i;
Question 4. int f(int& i = 2) { return i; }
Question 5. int g(const int& i = 3} { return i; }

C++ is C++

One of my clients made fun of me recently for saying Excel is Excel. Tautologies are true statements and mathematicians have a propensity for them. What I was failing to communicate effectively was that all software tools have their limitations and you need to learn their quirks if you want to get the most out of them.

Software is written by people. Not quite a tautology these days, but close. It is still a cottage industry that requires humans with a pile of grey matter on top of a what is essentially a monkey brain to tap away at keyboards.

I learned some lessons tonight while teaching a course that uses C++ as a tool for implementing algorithms. People tend to stumble over simple things and bring preconceived notions from languages they already know. They get frustrated when C++ doesn’t work like their favorite language.

Programming languages are not religions, they are tools to help you achieve your ends. Learn your tools, don’t be one.

There is no substitute for spending time learning your tools. If you know nothing about C++, Programming – Principles and Practice Using C++ is a good place to start. If you know some C++ and want to get up to speed with the latest goodness in C++2011, consult Elements of Modern C++ Style. The C++ glossary is a handy resource also straight from the horses mouth.

; — Semicolon
By far the most common mistake I see is people forgetting semicolons. They end statements in C++. C++ is space insensitive, so a newline (or tab, or two or more consecutive spaces) provide no information to the compiler. Semicolons tell the compiler to do what you tell it to do. x = x + 1; increments x by 1. Semicolon means ‘do it’.

As a mathematician, that is like fingernails scratching on a blackboard. Why not x <- x + 1;? Anyone can propose a change to C++ at http://isocpp.org/. And they do.

{ } — Curly Braces
Curly braces are used for several different things in C++. They can be used in several different ways so pay attention!
They are used with classes and structs to group data and functions, in which case the closing curly brace must be followed with a semicolon.
They are also used with functions for the body of code. These don’t need to end with a semicolon.
Another use is for blocks of expressions associated with control flow structures: if (e) { ...; } else { ...; } If the block contains only one statement the braces are optional, but go ahead and use them anyway. It will help prevent bugs if you or someone else decides to add another statement to the block later.
Namespaces are enclosed in curly braces too. No need for a semicolon afterwards.
Finally, they are also used for uniform initialization — a very handing thing to make yourself familiar with.

See also the previous blog post.

( ) — Parentheses
These are used when declaring or defining a function and when calling a function. The difference is that when declaring or defining a function you have to tell the compiler the types of the function arguments, whereas when you call the function you only need to pass the values. Also, don’t forget you can declare (functions and data) as many times as you want, but only define them once. See the one definition rule

[ ] — Square Brackets
Of course you know these are for indexing into arrays. Maybe you didn’t know the stupid C trick that 2["unix"] evaluates to the character 'i'. (The reason is that a[b] is converted to *(a + b) by the compiler.) They are also used to introduce lambda expressions since C++2011. When you type [](int i) { return 2*i; } the compiler generates a class with an overloaded operator(). E.g.,


class lambda_funny_name_you_cannot_know {
public:
    int operator()(int i) { return 2*i; }
};

But what if you need to use some variables that are in scope at the time the function is created? Just list those inside the square brackets. By default, the values are copied into corresponding members of the generated class. It is possible to pass them by reference, but don’t do that. Non-local side effects make it more difficult to reason about code correctness.

*,& — pointers and references
Given a value like int i{42}; // <-- don't forget the semicolon! the ampersand operator, & gives you the address of the value i. This address is called a pointer. You can capture the pointer with int* pi = &i;. Given a pointer you can get the value back by dereferencing the pointer: assert (*pi == 42);.
A reference is just an alias: int& ri{i};. If you execute, e.g., ri = 24;, then i now has the value 24, not 42. If the reference is const, e.g., const int& cri{i} and you try to change the value through the reference, cri = 24; you will get a compiler error.

r-values and l-values
An l-value is a value that has a name associated with it and an r-value is a value without a name associated with it, e.g., a temporary value. If int i{1},j{2}; then i + j has the value 3, but no name, hence it is an r-value. The name comes from the fact l-values typically occur on the left hand side of assignments and r-values occur on the right hand side, int k; k = i + j;

}

Right curly brace. It is the most powerful feature of C++.
I’m only half kidding, like I always am. Even now. Garbage collected languages have a big dumb machine following you around trying to figure out when you are done using the memory you allocated. When you think your code should be executing, the hulking monster you have no control over in your footsteps might decide it is time to take over the CPU.
As Bjarne Stroupstrup said, the best language for garbage collection is a language that doesn’t need it. C++ gives you control over the resources your program uses. I’ve never really understood RAII: Resource Acquisition Is Initialization. To me it is just a way of making C++ classes behave like built in types. Constructors do the dirty work of allocating memory and putting 0’s and 1’s in place, destructors undo that, copy construction duplicates the bits, and copy assignment clears out existing bits before duplicating the right hand side of operator=().
There are two new ways to help deal with C++’s propensity to create temporary objects, aka r-values, but this post is about l-value lifetimes.
An l-value is something that has a name for the memory associated with it. Maybe I should give mad props to left curly brace too. They introduce the scope that objects live in. Write the class and give it a name and the constructor brings it into existence. The right curly brace ensures the destructor gets called on the object.

It is all under your control in C++ if you just spend some time learning how it works.