Nothin’ but INet

If you have Excel 2013 you might know about the WEBSERVICE function. Give it a URL and it returns the content as a string. You can use FILTERXML on the result and a dose of XPath to extract what you are interested in if it happens to be XML.

You can’t make this shit up:

NOTE: The WEBSERVICE function is not available in Excel Online.

Ha! Excel Online is pretty limited. Here is my stupid pet trick trying to figure out how to get that to do something useful: ExcelUDF.

I put together xllinet which provides more functionality than these two functions and works in all versions of Excel. (At least back to Excel 97.) It implements some of the WinINet library. The function INET.URL calls InternetOpenUrl and reads the result into a string. It returns a handle to the string object instead of a string. Excel cuts you off at 32K characters, the handle gives you access to arbitrarily large strings. Even in old versions of Excel with 255 character limits.

You can peek at the result with STR.GET. That way you have a chance at figuring out what you got back. XML, JSON, CSV, or a mess. Try doing that on the result from http://google.com. If it is XML, you can pass that to FILTERXML. Charles Goldfarb notwithstanding, I’m happy to let XML aficionados put themselves between angle brackets and flog themselves with attributes all day long. The smart people these days seem to be returning things in JSON. Have a gander at data.gov for more JSON than you can shake a stick at.

There are a lot of parsers out there, but I rolled my own. JSON maps quite naturally to a two column recursive Excel OPER. Excel is basically 2-D, but it is possible to accommodate recursive data structures like JSON. JSON objects are key-value pairs, where the values can be a single value, an array of values, or another JSON object. Single values are either a string, number, Boolean, or null.

Both arrays and objects are OPER ranges. Use RANGE.GET on the values returned to spelunk the object. Any key that starts with an asterisk has as its value a handle to either and array or object. Keep using RANGE.GET to dig deeper.

There are two cell menu items to help you with that. ‘Adjust’ works with any array function. It figures out the size needed and plops it in. ‘Expand’ gives you a peek at the handle value. It places the range returned just to the right of the cell containing the handle and selects it so you can delete it once you’ve seen it, just like ‘Adjust’.

Posted in Uncategorized | 2 Comments

<random>

A quote by my mathematical hero John von Neumann

Anyone who attempts to generate random numbers by deterministic means is, of course, living in a state of sin.

Fat Tuesday is coming up and I got that out of the way: xllrandom.
It is an update using the latest xll8 library that is a faithful reflection of the latest C++ standard. It won’t win any UX prizes, but it makes it possible to play with that in Excel.
If you can’t be bothered to download and compile the code, use this.

Posted in Uncategorized | Leave a comment

Hooking up C++ to Excel

See xllrandom for the gory details.

Hooking up the latest C++ version of the <random> library helped me learn a few new tricks. No more variate_generator in C++, things just keep getting simpler and better.

To generate random variates using a distribution and an engine all you need to do is type dist(eng) where dist is a distribution and eng is an engine. That is what this does, but it was not trivial to leverage the latest C++ goodness to make that happen.

C++ is strongly typed and distributions and engines all have different types. Base classes that use the NVI idiom, polymorphism, and a clever substitute for std::integer_sequence I swiped from stackoverflow did the trick.

I work hard to be lazy and also had to solve the problem of distributions that take a variable number of parameters. I used the X macro trick http://www.drdobbs.com/cpp/the-x-macro/228700289 for that, incorporating Andrei Alexandrescu’s suggestion.

RIP Dr. Dobbs. It is a new world now that is difficult to keep up with. But fun. What a great time to be alive and have the opportunity to learn about all the amazing advances that are happening now.

Posted in Uncategorized | Leave a comment

Pop Quiz

The following multiple choice problems have the possible answers:
a) the code will not compile
b) the answer is undefined
c) the code will execute successfully
d) none of the above

Question 1. int i; assert (i == 0);
Question 2. int i; assert (i = 1);
Question 3. int& i;
Question 4. int f(int& i = 2) { return i; }
Question 5. int g(const int& i = 3} { return i; }

Posted in Uncategorized | Leave a comment

C++ is C++

One of my clients made fun of me recently for saying Excel is Excel. Tautologies are true statements and mathematicians have a propensity for them. What I was failing to communicate effectively was that all software tools have their limitations and you need to learn their quirks if you want to get the most out of them.

Software is written by people. Not quite a tautology these days, but close. It is still a cottage industry that requires humans with a pile of grey matter on top of a what is essentially a monkey brain to tap away at keyboards.

I learned some lessons tonight while teaching a course that uses C++ as a tool for implementing algorithms. People tend to stumble over simple things and bring preconceived notions from languages they already know. They get frustrated when C++ doesn’t work like their favorite language.

Programming languages are not religions, they are tools to help you achieve your ends. Learn your tools, don’t be one.

There is no substitute for spending time learning your tools. If you know nothing about C++, Programming – Principles and Practice Using C++ is a good place to start. If you know some C++ and want to get up to speed with the latest goodness in C++2011, consult Elements of Modern C++ Style. The C++ glossary is a handy resource also straight from the horses mouth.

; — Semicolon
By far the most common mistake I see is people forgetting semicolons. They end statements in C++. C++ is space insensitive, so a newline (or tab, or two or more consecutive spaces) provide no information to the compiler. Semicolons tell the compiler to do what you tell it to do. x = x + 1; increments x by 1. Semicolon means ‘do it’.

As a mathematician, that is like fingernails scratching on a blackboard. Why not x <- x + 1;? Anyone can propose a change to C++ at http://isocpp.org/. And they do.

{ } — Curly Braces
Curly braces are used for several different things in C++. They can be a bit clumsy and error prone since they are used in several different ways so pay attention!
They are used for classes and structs to group data and functions, in which case the closing curly brace must be followed with a semicolon.
They are also used with functions to define the body of code that they execute. These don’t need to end with a semicolon.
Another use is for blocks of expressions associated with control flow structures: if (e) { ...; } else { ...; } If the block contains only one statement the braces are optional, but go ahead and use them anyway. It will help prevent bugs if you or someone else decides to add another statement to the block later.
Namespaces are enclosed in curly braces too. No need for a semicolon afterwards.
Finally, they are also used for uniform initialization — a very handing thing to make yourself familiar with.

See also the previous blog post.

( ) — Parentheses
These are used when declaring or defining a function and when calling a function. The difference is that when declaring or defining a function you have to tell the compiler the types of the function arguments, whereas when you call the function you only need to pass the values. Also, don’t forget you can declare (functions and data) as many times as you want, but only define them once. The one definition rule

[ ] — Square Brackets
Of course you know these are for indexing into arrays. Maybe you didn’t know the stupid C trick that 2["unix"] evaluates to the character 'i'. (The reason is that a[b] is converted to *(a + b) by the compiler.) They are also used to introduce lambda expressions in C++2011. When you type [](int i) { return 2*i; } the compiler generates a class with an overloaded operator(). E.g.,


class lambda_funny_name_you_cannot_know {
public:
    int operator()(int i) { return 2*i; }
};

But what if you need to use some variables that are in scope at the time the function is created? Just list those inside the square brackets. By default, the values are copied into corresponding members of the generated class. It is possible to pass them by reference, but don’t do that. Non-local side effects make it more difficult to reason about code correctness.

*,& — pointers and references
Given a value like int i{42}; // <-- don't forget the semicolon! the ampersand operator, & gives you the address of the value i. This address is called a pointer. You can capture the pointer with int* pi = &i;. Given a pointer you can get the value back by dereferencing the pointer: assert (*pi == 42);.
A reference is just an alias: int& ri{i};. If you execute, e.g., ri = 24;, then i now has the value 24, not 42. If the reference is const, e.g., const int& cri{i} and you try to change the value through the reference, cri = 24; you will get a compiler error.

r-values and l-values
An l-value is a value with a name associated with it and an r-value is a value without a name associated with it, e.g., a temporary value. If int i{1},j{2}; then i + j has the value 3, but no name, hence it is an r-value. The name comes from the fact l-values typically occur on the left hand side of assignments and r-values occur on the right hand side, int k; k = i + j;

Posted in Uncategorized | 2 Comments

}

Right curly brace. It is the most powerful feature of C++.
I’m only half kidding, like I always am. Even now. Garbage collected languages have a big dumb machine following you around trying to figure out when you are done using the memory you allocated. When you think your code should be executing, the hulking monster you have no control over in your footsteps might decide it is time to take over the CPU.
As Bjarne Stroupstrup said, the best language for garbage collection is a language that doesn’t need it. C++ gives you control over the resources your program uses. I’ve never really understood RAII: Resource Acquisition Is Initialization. To me it is just a way of making C++ classes behave like built in types. Constructors do the dirty work of allocating memory and putting 0’s and 1’s in place, destructors undo that, copy construction duplicates the bits, and copy assignment clears out existing bits before duplicating the right hand side of operator=().
There are two new ways to help deal with C++’s propensity to create temporary objects, aka r-values, but this post is about l-value lifetimes.
An l-value is something that has a name for the memory associated with it. Maybe I should give mad props to left curly brace too. They introduce the scope that objects live in. Write the class and give it a name and the constructor brings it into existence. The right curly brace ensures the destructor gets called on the object.

It is all under your control in C++ if you just spend some time learning how it works.

Posted in Uncategorized | 2 Comments

I keep forgetting…

…that not everyone knows what I know. Knowledge is hard won, but somehow after you know something it seems completely obvious. Even though it is not.

Some guy using my xll library kept pestering me to debug his code. It was clear to me he needed to invest more time learning C++. One dirty secret about open source software is that most people only complain about the problems they encounter. I am fine with that as long as they are pointing out shortcomings in my code. Not so much when they think I’m a free resource to fix their crappy code that has nothing to do with my library.

I’ve been lucky to have a couple of people that actually provided code I could use, but that seems to be as rare as rocking horse shit. I learned a lesson today from the guy that was calling me names for telling him he might not want to use my library.

C++ is hard. Really hard. But some fascinating things are going on in that world now. Some of the smartest people in the world have been making it easier to use. Google Bjarne Stroustrup and just listen to what he has to say. The world needs more people like him.

I’m teaching a course that uses C++ and it is still the case that people have trouble with just compiling and running a program. It is the price you pay for using a language that was designed to make it possible to get the maximum performance out of the latest silicon. It’s bits all the way down.

Some people say C is high level assembly language. They are right. That is what Dennis Ritchie and Brian Kernighan wrote it for. Hardware was changing in bigger directions in their day than now. It was a total PITA to write a routine to, e.g., sort an array of numbers and have to keep rewriting it in the latest set of low level assembly instructions. Say a prayer tonight for the compiler writers who insulate you from having to understand such low level details.

The first stage is turning the characters you type as a C program into assembly code. The compiler that some very clever people wrote for you turns that into an object file that the machine you are working on can use. Don’t confuse that with object oriented programming which came much later. It is just a file of bits that the hardware can ultimately execute.

These object files can be collected into a library and be used by other programs. Static libraries (.lib or .a) copy the bits into the program you write. Shared libraries (.dll or .so) share those bits with all programs executing on your computer.

The second stage is to link together object files (I’m lying a little, there are more stages) and provide a well known entry point to where the operating system can start doing exactly what you tell the computer to do.
To make something happen on your computer you have to execute a file. Sounds harsh, but these are names people called things before the age of political correctness.

The xll library it is a little bit different. Excel is already running. When you open an xll, which is just a dll, it looks for a function called xlAutoOpen and executes that. Set a breakpoint on it and step through all the tricky Excel SDK stuff you don’t have to worry about in order to call C/C++/Fortran.

I’ve updated https://xll8.codplex.com to work with 64-bit Excel if you are following along with the latest. Go to Build/Configuration Manager… and set the platform to x64. Let me know if that is not working for you now, as long as you understand this is not an invitation to debug your code.

Posted in Uncategorized | Leave a comment