Gather ’round, kids, for tales of yore. It used to be possible to write software that simply did its job, and that was it. There was no need to keep changing it.
It was a simpler, but still complicated, time back in the day when Dennis, Brian, Rob and less-sung heroes were ardently trying to convince AT&T management that Unix could be a streamlined version of Multics. They got shot down (“Are you guys nuts? That was a huge failure.”) and had to cook up a scheme their management could swallow.
I wonder if they were influenced by SGML and Charles Goldfarb from IBM. Their pitch was to write a system that marked up documents with semantic content to make them amenable to computer processing. They used nroff as cover to write Unix.
One of their most brilliant accomplishments was the notion of a hierarchical filesystem where files are just streams of characters, and the hardware of the day was fast enough to make this practical. Some people back then didn’t trust this abstraction with their data. How do you know where your data is if you don’t spin up a disk and seek to the sector you wrote it to?
How often do standard Unix commands need to be rewritten? They are simple, orthogonal programs acting on a stream of bytes.
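Here is a concrete example of a simple program acting on a stream of bytes: a C++ sketch of a line counter in the spirit of `wc -l`. The name `count_lines` is hypothetical. The function sees only a byte stream and never files, disks, or sectors.

```cpp
#include <cassert>
#include <cstddef>
#include <iostream>
#include <iterator>
#include <sstream>

// Count newline characters, the way wc -l does. The function knows nothing
// about files, disks, or sectors: it only reads a stream of bytes.
inline std::size_t count_lines(std::istream& in)
{
    std::size_t n = 0;
    for (std::istreambuf_iterator<char> it(in), end; it != end; ++it) {
        if (*it == '\n') {
            ++n;
        }
    }
    return n;
}

// As a standalone filter, main would be one line:
//   int main() { std::cout << count_lines(std::cin) << '\n'; }
```

Wrapped in that one-line `main`, it composes with other filters through pipes without knowing what sits on either side.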
Eliding a lot of history so I can get to my point, and with mad props to Linus for his brilliant work: git was something he wrote for himself to handle the influx of Linux kernel patches. He didn’t want to pay money for proprietary software that did a better job.
People not as smart as Linus started to put, as they say, porcelain around his plumbing infrastructure.
And there seems to be no end in sight. Tonight I had to deal with the latest “improvements” on GitHub. Hence this post instead of working on my open source software: I have to spend time figuring out their latest tweaks before I can do that.
“The new native. Extend your GitHub workflow beyond your browser with GitHub Desktop, completely redesigned with Electron. Get a unified cross-platform experience that’s completely open source and ready to customize.”
No, thanks. Electron? What happened to Atom or even VS Code?
Could you kids please get on my lawn. I’ll even buy you a lemonade. Just stop pretending change is better.
Some years ago I attempted to market a product that did Monte Carlo simulation in Excel. Recently, one of my best customers from that time asked me to resurrect an old version. Give tukhi.com/tukhi_free0_0_0_0.zip a spin if you are curious.
Getting ready for teaching this fall, I noticed the library no longer builds cleanly for 64-bit targets with the latest version of Visual Studio 2017:
Severity: Warning
Code: C26451
Description: Arithmetic overflow: Using operator '-' on a 4 byte value and then casting the result to a 8 byte value. Cast the value to the wider type before calling operator '-' to avoid overflow (io.2).
Project: xll12
File: c:\users\kalx\source\repos\xll12\xll\args.h
Line: 192
Suppression State: Active
I was baffled. Every integer is an INT32, but the compiler was still complaining. The offending line was
args[ARG::ArgumentHelp + i - 1] = argumentHelp;
When I changed this to…
auto n = ARG::ArgumentHelp + i - 1; args[n] = argumentHelp;
…the compiler warning disappeared.
I auto know better, but I have no idea why.
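A possible explanation follows, offered as a guess and not a diagnosis. On a 64-bit build the subscript of `args` is converted to an 8-byte `size_t`. C26451 fires when 4-byte arithmetic is widened within the same expression. With `auto n`, `n` is deduced as a 32-bit `int`, so the subtraction no longer feeds the widening directly and the analyzer stays quiet. The overflow risk is unchanged. The message itself suggests widening first. The sketch below makes three assumptions: `args` is a `std::vector`, `ARG` is a hypothetical enumeration with a made-up value, and `help_index` and `set_argument_help` are hypothetical helper names. None of these are the real xll12 definitions.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for the library's ARG enumeration; the value is made up.
enum ARG : int { ArgumentHelp = 10 };

// Widen before doing arithmetic. Because the first operand is already a
// std::size_t, the usual arithmetic conversions carry the whole expression in
// 8 bytes on x64, so no 4-byte result is widened afterward (what C26451 flags).
// Assumes ArgumentHelp + i >= 1; otherwise the unsigned result wraps around.
inline std::size_t help_index(int i)
{
    return static_cast<std::size_t>(ARG::ArgumentHelp) + i - 1;
}

// Hypothetical helper mirroring the offending line in args.h.
inline void set_argument_help(std::vector<std::string>& args, int i,
                              const std::string& argumentHelp)
{
    args[help_index(i)] = argumentHelp;
}
```

Widening one operand is enough, because the rest of the sum is promoted along with it.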
Next step: put some tooling around TypeScript to automate everything.
Graph Query Language.
It’s a thing now. Leave it to Facebook to come up with a completely misleading name.
It is true a tree is a graph. Querying is only one of the things it helps you with. It is not a language.
It is a specification that has been around since 2012 and is a huge improvement over REST. It lets you not only query, but also gives users doing CRUD on legacy systems a simple, efficient view of their data.
If it has a dirty secret, it is that programmers have to figure out how to turn the specification into something that does more than a single HTTP request/response. The challenge programmers now face is how to marshal multiple resources from one call into a response that conforms to the query.
This makes it easy for clients. When they ask for something, they get back a response in the same format as they ask for. It allows them to concentrate on the model. The view/presenter/controller becomes trivial for the people actually using the data.
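The toy C++ resolver below illustrates what "a response in the same format as the request" means. It is not GraphQL and uses no GraphQL library. The `Node` and `Query` types and the `resolve` function are hypothetical, and the "resources" are an in-memory tree. An unknown field is silently skipped, where a real server would report an error. The code walks the query rather than the data, so the result mirrors the request even when the fields come from different resources.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Toy stand-in for backend data: a named node holding either a leaf value
// or child nodes. (Hypothetical; not any GraphQL library's type.)
struct Node {
    std::string name;
    std::string value;          // leaf payload, used when children is empty
    std::vector<Node> children;
};

// Toy query: a field name plus the sub-fields the client wants back.
// An empty selection list means "return this field as it is".
struct Query {
    std::string field;
    std::vector<Query> selections;
};

inline const Node* find_child(const Node& n, const std::string& name)
{
    for (const auto& c : n.children) {
        if (c.name == name) return &c;
    }
    return nullptr;
}

// Walk the query, not the data, so the result has the shape of the request.
// Unknown fields are skipped here; a real server would report an error.
inline Node resolve(const Node& source, const Query& query)
{
    Node out{source.name, source.value, {}};
    for (const auto& sel : query.selections) {
        if (const Node* c = find_child(source, sel.field)) {
            out.children.push_back(sel.selections.empty() ? *c : resolve(*c, sel));
        }
    }
    return out;
}
```

Asking for `user { name }` and `orders` returns only those fields, in that nesting. This is the property that makes the client side trivial.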
Back to the topic. What is a spreadsheet? I have been reading “Elements of Programming” by Alexander Stepanov and was surprised to learn that the latest fashion in the computer science world seems to be something I learned as a grad student in the second year of my math MA at the University of Hawaii. In the course Logic and Set Theory I learned that a mathematical concept is defined by the rules it satisfies. (C++ Concepts [Lite] is a watered-down version of this.) A vector space is any set with an addition that makes it a commutative group and a scalar multiplication satisfying the distributive and associative laws, with 1v = v. Today’s subway reading was http://gauss.cs.ucsb.edu/~aydin/GraphBLAS_API_C.pdf. At this point anyone without a PhD in math might want to stop reading, but I’ll jot down a few thoughts. They can be made mathematically rigorous using the language of Cartesian Closed Categories.
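The idea that a concept is "defined by the rules it satisfies" shows why C++ concepts are watered down. A compiler can check that `+` and scalar `*` exist, but not that they obey the laws. The sketch below spot-checks the laws at run time on sample values for ℝ², represented as `std::array<double, 2>`. That representation is an arbitrary choice, and `vector_space_laws_hold` is a hypothetical name. Passing for some samples is evidence, not proof. The samples are exact binary fractions so that rounding cannot break the equalities.

```cpp
#include <array>
#include <cassert>

// R^2 as std::array<double, 2>: an arbitrary concrete example.
using Vec = std::array<double, 2>;

inline Vec operator+(const Vec& u, const Vec& v) { return {u[0] + v[0], u[1] + v[1]}; }
inline Vec operator*(double a, const Vec& v) { return {a * v[0], a * v[1]}; }

// A C++ concept could require only that these expressions compile. The laws
// themselves can only be spot-checked, as here, on sample values.
template <class V, class K>
bool vector_space_laws_hold(const V& u, const V& v, const V& w, K a, K b)
{
    return u + v == v + u                      // commutative
        && (u + v) + w == u + (v + w)          // associative
        && u + V{} == u                        // zero
        && u + K(-1) * u == V{}                // negatives
        && a * (u + v) == a * u + a * v        // distributive over vectors
        && (a + b) * u == a * u + b * u        // distributive over scalars
        && (a * b) * u == a * (b * u)          // compatible with scalar product
        && K(1) * u == u;                      // unit
}
```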
A spreadsheet is a function I → C from a set of indices I to cells C. Let S denote the set of all spreadsheets.
The indices are just a set, and a cell has a value and perhaps a formula.
A value can be a number, a string, a Boolean, a reference to one or more cells, or an error. (Or a couple of other things if you’ve been following along with how I stay faithful to the original Microsoft C API.) Write V for the set of values.
A formula is a function from zero or more cells to a value. Write F for the set of formulas.
That only defines the type of the spreadsheet. We also have to define what functions can be applied to spreadsheet types.
Since you know how Excel works, these will seem obvious to you.
Enter: S × I × (V + F) → S lets you select a cell in your spreadsheet and enter either a value or a formula.
Delete: S × I → S removes the value and formula at the corresponding index.
By now you are getting the hang of things. Move, Copy, Precedents, etc. are just functions.
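Below is a minimal C++ sketch of these types, with Enter and Delete written as pure functions. It rests on loudly stated assumptions. Indices are (row, column) pairs. Values are only numbers, strings, or Booleans, with references and errors left out. A formula is modeled as a `std::function` that reads the whole sheet rather than a list of cells. Delete is spelled `erase` because `delete` is a C++ keyword. None of this is the xll library's design. Each operation takes the sheet by value and returns a new one, matching the S × … → S signatures.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <optional>
#include <string>
#include <utility>
#include <variant>

// Hypothetical concrete choices: indices are (row, column) pairs, and values
// are a number, a string, or a Boolean (references and errors omitted).
using Index = std::pair<int, int>;
using Value = std::variant<double, std::string, bool>;

struct Sheet;
// A formula computes a value from the cells of a sheet.
using Formula = std::function<Value(const Sheet&)>;

struct Cell {
    Value value;
    std::optional<Formula> formula;
};

// S: a finite map from indices to cells.
struct Sheet {
    std::map<Index, Cell> cells;
};

// Enter: S × I × (V + F) → S. The input sheet is left untouched.
inline Sheet enter(Sheet s, Index i, std::variant<Value, Formula> x)
{
    if (std::holds_alternative<Value>(x)) {
        s.cells[i] = Cell{std::get<Value>(x), std::nullopt};
    } else {
        // The value is a placeholder until the sheet is evaluated.
        s.cells[i] = Cell{Value{}, std::get<Formula>(x)};
    }
    return s;
}

// Delete: S × I → S removes the value and formula at index i.
inline Sheet erase(Sheet s, Index i)
{
    s.cells.erase(i);
    return s;
}
```

Move and Copy would follow the same pattern, taking a sheet and some indices and returning a new sheet.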
The tricky part is evaluating a spreadsheet, because there are different ways of doing it. The first step is defining a clean versus a dirty cell. I think you can do that with the simple notation above, but I might be wrong.
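Here is one possible definition of dirty, sketched in C++. It assumes the sheet keeps, for each cell, the list of cells whose formulas refer to it, which is the inverse of Precedents. The `Dependents` type and `dirty_after_enter` function are hypothetical. After a value or formula is entered into a cell, that cell and everything that transitively depends on it is dirty. Every other cell is clean. The sketch says nothing about the order in which to recompute dirty cells, which is one place evaluation strategies can differ.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <utility>
#include <vector>

using Index = std::pair<int, int>;
// dependents[i] lists the cells whose formulas refer to cell i.
using Dependents = std::map<Index, std::vector<Index>>;

// Cells made dirty by entering into cell i: i plus everything reachable from
// it through dependents. All other cells stay clean.
inline std::set<Index> dirty_after_enter(const Dependents& deps, Index i)
{
    std::set<Index> dirty;
    std::vector<Index> stack{i};
    while (!stack.empty()) {
        Index j = stack.back();
        stack.pop_back();
        if (!dirty.insert(j).second) continue;  // already marked; also stops on cycles
        auto it = deps.find(j);
        if (it != deps.end()) {
            for (const Index& k : it->second) stack.push_back(k);
        }
    }
    return dirty;
}
```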
Apologies to Raymond Chen: https://blogs.msdn.microsoft.com/oldnewthing/
Against my better judgement I rewrote the xll add-in library. Again.
Still learning all the new C++11/14/17 good bits. And trying to avoid the crazy things that seem to be slipping into the latest standards.
No disrespect to women, but I feel like I’m giving birth every time I try this. The ancient C Excel SDK has its, um, personality. I continue to be floored by how expressive modern C++ has become.
The AddIn class now uses lambda expressions to Register and Unregister on xlAutoOpen and xlAutoClose.
You could probably save a few characters if you wrote this in python/ruby/perl/language du jour, except you can’t write this in those languages. It definitely takes more brain cells to come up with code this tight, but that is where the fun is.
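The following is a hypothetical sketch of the lambda-registration idea, not the xll12 `AddIn` class. Each `AddIn` object queues an open lambda and a close lambda. Static `auto_open` and `auto_close` functions run them; they are stand-ins for the `xlAutoOpen` and `xlAutoClose` entry points Excel calls. The sketch makes no Excel SDK calls.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical sketch, not the xll12 AddIn class.
class AddIn {
public:
    using Action = std::function<bool()>;

    AddIn(Action onOpen, Action onClose)
    {
        opens().push_back(std::move(onOpen));
        closes().push_back(std::move(onClose));
    }

    // Stand-ins for xlAutoOpen/xlAutoClose: run every queued action and
    // report whether all of them succeeded.
    static bool auto_open() { return run(opens()); }
    static bool auto_close() { return run(closes()); }

private:
    // Function-local statics sidestep the static initialization order problem,
    // since AddIn objects are typically defined at namespace scope.
    static std::vector<Action>& opens() { static std::vector<Action> v; return v; }
    static std::vector<Action>& closes() { static std::vector<Action> v; return v; }

    static bool run(const std::vector<Action>& actions)
    {
        bool ok = true;
        for (const auto& a : actions) ok = a() && ok;  // run all, even after a failure
        return ok;
    }
};
```

In a real add-in, the lambdas would register and unregister the function with Excel. That is why the queues have to exist before any namespace-scope `AddIn` object is constructed.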
Hitler uses git: https://www.youtube.com/watch?v=CDeG4S-mJts
Why is git called git? Just ask Linus. https://www.youtube.com/watch?v=4XpnKHJAok8. It’s funny how people tell you things if you just listen.
Andrew Tridgell reverse engineered BitKeeper, and the company that had invested capital in the product decided not to provide a free version to people like Linus anymore.
Linus was unhappy about that, so he wrote the replacement he needed. Software is a funny, fungible thing. It is not like normal business things. Some people can be 10x more productive, not just 10% more, maybe even more than that, but it is difficult to measure.
Andrew Tridgell also reverse engineered Microsoft’s SMB protocol to create Samba and was the co-inventor of rsync. The only thing I asked for when my team took over the Excel add-ins at Banc of America Securities was a Samba server for the files. The Microsoft file system at that time made it impossible to delete files that were open.
I use rsync every day, and twice on Sundays. It is one of those brilliant things that were just done right. No need to keep changing it. Confession time: I use vi. I would not wish that on my worst enemy, but Bill Joy wrote the most efficient editor for turning keystrokes into programs.
Linus wannabes have been spending a lot of effort to make git usable for mere mortals. GitHub is in the news lately. My experience has been annoyance at the sloppy kids hopped up on Red Bull fiddling with one tiny aspect they finally grokked and foisting that on users who just want a clear picture of how to use their product. It sounds like the adults have finally shown up.
There is actually a workflow that can be taught to people who are not as smart as Linus. It is on the GitHub website.
It is the complete opposite of how Linus envisioned it would be used. But he built it for himself.
Against my better judgement, I’ve been rewriting the Excel add-in library. After a long day dealing with C++03 limitations at my current gig, I have to confess I am really enjoying applying the latest C++14 goodness to my old code.
Don’t try this at home, kids: https://github.com/keithalewis/xll12
You will need Visual Studio 2015 and a willingness to work on your modern C++ chops. Let me know what problems you run into and I’ll make my best effort to get you going. Don’t underestimate your ability to help improve this code: clone it and send me a pull request.
That reminds me. Time for a git rant.
Working on huge data at Bloomberg these days. After a long day of trying to figure out how they do that, I like to think about math.
My attempt at working up to “A monad is just a monoid in the category of endofunctors.” It gets messy when using `SelectMany` in LINQ. Google Bart De Smet. Or maybe you can now just ask Cortana. Technology moves fast these days.
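`SelectMany` is LINQ's bind for sequences. Below is a rough C++ analogue for `std::vector`; the names `unit` and `select_many` are hypothetical. Bind applies a function that returns a list to each element and concatenates the results. Flattening a list of lists is the monad's join, the "multiplication" in "monoid in the category of endofunctors." The identity laws can be spot-checked on sample values.

```cpp
#include <cassert>
#include <vector>

// unit (LINQ has no single name for it): wrap one value in the list monad.
template <class T>
std::vector<T> unit(T x) { return {x}; }

// bind, which LINQ calls SelectMany: apply f to every element and
// concatenate the resulting lists in order.
template <class T, class F>
auto select_many(const std::vector<T>& xs, F f) -> decltype(f(xs.front()))
{
    decltype(f(xs.front())) out;
    for (const auto& x : xs) {
        auto ys = f(x);
        out.insert(out.end(), ys.begin(), ys.end());
    }
    return out;
}
```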
There is no Royal Road. It is still fun to use the grey matter sitting on top of a primate brain in the meantime.