Yet Another Coder

An inquiry into CPL values

Christopher Strachey and CPL are sometimes mentioned in discussions of C++ value categories, though usually only briefly and for dubious reasons: C++ value categories are defined in a specific fashion, different from Strachey’s blueprints, and some even say that the analogy is downright confusing. I’ve attempted to get a better understanding of the issue, and this post summarizes my findings.

“The terms “lvalue” and “rvalue” are deep in C++’s genes. They were introduced by Christopher Strachey for CPL [Strachey,196?], the ancestor to BCPL. Dennis Ritchie used “lvalue” to describe C (e.g. see [K&R,1978]), but left out “rvalue”, considering “lvalue” and “not lvalue” sufficient. I did the same for early definitions of C++ (e.g. see [Stroustrup,1986] and [Ellis,1989]). The terms “lvalue” and “rvalue” are “all over” the draft C++ standard. Clearly, this is not terminology to mess with without serious reason and great care.” — B. Stroustrup: “New” Value Terminology

“The terminology of “lvalues” and “rvalues” is confusing because their history is confusing.  (By the way, they’re just pronounced as “L values” and “R values”, although they’re written as single words.)  These concepts originally came from C, and then were elaborated upon by C++.  To save time, I’ll skip over their history, including why they’re called “lvalues” and “rvalues”, and I’ll go directly to how they work in C++98/03.  (Okay, it’s not a big secret: “L” stands for “left” and “R” stands for “right”.  But the concepts have evolved since the names were chosen, and the names aren’t very accurate anymore.  Instead of going through the whole history lesson, you can consider the names to be arbitrary like “up quark” and “down quark”, and you won’t lose anything.)

C++03 3.10/1 says: “Every expression is either an lvalue or an rvalue.”  It’s important to remember that lvalueness versus rvalueness is a property of expressions, not of objects.

Lvalues name objects that persist beyond a single expression.  For example, obj, *ptr, ptr[index], and ++x are all lvalues.

Rvalues are temporaries that evaporate at the end of the full-expression in which they live (“at the semicolon”).  For example, 1729, x + y, std::string("meow"), and x++ are all rvalues.” — S.T. Lavavej, “Rvalue References: C++0x Features in VC10, Part 2”


But why even bother reading about CPL and its concepts?

Well, for at least two reasons:

* first, naturalistic: CPL was never completely finished, but a compiler was eventually implemented for a small subset of it (BCPL) focused on systems programming; after a couple more mutations the language turned into C and then C++. So there is a direct relation between CPL and C++ in the evolutionary sense, and it’s interesting to trace the path of the inherited characteristics and, (maybe not so) surprisingly, discover that many modern C++ features are merely newly “re-discovered” items from Strachey’s ideal-language “wish list” of the ’60s;

* second, pragmatic: Strachey’s concepts of R-values and L-values, in my opinion, are simpler, more fundamental and more universal than the “rvalues” and “lvalues” of C++, and they uncover some important semantics underlying expression evaluation in most programming languages (C++ included). From here on I’ll use the terms R-value and L-value to refer to the concepts introduced in [Strachey, 1967], and the terms rvalue, lvalue, prvalue, glvalue, and xvalue as described in [Stroustrup, 2010].

If you’ve ever struggled to understand the precise semantics of C++ value categories, you’ll be surprised by the simplicity of Strachey’s definitions. His L-values and R-values can be explained with a single example. Imagine you have an array A and you make the most trivial assignment “A[i] = A[i]”: the left-hand side of the assignment is literally the same as the right-hand side, nevertheless “A[i]” means two different things in these two occurrences. “A[i]” on the left stands for the “place” to store the value, and “A[i]” on the right stands for the “value” itself. To formalize that distinction, Strachey said that on the left-hand side of the assignment we evaluate the L-value of “A[i]”, and on the right-hand side we evaluate the R-value of “A[i]”. The L-value of an expression can be thought of as an address or, more universally, as a “location” of the value in the “idealized storage” of the computer, and the R-value is the value itself, stored at that location.
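
To make the distinction concrete in C++ terms, here is a minimal sketch (the names A, i, value and location are mine, chosen for illustration); taking the address of A[i] is essentially asking for its L-value explicitly:

#include <cstdio>

int main() {
    int A[4] = {10, 20, 30, 40};
    int i = 2;

    A[i] = A[i];            // left: L-value (a location); right: R-value (the stored value)

    int  value    = A[i];   // R-mode evaluation: copy the stored value out
    int* location = &A[i];  // the "location" made explicit as an address
    *location = 99;         // store a new R-value into that L-value

    std::printf("%d %d\n", value, A[i]);  // prints "30 99": the copy is unaffected
    return 0;
}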

This simple duality uncovers several fundamental design decisions in the language at hand:

1. The language has mutable values — in a pure functional language we wouldn’t need two kinds of values, as there is no “mutating” assignment, and there would really be no concept of a “location” of a value, nor any way to “put” a given value into a particular “location”.

2. We choose not to explicitly differentiate the evaluation of L-values and R-values in the syntax of the assignment (we write “A[i]” on both sides and have a mere convention about which side stands for the destination place and which side stands for the value). Strachey hints that it “just so happened” in his contemporary languages and seems to work for practical purposes, but in the more formal parts of his notes he switches to another notation, where the L or R evaluation mode is explicit.

3. The names of variables in our languages (and indexed array elements) refer both to the value and to the “location” of the value in the machine’s “store”, i.e. they can be evaluated both in L-mode and in R-mode.

But the “value assignment” operator is not the only place where the distinction between L-values and R-values matters:

* we may want to assign an L-value (i.e. “bind” another name to the “location” of an existing name, evaluating the L-value of the expression on the “right-hand side” of the “assignment”) — Strachey introduces a special assignment syntax for this in CPL (because value assignment evaluates the R-value of the right-hand side by definition);

* for function arguments and for context variables captured in lambdas, we want to specify whether we take them by R-value or by L-value — for the first option Strachey uses the term “pass by value”; for the second, “pass by reference”.

We can find Strachey’s L-values and R-values in C++ too, conceptually, but the lvalues and rvalues defined in the Standard have different semantics, and this is a source of confusion and misunderstanding. I think it is safe to say that C++ value terminology is substantially different from what Strachey proposed in his lectures.

In C++, non-const references sort of behave as Strachey’s references: e.g. they bind to parameter L-values in function calls. But there are const references too, which can bind to rvalues. However, semantically, even when you pass by const reference in C++ you are still binding to an L-value: we use const references to avoid copying the value; conceptually we pass its location instead, with a promise not to change the value stored there. Temporary objects, which belong to the category of rvalues, arguably live somewhere and have a location, though at some point in C++ history the language designers decided that you shouldn’t be allowed to bind a mutable reference to that location, because it is error-prone in most normal use cases (by the way, you can still see this legacy behavior supported in MSVC even in 2015 as a “non-standard extension”, which every sane developer turns into a warning by compiling at warning level 4).

The irony is that for move semantics in C++11 to work you need to obtain a mutable L-value of a temporary object: exactly the value the language designers used to hide from developers, lest they accidentally change a temporary and make a “logical mistake”. This mutable L-value of the temporary you can now get with what is called an rvalue reference. Note that an rvalue reference can also bind to some generalized lvalues (called xvalues). Note that you can still bind to the const L-value of an rvalue using a const lvalue reference. Note also that a named rvalue reference is itself an lvalue (and has an L-value). Pfffff. Go figure!
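
These binding rules are easier to see side by side. Below is a minimal C++11 sketch (the names make, c, r and s are mine):

#include <string>
#include <utility>

std::string make() { return "meow"; }  // returns a temporary: an rvalue

int main() {
    // std::string& m = make();        // error: a mutable lvalue reference cannot bind to a temporary
    const std::string& c = make();     // OK: a const lvalue reference binds to the rvalue's const L-value
    std::string&& r = make();          // OK: an rvalue reference yields a mutable L-value of the temporary
    r += "!";                          // ...through which the temporary can be changed
    std::string s = std::move(r);      // std::move produces an xvalue; the move constructor binds to it
    // note: r is a named rvalue reference, so the expression "r" is itself an lvalue
    return 0;
}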

“We didn’t find any real constraints on the naming to guide us, so we picked ‘x’ for the center, the unknown, the strange, the xpert only, or even x-rated.” — B. Stroustrup: “New” Value Terminology


To conclude, let’s look at several code examples from Strachey’s notes illustrating operations on L-values and R-values envisioned for CPL, and compare them to the C++11 syntax achieving the same effect.

Introducing a new name, with a new L-value, and initializing it with a given R-value.

let x = 3
auto x = 3;

Mutating assignment, putting a given R-value into the given L-value.

x := A[i]
x = A[i];

L-value assignment, “binding” a new name to an existing L-value.

let z ≃ A[i]
auto& z = A[i];

Capturing an R-value of a free variable (capturing “by value”).

let a = 3
let f[x] = x + a
auto a = 3;
auto f = [=](int x){ return x + a; };

Capturing an L-value of a free variable (capturing “by reference”).

let a = 3
let f[x] ≡ x + a
auto a = 3;
auto f = [&](int x){ return x + a; };
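
For completeness, here is the C++11 side of all five examples gathered into one runnable translation unit (a sketch; the surrounding main, array and printout are mine):

#include <iostream>

int main() {
    int A[] = {1, 2, 3};
    int i = 1;

    auto x = 3;                               // let x = 3
    x = A[i];                                 // x := A[i]      (x now holds a copy of the R-value, 2)
    auto& z = A[i];                           // let z ≃ A[i]   (z names A[i]'s location)
    z = 5;                                    // A[i] is now 5, mutated through z; x is still 2

    auto a = 3;
    auto f = [=](int v){ return v + a; };     // let f[x] = x + a  (a captured by R-value)
    auto g = [&](int v){ return v + a; };     // let f[x] ≡ x + a  (a captured by L-value)
    a = 10;

    std::cout << x << ' ' << A[i] << ' '
              << f(1) << ' ' << g(1) << '\n'; // prints "2 5 4 11": f kept a copy of a, g sees its location
    return 0;
}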

References

[Strachey et al., 1963] D.W. Barron, J.N. Buxton, D.F. Hartley, E. Nixon, and C. Strachey, “The main features of CPL”, The Computer Journal 6(2):134–143 (1963). http://www.math.bas.bg/~bantchev/place/cpl/features.pdf

[Strachey, 1967] C. Strachey, “Fundamental Concepts in Programming Languages” (1967). http://www.itu.dk/courses/BPRD/E2009/fundamental-1967.pdf

[Lavavej, 2009] S.T. Lavavej, “Rvalue References: C++0x Features in VC10, Part 2” (2009). http://blogs.msdn.com/b/vcblog/archive/2009/02/03/rvalue-references-c-0x-features-in-vc10-part-2.aspx

[Miller, 2010] W.M. Miller, “A Taxonomy of Expression Value Categories”, N3055 (2010). http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2010/n3055.pdf

[Stroustrup, 2010] B. Stroustrup, “‘New’ Value Terminology” (2010). http://www.stroustrup.com/terminology.pdf

What is premature optimization

Several generations of software folklore capture the negative attitude towards premature optimization. But what is “premature” optimization exactly? I share some thoughts on the subject today.

“Programmers waste enormous amounts of time thinking about, or worrying about, the speed of noncritical parts of their programs, and these attempts at efficiency actually have a strong negative impact when debugging and maintenance are considered. We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil. Yet we should not pass up our opportunities in that critical 3%.” — Donald Knuth

“More computing sins are committed in the name of efficiency (without necessarily achieving it) than for any other single reason — including blind stupidity.” — W.A. Wulf

“The First Rule of Program Optimization: Don’t do it. The Second Rule of Program Optimization (for experts only!): Don’t do it yet.” — Michael A. Jackson

I think one missing piece in all the premature optimization talk is that it really isn’t about code optimization per se, but about software design strategy in general. Someone claiming that code optimization is “premature” is supposedly able to show that another design strategy is “mature” in contrast. I like to think about design as a search problem: you are at some point A in a multi-dimensional design space, and you need to get to some area B, determined by the new user stories you must support and the acceptable ranges of the myriad characteristics your solution has to satisfy. Your design documents describe the path you’re taking through that multi-dimensional space to get from A to B (or at least to some point C that is closer to B than A was). Changes you commit to the repository are the actual moves in the design space, and the amount of time you spend on each change is the actual “cost” of that move.

Now the tricky part is that the design space is not static: every move you make (and every bond you break) can potentially change the cost of any other move in the space. The extent of this potential change is determined by the nature of the move and by the “interconnectedness” characteristics of your current design, such as coupling between modules and cohesion within them. Naturally, every move also takes you to a different point in the space, so the distances from your position to other points in the space change. To put it simply, developing software is like trying to solve a billion-sided Rubik’s cube where the stickers change their colors after each rotation… But, I digress.

What, then, would be the difference between a “premature” optimization and a non-premature one? A premature optimization is an optimization that doesn’t bring you closer to your desired area B. It is either a move in the direction opposite from your real goal, or a move that changes the costs of other moves unfavorably, so that the distance to the real goal increases. Two examples illustrate these cases:

1. Ms Piggy decides to optimize database access performance: she noticed that some extra queries are being sent by the intermediate abstraction layer, so she decides to work around this layer and send the database queries directly from several places in the code. She spends two weeks finishing this work, and her tests show that the scenario is now faster. She doesn’t realize that the performance of this scenario is of low importance to users, because it runs nightly as part of an automatically scheduled DB maintenance job; in fact, users are more concerned about the frequent connection failures. A few weeks later, someone has to undo Ms Piggy’s change in order to update the intermediate layer API to improve logging and connection reliability diagnostics. Ms Piggy’s optimization was “premature” because it was going in the direction opposite from the overall design goal and had to be undone.

2. Kermit decides to implement his module “with performance in mind”, so he avoids calling into existing methods from other modules, implementing his own, “more efficient” versions of those methods instead. Most of the code in the module would not have caused performance issues even if it had used the existing methods. However, for the next few years it increases the cost of all changes touching this module. Designers, unaware of the module’s hand-rolled implementation, underestimate the costs of design changes. Coders, unaware of it, introduce subtle and costly defects. Slowly the module is updated to call into shared libraries, and the amount of duplicate logic decreases. Kermit’s optimization increased the costs of the changes that followed it without justification, thus the optimization was “premature”.

“All of this reinforces the key rule that first you need to make your program clear, well factored, and nicely modular. Only when you’ve done that should you optimize.” — Martin Fowler, “Yet Another Optimization Article”

“…the strategy is definitely: first make it work, then make it right, and, finally, make it fast.” — Stephen C. Johnson and Brian W. Kernighan, http://c2.com/cgi/wiki?MakeItWorkMakeItRightMakeItFast

Really, the “no premature optimization” rule is merely a heuristic for searching for the optimal path in the design space: “do not use a greedy algorithm to optimize design”. Often, to get to the desired state you have to make multiple moves of a “refactoring” nature, which do not “improve” any observable program behavior at all; all they do is decrease the costs of the further moves towards the goal.

Finally, among the many reasons why people make wrong design moves, miscommunication or insufficient communication, I think, is a major one. This situation is related to the “inner-outer world” dilemma, and premature optimization is one example of the tree falling in the forest when no one is around to hear it. To avoid wrong moves, designers and coders need a clear shared vision of customers’ needs.

“Users are more interested in tangible program characteristics than they are in code quality. Sometimes users are interested in raw performance, but only when it affects their work. Users tend to be more interested in program throughput than raw performance. Delivering software on time, providing a clean user interface, and avoiding downtime are often more significant.” — Steve McConnell, Code Complete, 2nd ed.

Changing the outer world

Coders’ work is all about the inner world: ideas, concepts, logic, abstractions. Mindfulness. It’s difficult not to be sucked into “the Zone” after you’ve learned programming: it feels magical how a snippet of text typed into a machine can cause it to do something cool. The inner world is neat and tidy, and you’ve practically built it from nothing, and you know how it works, and you can change it at the speed of thought. That’s why coders don’t like slow build systems, slow unit tests or slow third-party libraries — because they stand in the way of building and running The World!

“Some people look at the world through rose-colored glasses. Programmers like you and me tend to look at the world through code-colored glasses. We assume that the better we make the code, the more our clients and customers will like our software.” — Steve McConnell, Code Complete, 2nd ed.

Sometimes coders get so over-obsessed with the inner world that they forget the outer world is still here. There is a time and place for unconstrained art and pure abstract research, but I’m talking about us mortals, working for a company that exists for a purpose and makes a product to sell. Making a product that someone wants to pay their money for isn’t an easy task, and it involves a lot of effort to understand, influence and change the outer world. And here’s the main “problem” with the outer world — it is very slow compared to the inner one. It has inertia. And it has other things to pay attention to.

So instead of learning the mechanisms of changing the outer world, coders start to ignore it. It’s too slow to work with. It “lags” too far behind the “cutting edge”. It’s not under control. That’s when coders lose touch. They start working on something that has no real value: refactoring without purpose, optimizing unused parts of the program, discussing the benefits of placing a curly bracket on a separate line. It’s like that tree that falls in the forest when no one is around to hear it.

Don’t lose touch. Think about the lifetime of your code. How is it supposed to unfold? How will your customers learn about the new feature? Who is going to demo it? Do they have all the documentation they need? Is there a clear shared understanding of the new feature, or are there several different views?

More can be done while the feature is taking shape than after it has gone public. Fewer people have to be involved, and it’s easier to reach a consensus and coordinate your work. Others will help you, but make sure the forces are applied in the same direction. As the author and designer, you have a great deal of influence on how the feature is documented, presented, explained, and demoed. Do what’s in your power to help get the right message out and change the outer world. Next time you’re cutting down a tree, make sure there’s enough media coverage for the event and that everyone on your team agrees on which tree is supposed to fall.