One-Shot Learning of C++ r-value, &&, and Move



C++ is hard, and the newer versions are even harder. This article deals with some of the hard parts of C++: r-values, r-value references (&&), and move semantics. I am going to reverse engineer (not a metaphor) these complex and interrelated topics, so you can understand them completely in one shot.

Firstly, let’s examine

What is an r-value?

Roughly speaking, an r-value is a value that can only appear on the right side of an equals sign: it cannot be assigned to.

Example:

int var; // too much JavaScript recently:)
var = 8; // OK! l-value (yes, there is such a thing as an l-value) on the left

8 = var; // ERROR! r-value on the left
(var + 1) = 8; // ERROR! r-value on the left

Simple enough. Then let’s look at some more subtle r-values, ones that are returned by functions:

#include <string>
#include <stdio.h>

int g_var = 8;
int& returnALvalue() {
   return g_var; // here we return an l-value
}

int returnARvalue() {
   return g_var; // here we return an r-value
}

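int main() {
   printf("%d\n", returnALvalue()++); // prints 8, then g_var becomes 9
   printf("%d\n", returnARvalue());   // prints 9
   // returnARvalue()++;              // would not compile: cannot increment an r-value
}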

Result:

8
9

It is worth noting that returning an l-value reference to a global variable, as in this example, is considered bad practice, so do not do that in real-world programming.

Beyond the theoretical level

Whether an expression is an r-value made a difference in real programs even before && was introduced in C++11.

For example, this line

const std::string& name = "rvalue";

can be compiled fine while this:

std::string& name = "rvalue"; // use a left reference for a rvalue

generates following error:

error: non-const lvalue reference to type 'std::string' (aka 'basic_string<char, char_traits<char>, allocator<char> >') cannot bind to a value of unrelated type 'const char [7]'

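The error message means that a non-const (l-value) reference cannot bind to an r-value, here the temporary std::string that would have to be created from the literal; only a const reference can.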

A more interesting example:

#include <stdio.h>
#include <string>

void print(const std::string& name) {
    printf("rvalue detected:%s\n", name.c_str());
}
void print(std::string& name) {
    printf("lvalue detected:%s\n", name.c_str());
}
int main() {
    std::string name = "lvalue";
    print(name); //compiler can detect the right function for lvalue

    print("rvalue"); // likewise for rvalue
}

Result:

lvalue detected:lvalue
rvalue detected:rvalue

The difference is significant enough that the compiler can use it to choose between overloaded functions.

So is an r-value a constant value?

Not exactly. This is where && (the r-value reference) comes in.

Example:

#include <stdio.h>
#include <string>
void print(const std::string& name) {
  printf("const value detected:%s\n", name.c_str());
}
void print(std::string& name) {
  printf("lvalue detected:%s\n", name.c_str());
}
void print(std::string&& name) {
  printf("rvalue detected:%s\n", name.c_str());
}
int main() {
  std::string name = "lvalue";
  const std::string cname = "cvalue";
  print(name);     // non-const l-value -> std::string&
  print(cname);    // const l-value     -> const std::string&
  print("rvalue"); // temporary r-value -> std::string&&
}

Result:

lvalue detected:lvalue
const value detected:cvalue
rvalue detected:rvalue

When a function is overloaded for r-values, an r-value argument chooses the more specific std::string&& version over the const std::string& version, which accepts both. Thus && lets us distinguish r-values from const values.

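Below is a summary of which arguments each overload accepts when it is the only overload available. You can verify it by commenting out different combinations of the print overloads in the example above:

std::string& accepts: non-const l-values only
const std::string& accepts: non-const l-values, const l-values, and r-values
std::string&& accepts: r-values only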

It sounds cool to distinguish r-values from constant values, which are not exactly the same thing. But what is the practical value?

What problem does && solve exactly?

The problem is the unnecessary deep copy when the argument is an r-value.

To be more specific: the && notation lets us identify an r-value, which can be used to avoid a deep copy when the r-value 1) is passed as an argument to a constructor or an assignment operator, and 2) is an object of a class that contains a pointer (or pointers) to a dynamically allocated resource (memory).

An example makes this more concrete:

#include <stdio.h>
#include <string>
#include <algorithm>

using namespace std;

class ResourceOwner {
public:
  ResourceOwner(const char res[]) {
    theResource = new string(res);
  }
  ResourceOwner(const ResourceOwner& other) {
    printf("copy %s\n", other.theResource->c_str());
    theResource = new string(other.theResource->c_str());
  }
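  ResourceOwner& operator=(const ResourceOwner& other) {
    ResourceOwner tmp(other);            // copy-and-swap: deep copy into tmp
    swap(theResource, tmp.theResource);  // tmp now holds our old resource
    printf("assign %s\n", other.theResource->c_str());
    return *this;                        // tmp's destructor frees the old resource
  }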
  ~ResourceOwner() {
    if (theResource) {
      printf("destructor %s\n", theResource->c_str());
      delete theResource;
    }
  }

private:
  string* theResource;
};
void testCopy() { // case 1
  printf("=====start testCopy()=====\n");
  ResourceOwner res1("res1");
  ResourceOwner res2 = res1;  //copy res1
  printf("=====destructors for stack vars, ignore=====\n");
}
void testAssign() { // case 2
  printf("=====start testAssign()=====\n");
  ResourceOwner res1("res1");
  ResourceOwner res2("res2");
  res2 = res1; //copy res1, assign res1, destructor res2
  printf("=====destructors for stack vars, ignore=====\n");
}
void testRValue() { // case 3
  printf("=====start testRValue()=====\n");
  
  ResourceOwner res2("res2");
  res2 = ResourceOwner("res1"); //copy res1, assign res1, destructor res2, destructor res1
  printf("=====destructors for stack vars, ignore=====\n");
}
int main() {
  testCopy();
  testAssign();
  testRValue();
}

Result:

=====start testCopy()=====
copy res1
=====destructors for stack vars, ignore=====
destructor res1
destructor res1
=====start testAssign()=====
copy res1
assign res1
destructor res2
=====destructors for stack vars, ignore=====
destructor res1
destructor res1
=====start testRValue()=====
copy res1
assign res1
destructor res2
destructor res1
=====destructors for stack vars, ignore=====
destructor res1

The results are fine for the first two test cases, testCopy() and testAssign(), in which the resource in res1 is copied for res2. Copying the resource is reasonable here because they are two entities that each need their own, unshared resource (a string).

However, in the third case, the (deep) copy of the resource in res1 is superfluous: the anonymous r-value created by ResourceOwner("res1") is destroyed right after the assignment, so it does not need the resource anymore:

res2 = ResourceOwner("res1"); // Note that "destructor res1" is printed right after this line, before the stack variables are destroyed.

This is a good moment to restate the problem:

The && notation lets us identify an r-value, which can be used to avoid a deep copy when the r-value 1) is passed as an argument to a constructor or an assignment operator, and 2) is an object of a class that contains a pointer (or pointers) to a dynamically allocated resource (memory).

If copying a resource that is about to disappear is wasteful, what is the right operation? The answer is

Move

The idea is straightforward: if the argument is an r-value, we do not need to copy. Instead, we can simply "move" the resource (that is, the memory the r-value points to) by taking over its pointer. Now let's overload the assignment operator using this technique:

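ResourceOwner& operator=(ResourceOwner&& other) {
  if (this != &other) {
    delete theResource;              // release the resource we currently own
    theResource = other.theResource; // take over other's resource
    other.theResource = NULL;        // leave other empty so its destructor frees nothing
  }
  return *this;
}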

This new assignment operator is called a move assignment operator. And a move constructor can be programmed in a similar way.

A good way to understand this: when you sell your old house and move to a new one, you do not throw away all your furniture and buy identical copies, as we effectively did in case 3, right? Instead, you simply move the furniture to the new home.

All good.

What is std::move?

Besides the move assignment operator and move constructor discussed above, there is one last missing piece in this puzzle, std::move.

Again, we look at the problem first:

When 1) we know that a variable is in fact an r-value, but 2) the compiler does not, the right version of an overloaded function cannot be called.

A common case is adding another layer on top of the resource owner, ResourceHolder. The relation of the three entities is shown below:

holder
 |
 |----->owner
         |
         |----->resource

(N.b., in the following example, I complete the implementation of ResourceOwner’s move constructor as well)

Example:

#include <stdio.h>
#include <string>
#include <algorithm>
using namespace std;
class ResourceOwner {
public:
  ResourceOwner(const char res[]) {
    theResource = new string(res);
  }
  ResourceOwner(const ResourceOwner& other) {
    printf("copy %s\n", other.theResource->c_str());
    theResource = new string(other.theResource->c_str());
  }
  // new: move constructor
  ResourceOwner(ResourceOwner&& other) {
    printf("move cons %s\n", other.theResource->c_str());
    theResource = other.theResource;  // take over the resource
    other.theResource = NULL;         // leave the source empty
  }
  ResourceOwner& operator=(const ResourceOwner& other) {
    ResourceOwner tmp(other);
    swap(theResource, tmp.theResource);
    printf("assign %s\n", other.theResource->c_str());
    return *this;
  }
  // new: move assignment operator
  ResourceOwner& operator=(ResourceOwner&& other) {
    printf("move assign %s\n", other.theResource->c_str());
    if (this != &other) {
      delete theResource;               // release what we currently own
      theResource = other.theResource;  // take over the resource
      other.theResource = NULL;         // leave the source empty
    }
    return *this;
  }
  ~ResourceOwner() {
    if (theResource) {
      printf("destructor %s\n", theResource->c_str());
      delete theResource;
    }
  }
private:
  string* theResource;
};
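class ResourceHolder {
  // ...
  ResourceHolder& operator=(ResourceHolder&& other) {
    printf("move assign holder\n");  // ResourceHolder has no theResource of its own
    resOwner = other.resOwner;       // calls ResourceOwner's *copy* assignment operator!
    return *this;
  }
  // ...
private:
  ResourceOwner resOwner;
};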

In ResourceHolder's move assignment operator, we want to call ResourceOwner's move assignment operator, since "a non-pointer member of an r-value should be an r-value too". However, when we simply write resOwner = other.resOwner, what actually gets invoked is ResourceOwner's normal (copy) assignment operator, which, again, incurs the extra copy. The reason is that other has a name: even though its declared type is ResourceHolder&&, the expression other, and therefore other.resOwner, is an l-value.

It is a good moment to restate the problem once more:

When 1) we know that a variable is in fact an r-value, but 2) the compiler does not, the right version of an overloaded function cannot be called.

As a solution, we use std::move to cast the variable to an r-value, so that the right version of ResourceOwner's assignment operator is called.

ResourceHolder& operator=(ResourceHolder&& other) {
  printf("move assign holder\n");
  resOwner = std::move(other.resOwner);  // now selects ResourceOwner's move assignment
  return *this;
}

What is std::move exactly?

We know that a type cast is not always just a placebo telling the compiler "I know what I am doing". A numeric cast, for example, can generate real instructions, such as moving a value into a bigger or smaller register (e.g., %eax -> %cl), to carry out the conversion.

So what exactly does std::move do behind the scenes? I did not know myself when I was writing this paragraph, so let's find out together.

First, we modify main a bit (I tried to keep the style consistent):

Example:

int main() {
  ResourceOwner res("res1");
  asm("nop"); // remember me
  ResourceOwner && rvalue = std::move(res);
  asm("nop"); // remember me
}

Compile it and disassemble the object file using:

clang++ -g -c -std=c++11 -stdlib=libc++ -Weverything move.cc
gobjdump -d -D move.o

Result:

0000000000000000 <_main>:
 0: 55 push %rbp
 1: 48 89 e5 mov %rsp,%rbp
 4: 48 83 ec 20 sub $0x20,%rsp
 8: 48 8d 7d f0 lea -0x10(%rbp),%rdi
 c: 48 8d 35 41 03 00 00 lea 0x341(%rip),%rsi # 354 <GCC_except_table5+0x18>
 13: e8 00 00 00 00 callq 18 <_main+0x18>
 18: 90 nop // remember me
 19: 48 8d 75 f0 lea -0x10(%rbp),%rsi
 1d: 48 89 75 f8 mov %rsi,-0x8(%rbp)
 21: 48 8b 75 f8 mov -0x8(%rbp),%rsi
 25: 48 89 75 e8 mov %rsi,-0x18(%rbp)
 29: 90 nop // remember me
 2a: 48 8d 7d f0 lea -0x10(%rbp),%rdi
 2e: e8 00 00 00 00 callq 33 <_main+0x33>
 33: 31 c0 xor %eax,%eax
 35: 48 83 c4 20 add $0x20,%rsp
 39: 5d pop %rbp
 3a: c3 retq
 3b: 0f 1f 44 00 00 nopl 0x0(%rax,%rax,1)

Between the two nops, we can see a few instructions generated for the move. Looking closely, they do basically nothing: they take the address of res (lea -0x10(%rbp),%rsi) and merely shuffle it between stack slots. If we turn on optimization (-O1), all of these instructions are gone:

clang++ -g -c -O1 -std=c++11 -stdlib=libc++ -Weverything move.cc
gobjdump -d -D move.o

Moreover, if we change the critical line to:

ResourceOwner & rvalue = res;

The assembly generated is identical.

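That means std::move itself is pure syntactic sugar: it only tells the compiler to treat res as an r-value (and therefore to pick the move overloads), and the machine does not care at all.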

To conclude,