C++Primer 3.3: Vector

cppPrimer cpp

创建时间:2019-11-25 16:00

字数:3k 阅读:

Library vector Type

Library `vector` Type

a vector is a collection of objects, all of which have the same type.

在C++中，一个vector就是一系列相同类型的objects。通常，vector也被称为容器(container)，因为它就像是一个容器来盛放其它objects。

vector是一个类模板（class template），因此我们在使用的时候必须要提供额外的信息来创建这个类。模板（templates）不是方法（function）或者类（class），而可以看作是指导compiler如何产生类或函数的指令（instructions），或者说规则。而compiler从模板中产生类或者方法的过程叫作实例化（instantiation）。当我们要使用一个目标来实例化类的时候，需要指明（即提供额外信息）我们想要实例化哪一种类或方法。模板是一个比较复杂的东西，我们会在之后详细说到它。

Tips:

vector是一个template，不是一个type。但是从vector模板中生成的包含element type的如vector<int>则是type。

因为reference不是object，所以vector并不能用来盛放reference。

`vector` Summary

vector作为一种可变长度的STL容器，其底层数据结构为数组。因此，vector也具有O(1)时间的快速访问效率；同时，因为其是顺序存储（连续的内存空间），所以插入(insert)和删除(erase)效率都较低（O(N)）。

相比于传统的数组(array)，vector的优势在于可以动态调整大小，其扩容规则一般以2倍扩大的方式增长。例如：

    vector<int> v;
    cout<<"initial capacity = "<< v.capacity()<<endl; 
    cout<<"initial size = " << v.size()<<endl;
    cout<<"=========="<<endl;

    for(int i=0; i<5; i++)
    {
        v.push_back(i);
        cout<<"capacity = " << v.capacity()<<endl;
        cout<<"size = "<<v.size()<<endl;
        cout<<"---------"<<endl;
    }

/*  results  */

initial capacity = 0
initial size = 0
==========
capacity = 1
size = 1
---------
capacity = 2
size = 2
---------
capacity = 4  //  2 * 2 = 4
size = 3
---------
capacity = 4
size = 4
---------
capacity = 8  //  4 * 2 = 8
size = 5
---------

注意上述size()指的是vector中elements的数量，capacity则指的是vector的容量。在新建一个vector的时候，其初始大小capacity = 0，随着第一个元素0被push进去，其大小扩充为capacity = 1；随着第二个元素被push进去，其大小扩充为capacity = 2，随着第三个元素被push进去，其大小扩充为capacity = 2*2 = 4… 另外，需要注意的是其具体的扩充方式为：

首先重新申请一个2倍大小的连续内存空间 ；
然后将原内存空间的内容copy过来；
最后将原空间内容进行释放，将内存空间交还给OS。

这也是为什么如果vector发生了扩容，那么所有的迭代器的都会失效的原因。

另外，对于一般的vector初始化:

vector<int> v;
cout<<sizeof(v)<<endl;  // 24

/*  data member of vector  */
iterator start;  // points to the start of the used memory
iterator finish;  //  points to the end of the used memory
iterator end_of_storage;  // points to the end of the whole allocated continuous memory

其sizeof得到的结果是其自身的大小。因为sizeof得到的是object本身的大小，而vector<int>本身是vector这个class template的实例化，所以sizeof(v)得到的是这个类vector<int>创建的对象v本身的大小，其与compiler有关。可以看到vector一般有3个成员变量，每个大小为8，所以在我这icc下sizeof为24。

另外，vector中的elements会被存放到堆（heap）上，也就是说不管其中有多少个elements，sizeof的结果不会改变，具体here。即使element在栈（stack）上被创建，在v.push_back()的时候会调用object的copy constructor将element拷贝到堆上进行存储，并且vector内部通过指针来访问这个object的地址，具体here。

也就是说，vector中其实保存的并不是object本身，而是在push_back()时通过object自身的copy constructor将这个object拷贝到堆上（即std::move）finish迭代器（此时可以理解为指针）所指向的内存（也即是vector的capacity大小的连续内存中的一块）。这个过程涉及到了vector的底层实现的源码，在此不展开，有兴趣可以查看here。

Conclusions:

从宏观上来说，vector内部维护了三个iterator（此时可看作指针），来控制堆上的一块连续内存。elements被存储在堆上的这块连续内存上，vector本身则存储在栈上（默认情况下）。说objecs被存放在vector中只是一种从作用和效果上考虑的说法，实际上vector中并没有存放objects。

所以，vector可以看作是通过几个迭代器（封装好就是vector）来操纵一个堆上的数组，只不过通常这几个迭代器在栈上，这个数组在堆上。

`vector` Definition and Initializition

与其他class type一样，vector有很多种初始化方法，同时C++11还提供了list initializer：

vector<int> iv;  // definition, iv is an empty vector
vector<int> iv2(iv);  // copy elements of iv into iv2
vector<int> iv2 = iv;  // equivalent to the above
vector<string> sv(iv);  // error: sv holds strings not int;

/*  parentheses initializer (count, value)  */
vector<int> iv3(10, -1);  // ten int elements, each initialized to -1
vector<int> iv4(10);  // ten elements, each initialized to 0
vector<string> sv2(10, "hi!"); // ten string elements, each initialized to "hi!"
vector<string> sv3(10); // ten string elements, each initialized to empty

/*  list initializer  */
vector<string> sv4{"a", "an", "the"};
// three elements, the first holds "a", the second holds "an", the third holds "the". Must use curly braces.
vector<string> sv4("a", "an", "the");  // error: must use {}

/*  mixed curly braces initializer {} and parentheses initializer ()  */
vector<int> iv5(10);  // ten elements with value 0
vector<int> iv6{10};  // one element with value 10
vector<int> iv7(10, 1);  // ten elements with value 1
vector<int> iv8{10, 1};  // two elements with value 10 and 1
vector<string> sv5{"hi"};  // list initialization, one element
vector<string> sv5("hi");  // error: can't construct a vector from a string literal
vector<string> sv6{10};  // ten elements with empty string
vector<string> sv7{10, "hi"}  // ten elements with value "hi"

/*  use array as initializer  */
int a[] = {0,1,2,3,4,5};
vector<int> vi(begin(a), end(a));  // vi has six elements, each is a copy of the corresponding element in array a
vector<int> subVi(a+1, a+4);  // subVi has three elements, from a[1] to a[3]

首先，如果使用一个vector来初始化另一个vector，不管是copy initialization还是direct initialization，其含义都是拷贝initializer中所有的elements到新创建的这个vector中。注意新vector中的每一个元素都是initializer中对应的每一个元素的copy，而不是reference。

因为C++11中新增了list initializer，即使用curly braces{...}来作为initializer，所以要区分其与direct initializer（即使用parenthesis()）的区别。尤其是当initializer和vector的类型不一致的时候。例如，sv5，sv6和sv7都使用了{}，但是只有sv5是list initialization。因为并不能使用10作为initializer value来初始化sv6，所以此时compiler会尝试使用parenthesis initializer的方法来初始话这个vector。

Conclusions:

list initializer表示的是使用{}中的每一个value来进行初始化，{}中有几个值则vector中有几个elements，每个element的值是{}中对应的value。即list initializer一次可以提供多个初始值，{}中每个element都是一个初始值，但是parenthesis只能提供一个初始值。

parenthesis表示的是使用in-class initializer即()作为initializer。()中有两个参数：(count, value)。

当使用list initializer来初始化，但是{}中的值并不能初始化vector时，即vector<type>中的type与{}中的值的type不一致时（例如sv6），compiler则会考虑使用其他初始化方法。这种情况只针对list initializer，对于parenthesis则不会，类型不对时则会直接报错。

实际上，尽管我们可以在vector创建时通过初始化直接给vector添加元素并赋值，但是vector更多地适用于数组大小未知的情况。因此，大部分情况下我们都会定义一个空的vector然后通过push_back()来给这个vector添加元素。事实上，因为C++标准对于vector的implementation有效率要求，所以在创建vector时如果指定了vector的大小反而有可能对效率造成影响（除非所有的elements都是一样的值）。在C中，一般对于不知道数组大小的情况，我们通常通过单链表来解决；在C++中则可以很方便地使用vector。

`vector` Common Operations

Functions	Explanation
`v`.empty()	Returns `true`if `v` is empty; otherwise returns false.
`v`.size()	Returns the `number of elements` of `v`.
`v`.capacity()	Returns the current maximum `available memory space` for elements `(expressed in terms of elements)`.
`v`.push_back(`t`)	Adds an element with value `t` to end of `v`.
`v.`pop_back()	Removes the last element in the vector, effectively `reducing the container size by one`.
`v`[n]	Returns a reference to the element at position `n` in `v`.
`v1` = `v2`	Replaces the elements in `v1` with a copy of the elements in `v2`.
`v` = {a, b, c…}	Replaces the elements in `v` with a copy of the elements in the comma-separated list.
`v1` == `v2`	`v1` and `v2` are equal if they have same `size()` and each element in `v1` is equal to the corresponding element in `v2`.

在此需要对v.push_back(t)和v.pop_back()函数进行一些说明：

    vector<int> v;
    int i = 42;
    v.push_back(i);
    v[0] = 1;
    cout<<v[0]<<endl;  // 1
    cout<<i<<endl;  // 42
    v.pop_back();
    cout<<v.size()<<endl;  // 0

如之前所说，vector中的elements实际上是存放在堆上的一块连续内存，通过vector中的3个迭代器来指向该段内存来实现增删等操作。简单来说，v.push_back(t)会经历以下几个步骤（不考虑空间不足时的扩容操作）：

给vector添加一个element，因为此时内存足够，所以在堆上的那块连续内存中不需要执行什么操作。
调用object t的copy constructor将t拷贝到vector新添加的element所指向的地址。注意此时是深拷贝，即添加到element所指向地址的objec并不是原来的t。

这也是为什么push_bacn(t)的作用是Adds an element with value t to end of v.因为添加进堆上的object并不是t，只是t的拷贝，拥有t的值而已。这也是为什么代码示例中i输出还是42的原因。

同样，pop_back()并不需要传入参数，因为其作用是移除v中的最后一个元素（即--finish迭代器减少1），并且调用该元素中的object的析构函数进行析构。但是如果该objec没有合适的析构函数，例如基本类型int，bool等，则不会进行其他操作。**但是即使调用了析构函数也只是清除了这段内存上的内容（地址仍保留），但是并没有用deallocate回收这段内存，因此仍然可以强制通过subscript来拿到这段内存，here， here。

`vector size_type`

类似string，vector也有自己的size_type。也就是说，v.size()返回的结果是vector<int>::size_type类型（可以理解为unsigned）。但是需要注意的是，如果想要explicitly使用 vector<int>::size_type，那么需要指明这个vector中的element的类型：

vector<int>::size_type size; // ok
vector::size_type size;  // error

`Range for`

vector<int> v{1,2,3,4};
for(auto &i : v)  // using reference to modify the elements
  i *= i;
for(auto i : v)  // just printf the value of each element
  cout<<i<<endl;

`subscrit` operations

和string一样，也可以通过下标操作来获取vector中相应位置的element。不过需要注意的是在C中一般对于数组会利用for循环直接给数组赋值，但是在vector中不能这样，而是需要通过push_back()来添加element，在copy值到这个element上。例如：

int a[10];
for(int i=0; i<10; i++)  // ok, assign value to each element in an array
  a[i] = i;

vector<int> v;
for(int i=0; i!=10; i++)  //  error, becasue v is an empty vector
  v[i] = i;
// v.push_back(i);  //  need to use push_back(i) to add element;

另外，vector的下标操作也需要注意index必须是一个可以取到的值，不然就是buffer overflow问题，这也是leetcode中最常见的崩溃错误。

转载请注明来源，欢迎对文章中的引用来源进行考证，欢迎指出任何有错误或不够清晰的表达。可以在下面评论区评论，也可以邮件至 gzrjzcx@qq.com

C++Primer 3.3: Vector

Library vector Type

vector Summary

vector Definition and Initializition

vector Common Operations

vector size_type

Range for

subscrit operations