Let’s iterate together
Implementing Python’s zip in C++
Disclaimer
To keep things neat and clean, I will be omitting the std
namespace qualifier everywhere except where I can’t.
The Iterator Pattern
What a nifty (not so) little feature! In a nutshell, iterators enable us to hop between elements in a container, thus, separating code dealing with what to do to those elements without giving much of a thought on how to traverse it.
It was an ecstatic moment when C++11 introduced range-based for-loop. If a container has an iterator interface, i.e. support for begin
, end
, etc., you could iterate over its elements using this new kind of for-loop. The syntax looks so expressive in its intent.
vector v{1, 2, 3, 4, 5};
for (auto &e : v)
cout << e << endl;
FYI, the above example uses auto
and CTAD as well. Interestingly, this is exactly the way we write for-loops in Python.
my_list = [1, 2, 3, 4, 5]
for e in my_list:
print(e)
Here’s to zip
It’s not uncommon to see code in which two or more sequences (or Iterables in Python lingo) are iterated over together in a loop. One of the ways this can be done is by using a built-in zip
function.
a = [1, 2, 3, 4]
b = 'hello'
for x, y in zip(a, b):
print(x, y)# prints
# 1 h
# 2 e
# 3 l
# 4 l
This is so expressive, compact, and we can do that with any number of iterables. Could we do something similar in C++? Is there anything in C++ corresponding to x, y in zip(a, b)
syntax in Python? Structured binding (C++17) comes to mind. We would like to write something like this:
for (auto [x, y, z]: zip(a, b, c)) { /* ... */ }
This code roughly gives us an idea of the requirements that zip needs to fulfill.
- The call to
zip
must return something which has an iterator interface. That is, it must havebegin
andend
methods defined. - The iterator obtained from the return object must have
operator*()
(de-reference),operator++()
(pre-increment) andoperator!=()
(inequality) operator methods defined. These would be consumed by the (client’s) for-loop internally. zip
should support any number of arguments of any type (container/container-like type of course).
Implementation
Taking into account of the requirement mentioned previously, we can write a skeleton code for zip as below:
template <typename ...T>
struct zip
{
struct iterator { /* ... */ }; zip(const T &...args)
: data(forward_as_tuple(forward<T>(args)...)) {} auto begin() { return iterator{ /* ... */ }; }
auto end() { return iterator{ /* ... */ }; }private:
tuple<T &...> data;};
To keep it simple, I’m using a tuple of references because I don’t want any ownership of its arguments. Now, we need to figure out the commented parts.
In the previous for-loop, the structured binding will work if zip::iterator
‘s de-reference operator returns a tuple. This tuple needs to contain reference of elements of each container in zip::data
. To mediate between these two tuples, we need another tuple containing iterators of each container. We can keep this tuple inside zip::iterator
. In a sense, this struct acts as a proxy for the tuple of iterators.
But how do we find the iterator type of each container? Recall that when we call begin
on a container we get its iterator. Using decltype
and declval
, we can get its type, viz. decltype(begin(declval<T&>()))
. In C++20, we can write ranges::iterator_t<T>
instead.
Besides that, it’s good practice to have the usual nested types the standard library iterators have. This allows for better interoperability with the algorithms from the standard library. Using iterator traits, we can get the required types from an iterator type, viz. iterator_traits<ranges::iterator_t<T>>::value_type
, iterator_traits<ranges::iterator_t<T>>::reference
etc. With C++20, we can write the corresponding types as iter_value_t<ranges::iterator_t<T>
, iter_reference_t<ranges::iterator_t<T>
etc. respectively.
Since we have a tuple of T
s, the above nested types will be written as a tuple of these types in zip::iterator
as shown below:
struct iterator
{
using iterator_category = input_iterator_tag;
using value_type = tuple<iter_value_t<ranges::iterator_t<T>>...>;
using reference = tuple<iter_reference_t<ranges::iterator_t<T>>...>;
using difference_type = tuple<iter_difference_t<ranges::iterator_t<T>>...>;
using pointer = std::tuple<typename iterator_traits<ranges::iterator_t<T>>::pointer...>; tuple<ranges::iterator_t<T>...> data_;
};
In addition to that, this struct also needs to define the previously mentioned operator methods which we will get to shortly.
The begin
and end
methods of zip
return an iterator. In begin
, we need to pass a tuple of iterators to the zip::iterator
. How do we get iterators of each container in zip::data
? We call begin
on each container. Duh!
Luckily, we don’t have to reinvent the wheel. We already have with us the apply
function from C++17. It takes a callable and a tuple as arguments.
template <class F, class Tuple>
constexpr decltype(auto) apply(F&& f, Tuple&& t);
What this function essentially does is, it calls the callable f
with the tuple elements of t
as arguments. Inside the callable, we can call std::begin
on each argument. Voila! We have transformed a tuple of containers to a tuple of iterators. Our begin
method would look like this:
auto begin()
{
return iterator(apply([]<typename... _Tp>(_Tp && ...e) {
return make_tuple(std::begin(forward<_Tp>(e))...);
}, data));
}
This code uses explicit template parameter lists introduced in C++20. Alternatively, one can use the generic lambda form if C++20 is not available; in which case, replace _Tp && …e
with auto && …e
and forward<_Tp>
with forward<decltype(e)>
. We can write the end
method similarly. Just replace the inner std::begin
call with an std::end
call.
Okay, now we are left with defining the operator methods of zip::iterator
. Since, this struct is a proxy for underlying iterators we have kept inside a tuple zip::iterator::data_
, we want to simply call the corresponding operators of those iterators. We can use the previous technique using apply
function again:
struct iterator
{
/* nested type declarations */ reference operator*() {
return apply([]<typename... _Tp>(_Tp && ...e) {
return forward_as_tuple(*forward<_Tp>(e)...);
}, data_);
}
iterator &operator++() {
return apply([]<typename... _Tp>(_Tp && ...e) {
return make_tuple(++forward<_Tp>(e)...);
}, data_);
}
bool operator!=(const iterator &iter) const { /* ... */ }tuple<ranges::iterator_t<T>...> data_;
};
The inequality operator is a little bit tricky though. The for-loop (of client code which uses zip
) should end the iteration when any one of the underlying iterators become end-iterators. So, this method should compare its data_
with iter::data_
comparing element by element and returning false
when any of the corresponding elements are equal.
Since this comparison involves two tuples, we cannot use apply
directly. We need to write some helper functions. Also, we need a mechanism to unpack the tuples’ elements. We can use index_sequence
for this mechanism.
namespace detail
{
template <typename... _Tp>
bool variadic_or(_Tp &&...args) { return (... || args); } template <typename Tuple, std::size_t... I>
bool any_equals(Tuple &&t1, Tuple &&t2, index_sequence<I...>)
{
return variadic_or(get<I>(forward<Tuple>(t1)) == get<I>(forward<Tuple>(t2))...);
}
}
Now we can write the inequality operator method as follows:
bool operator!=(const iterator &iter)
{
return !detail::any_equals(data_, iter.data_, index_sequence_for<T...>{});
}
Our implementation is now complete. Using zip
we can now write in C++, the code below:
vector v{1, 2, 3, 4};
string s = "hello";
double d[] = {1.2, 2.3, 3.4};
for (auto [x, y, z] : zip(v, s, d))
cout << x << ' ' << y << ' ' << z << endl;/* prints
* 1 h 1.2
* 2 e 2.3
* 3 l 3.4
*/
Isn’t that neat! Here’s a link with the implementation: https://godbolt.org/z/neaTjY3P9.
Thanks for reading
Let me know what you think about this in the comments section. Suggestions are always welcome. Happy coding :)