Collecting Lists in a Ruby Hash and the ‘<<=’ Operator

The other day, I needed to quickly analyse a data set that came in the form of a large CSV file. I wanted to collect the entries of one particular column, grouped by the key found in another column.

A simplified version of the table could look like this:

Key  Value1  Interesting_Value  Other_Value
foo  17      23.5               X
bar  21      1.75               Q
foo  42      12.6               B
baz  27      17.8               F
bar  49      47.2               K

I was aiming for something like this:

result = {
  foo: [23.5, 12.6],
  bar: [1.75, 47.2],
  baz: [17.8],
}

Iterating over the rows is easy, and getting at the columns is no problem either: the CSV gem is well documented and supports this out of the box.
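
As a minimal sketch of that iteration: the snippet below parses an inline copy of the sample table instead of reading a file, and it assumes a comma as the column separator (the real file may well use something else). The variable name csv_text is made up for this example.

```ruby
require 'csv'

# Inline copy of the sample table, so the sketch runs without a file.
# Assumption: columns are separated by commas.
csv_text = <<~CSV
  Key,Value1,Interesting_Value,Other_Value
  foo,17,23.5,X
  bar,21,1.75,Q
  foo,42,12.6,B
  baz,27,17.8,F
  bar,49,47.2,K
CSV

# With headers: true, the first line is treated as the header row and
# the result is a CSV::Table whose rows can be indexed by column name.
data_table = CSV.parse(csv_text, headers: true)

data_table.each do |row|
  puts "#{row['Key']}: #{row['Interesting_Value']}"
end
```

Note that the values come back as Strings unless a converter is requested.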

A nice way to accumulate data is Enumerable#each_with_object. Since I wanted the result to be grouped by a key value, I’d pass a Hash as the initial argument.

Step 1: each_with_object({})
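
For readers who haven’t used each_with_object before, here is a small sketch with a plain Hash as the accumulator. It only counts how often each key occurs; the keys array is made-up sample data, not the CSV table.

```ruby
keys = %w[foo bar foo baz bar]

# The object passed in ({} here) is handed to the block as the second
# argument on every iteration and is returned at the end.
counts = keys.each_with_object({}) do |key, acc|
  acc[key] = acc.fetch(key, 0) + 1
end

p counts  # => {"foo"=>2, "bar"=>2, "baz"=>1}
```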

However, since I planned to append values under varying keys, the default value needed to be an Array rather than the default nil.

Step 2: each_with_object(Hash.new([]))

This, however, returns the very same Array object every time a key isn’t found, whereas I wanted a new empty Array each time:

Step 3: each_with_object(Hash.new { [] })

This executes the block every time a default value is needed (i.e. the given key isn’t yet in the Hash).
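
The difference between Step 2 and Step 3 can be seen in a short sketch. The names shared, fresh and calls are made up for illustration.

```ruby
# Step 2: a single default object, shared by every missing key.
shared = Hash.new([])
shared[:foo] << 23.5
shared[:bar] << 1.75

p shared[:baz]  # => [23.5, 1.75]  (both appends hit the same Array)
p shared        # => {}            (nothing was ever stored in the Hash)

# Step 3: the block runs on every miss and builds a new Array each time.
calls = 0
fresh = Hash.new { calls += 1; [] }

first  = fresh[:foo]
second = fresh[:foo]

p calls                 # => 2      (one block call per lookup)
p first.equal?(second)  # => false  (two different Array objects)
p fresh.key?(:foo)      # => false  (the block's result is not stored)
```

In both variants the lookup alone never adds a key to the Hash.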

The next step is to append the value found in a row to the (potentially new and empty) Array for the given key.

I thought it would work this way:

data_table.each_with_object( Hash.new { [] }) do |row, acc|
  acc[row['Key']] << row['Interesting_Value'] 
end

But no, the result of this code is an empty Hash! It only works with the <<= operator, as this snippet of a pry session shows:

[2] pry(main)> data_table = CSV.read 'table.csv', headers: true
=> #<CSV::Table mode:col_or_row row_count:6>
[3] pry(main)> data_table.each_with_object( Hash.new { [] }) do |row, acc|
[3] pry(main)*   acc[row['Key']] <<= row['Interesting_Value']
[3] pry(main)* end
=> {"foo"=>["23.5", "12.6"], "bar"=>["1.75", "47.2"], "baz"=>["17.8"]}

It seems to me that the Hash lookup with the default block { [] } returns an Array, and the append operator << does in fact append the passed object to that Array, but the Array itself never ends up as the value for the given Hash key: the block builds it without storing it. In contrast, acc[key] <<= value is shorthand for acc[key] = acc[key] << value, so the result of the append operation is assigned to the key.
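
To make this concrete, the following sketch runs three variants over a few hand-written key/value pairs (the rows array is made-up sample data standing in for the CSV rows). The second variant writes <<= out in long form. The third is a common alternative that was not part of my original session: a default block that stores the new Array itself, so that plain << is enough.

```ruby
rows = [['foo', '23.5'], ['bar', '1.75'], ['foo', '12.6']]

# Plain <<: appends to a throwaway Array, so the Hash stays empty.
lost = rows.each_with_object(Hash.new { [] }) do |(key, value), acc|
  acc[key] << value
end

# <<= in long form: look up (or build) the Array, append, then assign.
kept = rows.each_with_object(Hash.new { [] }) do |(key, value), acc|
  acc[key] = acc[key] << value
end

# Alternative: the default block assigns the new Array to the key.
stored = rows.each_with_object(Hash.new { |hash, key| hash[key] = [] }) do |(key, value), acc|
  acc[key] << value
end

p lost    # => {}
p kept    # => {"foo"=>["23.5", "12.6"], "bar"=>["1.75"]}
p stored  # => {"foo"=>["23.5", "12.6"], "bar"=>["1.75"]}
```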
