Tag: CSV

Collecting Lists In a Ruby Hash and the ‘<<=' operator

The other day, I needed to quickly analyse a data set that came in form of a large CSV file. I wanted to collect a particular column of that table and collect all entries categorised by a key in another column.

A simplified version of the table could look like this:

Key Value1 Interesting_Value Other_Value
foo1723.5X
bar211.75Q
foo4212.6B
baz2717.8F
bar4947.2K

I strived for something like this:

result = { 
  foo: [23.5, 12.6],
  bar: [1.75, 47.2],
  baz: [17.8],
 }

Iterating over the rows is easy, and getting to the columns is no problem either: The CSV gem is well documented and supports this easily.

A nice way to accumulate data is Enumerable#each_with_object. Since I wanted the result to be grouped by a key value, I’d pass a Hash as the initial argument.

Step 1: each_with_object({})

However, since I’ve planned to append values for changing keys, the default value needed to be an Array, not the default of nil.

Step 2: each_with_object(Hash.new([])

This, however, returns the same empty Array, when a key isn’t found, but I wanted a new empty Array:

Step 3: each_with_object(Hash.new { [] })

This executs the block every time a default values is needed (i.e. the given key isn’t yet in the Hash).

The next step is to append the value found in a row to the (potentially new and empty) Array for the given key.

I thought it would work this way:

data_table.each_with_object( Hash.new { [] }) do |row, acc|
  acc[row['Key']] << row['Interesting_Value'] 
end

But, no, the result of this code is an empty Hash! It needs to be the <<= operator to work, as shown in the snippet of a pry session:

[2] pry(main)> data_table = CSV.read 'table.csv', headers: true
=> #<CSV::Table mode:col_or_row row_count:6>
[3] pry(main)> data_table.each_with_object( Hash.new { [] }) do |row, acc|
[3] pry(main)*   acc[row['Key']] <<= row['Interesting_Value']
[3] pry(main)* end
=> {"foo"=>["23.5", "12.6"], "bar"=>["1.75", "47.2"], "baz"=>["17.8"]}

It seems to me, that the Hash lookup with the given default value [] returns an Array, and the append operator << does in fact append the passed object to that Array, but then the result of that does not end up as a (new) value fo the given Hash key. In contrast, the <<= operator does assign the result of the append operation.