Category: Learning

Contributing to Open Source – Getting Started

A sign in LegoLand saying "Adventure Land"

Recently, I started to contribute more to open source projects I use and like. When I started to contribute at all, I did so, with the most obvious thing for me: Bug reports. 🙂 This can be particularly useful, since the bug reports that I write for my projects, stay behind closed doors. – Like it or not, companies producing commercial software very often don’t fancy public bug reports. That’s a long story for another time…

Here are some tips:

  • Start small. Probably smaller than that.
    You can start on documentation. ➙ It can be unclear or contain typos.
    My smallest contribution so far, was fixing a typo in the Cucumber documentation.
    The steps to deliver it were large, compared to the fix itself:
    • Fork a repository
    • Create a branch on the fork
    • Fix the typo
    • Create a pull request
    • Get it accepted
      This seems like a lot, just to fix a typo.
      However, I think the learning experience was worth it.
  • Learn about the two most used models used in open source projects, the “Fork and Pull Model” and the “Shared Repository Model”.
    Both are explained in the GitHub docs about “collaborative development models“.
  • Check out Marko Denic on Twitter. He wrote “Make Your First Open Source Contribution” which explains to to get started on GitHub.
    You can contribute to his site https://tech-blogs.dev by adding you own blog, if you have one. The project is hosted on GitHub.
  • Start in a friendly & helpful community
    I find this really important: When you have a safe place to ask all the question you may have, and can expect a friendly answer, that will help actually asking those questions. It certainly helped me, asking about which development model is used, how big or small a pull request should be, etc.
  • Create your own project and make it open source
    Again: You can start really small. My smallest GitHub repository is probably https://github.com/s2k/seasidetestings-iterm-colours. It’s just only file that can be used as a colour preset in iTerm2 and some documentation how to load the file in iTerm.
  • If you find that something can be improved or isn’t working as expected, provide a bug report. These are important contributions to any software project: If the people writing the software don’t know that something’s wrong with it, they can’t fix it.
    For example I wrote https://github.com/fakefs/fakefs/issues/224, when I noticed that this Rubygem didn’t worked the way I expected.
    Since I knew how to write the RSpec to check this, I did – and it probably helped to get it fixed.

If you’re looking for much more information about how contributing to open source this book may be for you: “Forge Your Future with Open Source” by VM (Vicky) Brasseur.

Why Good Error Messages Can Save Time and Effort

TL;DR

Make error messages precise enough to help users, so that they can resolve the problem, or at least enable them to provide meaningful input when talking to support.

The Context

For a project, I was working on an application that stored it’s content as text files. Git was used as a storage backend, so the application could track who made what changes when. – So far so good. Testing this application was done on a specific test environment somewhere on the net.

While testing was possible, it was inconvenient to have to also log in the the logging server so track what exactly was happening, and updating the application itself was not as easy either. Therefore, one day I started to set up the whole thing on my local machine. The setup and configuration was surprisingly easy. We had good documentation for where to put which files. This was the easy part. I could start the application & tell it Git repository contained the application content (those text files mentioned above).

The Problem

The problem occurred, when I tried to actually get the application data:

XYZ cannot access the repository!”

The application’s error message as displayed to the user

That’s not particularly helpful. On order to support failure analysis, an error message shown to a user should help identifying the problem and solving it, or to have a meaningful exchange with support.

I checked the log files and quickly found some related information:


<timestamp> INFO : <class_info_and_linenumber> - SSH folder: <absolute_path_to_ssh_folder>
<timestamp> DEBUG: <class_info_and_linenumber> - SSH identity file: <absolute_path_to_ssh_key_file>
<timestamp> ERROR: <class_info_and_linenumber> - Exception <library_name>Exception: Invalid private key: <key_identifier> requesting Git repo
<full_class_name>.<ExceptionType>: Cannot list remotes for git url <ssh_url_to_git_repository>

I found this surprising, since I was using the exact same identity files to access the very same Git repository from my IDE & other Git clients on my machine. Had something corrupted my key? If so, what was it?

I started by increasing the logging of the SSH library I’m using, to make sure I was using the exact same key pair in all situations. A good while later, after googling for the error message, talking to developers, and reading some documentation, I found out this: The library being used wasn’t coping well with the encryption algorithm I used when generating my public & private key pair.

While I used ‘ed25519‘, a (relatively) recent algorithm (at the time of writing this), the library (in the version that was used in the application) expected the key to be generated using ‘RSA‘. Some details are also in the post ‘“Invalid privatekey” when using JSch‘ on stackoverflow. The problem is the misleading error message: The key wasn’t invalid at all, since I could access the repository using it with other programs that (apparently) used other libraries. The key type was unknown to the library. That’s similar, but different.

A Better Error Message

Had the library error message been something like Key <key_identifier> not recognised; see documentation for supported key types, identifying and solving the problem would have been much faster and less frustrating. Words matter in error messages, too.

Error Messages Should Help You Resolve the Error

I prefer error messages to be detailed enough to help me identify the real cause of the issue and ideally give a hint at what I can do.

Take widespread error messages for pass word fields for example. They may read like this:

Your password must have at least
12 characters overall
1 uppercase & 1 lowercase character
1 number
1 of these characters: # $ % & – ? = / . , ;

A contrived error message, that’s entirely likely

Yes, there’s a lot to be said about the value and limitations of requirements like these, but at least the message helps a user to set a password that complies with the mentioned rules.

A good error message can help users to identify a problem and probably resolve it as well. They don’t have to contact your support team. Your support team doesn’t have to go though the process of identifying the same problem(s) over and over. This saves time (and probably money) on both sides, yours and the user’s.

The Different Ways of Pry and Irb

I ran into an interesting little problem when doing sone Ruby (and Rails) work today: I wanted to try something in pry and, inside that Rails project I issued the following command:

$ pry -r sqlite3
…/gems/pry-0.14.1/lib/pry/pry_class.rb:103:in `require': cannot load such file -- sqlite3 (LoadError)
  from …/gems/pry-0.14.1/lib/pry/pry_class.rb:103:in `block in load_requires'
  from …/gems/pry-0.14.1/lib/pry/pry_class.rb:102:in `each'
  from …/gems/pry-0.14.1/lib/pry/pry_class.rb:102:in `load_requires'
  from …/gems/pry-0.14.1/lib/pry/pry_class.rb:143:in `final_session_setup'
  from …/gems/pry-0.14.1/lib/pry/cli.rb:82:in `parse_options'
  from …/gems/pry-0.14.1/bin/pry:12:in `<top (required)>'
  from …/bin/pry:23:in `load'
  from …/bin/pry:23:in `<main>'
  from …/bin/ruby_executable_hooks:22:in `eval'
  from …/bin/ruby_executable_hooks:22:in `<main>'

Interesting! I have both gems, pry and sqlite3 installed. A similar attempt using irb worked fine:

$ irb -r sqlite3
3.0.1 :001 > SQLite3::VERSION
 => "1.4.2"
3.0.1 :002 >

Maybe I confused the environment by setting some environment variable or other? With a newly opened command shell I got this:

$ pry -r sqlite3
[1] pry(main)> SQLite3::VERSION
=> "1.4.2"
[2] pry(main)>

Remarkable. In that new shell the irb command behaved the same as before.

The only difference I could find was that the original call to pry happened in a Rails project – which is using Bundler to organise the Rubygems used in the project and resolve dependencies. As a small experiment I set up a minimal project that uses bundler and a rather short Gemfile:

source "https://rubygems.org"

gem  'limit_detectors'

The gem ‘limit_detectors’ is one I wrote and I know that is doesn’t depend on other gems. Issuing the pry command again … worked? Why? Some more investigation was needed…

Finally, I realised that another gem (Guard, to be specific), uses pry. Therefore my Rails project’s development environment indirectly depends on pry. It looks like pry recognises that it’s running as part of a Bundler project, even when it is not explicitly called using bundle exec pry …, which in turn causes it (pry) to ‘only’ recognise the gems that are also installed using Bundler. – And sqlite3 isn’t in this case, since I’m using PostgreSQL throughout.

Since irb comes with the Ruby installation and is not part of the bundled gems, it ignores the bundler context when called like this:

irb -r sqlite3

However, explicitly calling it in the bundler context in a project that doesn’t depend on ‘sqlite3’ (directly or indirectly) will cause an error message:

$ bundle exec irb -r sqlite3
…/rubies/ruby-3.0.1/lib/ruby/3.0.0/irb/init.rb:376: warning: LoadError: cannot load such file -- sqlite3
3.0.1 :001 >

Nice! This is essentially the same problem, I faced at the very beginning, when pry couldn’t find the sqlite3 gem.

Collecting Lists In a Ruby Hash and the ‘<<=' operator

The other day, I needed to quickly analyse a data set that came in form of a large CSV file. I wanted to collect a particular column of that table and collect all entries categorised by a key in another column.

A simplified version of the table could look like this:

Key Value1 Interesting_Value Other_Value
foo1723.5X
bar211.75Q
foo4212.6B
baz2717.8F
bar4947.2K

I strived for something like this:

result = { 
  foo: [23.5, 12.6],
  bar: [1.75, 47.2],
  baz: [17.8],
 }

Iterating over the rows is easy, and getting to the columns is no problem either: The CSV gem is well documented and supports this easily.

A nice way to accumulate data is Enumerable#each_with_object. Since I wanted the result to be grouped by a key value, I’d pass a Hash as the initial argument.

Step 1: each_with_object({})

However, since I’ve planned to append values for changing keys, the default value needed to be an Array, not the default of nil.

Step 2: each_with_object(Hash.new([])

This, however, returns the same empty Array, when a key isn’t found, but I wanted a new empty Array:

Step 3: each_with_object(Hash.new { [] })

This executs the block every time a default values is needed (i.e. the given key isn’t yet in the Hash).

The next step is to append the value found in a row to the (potentially new and empty) Array for the given key.

I thought it would work this way:

data_table.each_with_object( Hash.new { [] }) do |row, acc|
  acc[row['Key']] << row['Interesting_Value'] 
end

But, no, the result of this code is an empty Hash! It needs to be the <<= operator to work, as shown in the snippet of a pry session:

[2] pry(main)> data_table = CSV.read 'table.csv', headers: true
=> #<CSV::Table mode:col_or_row row_count:6>
[3] pry(main)> data_table.each_with_object( Hash.new { [] }) do |row, acc|
[3] pry(main)*   acc[row['Key']] <<= row['Interesting_Value']
[3] pry(main)* end
=> {"foo"=>["23.5", "12.6"], "bar"=>["1.75", "47.2"], "baz"=>["17.8"]}

It seems to me, that the Hash lookup with the given default value [] returns an Array, and the append operator << does in fact append the passed object to that Array, but then the result of that does not end up as a (new) value fo the given Hash key. In contrast, the <<= operator does assign the result of the append operation.

Writing – Advice from Johanna Rothman

Just before Christmas (2020) Jabe Bloom asked this over on Twitter::

Johanna Rothmann‘s answers were short and easy to understand (but maybe hard to actually do):

I (publicly) agreed…

…and am now signed up to be informed when the book writing class opens.

It already showing some effect: I’m (re-) starting to update “Fast Feedback Using Ruby” (for the just released Ruby 3). 🙂

Navigation

%d bloggers like this: