How to visualize combinations of data for a gene discovery application?

What kind of a visualization/chart should I use for showing all the ways you can choose from a set of items? (i.e., number of possible combinations)

Concretely, I am showing potential offspring from two animals, where each parent may possess some number of genes, and the offspring inherits 0, 1, or both parent genes of each type. The genes have fun names (e.g., fire), and sometimes combinations of those genes have their own names (fire + pastel = firefly), but this is beside the point.

Here’s a simple example with 2 genes from each parent (1 of them shared), which makes for 2^4 = 16 possibilities.

[Image: simple example with 2 genes from each parent]

The current UI (below) shows the list of possibilities, but nothing visually conveys the magnitude. In this case each outcome is equally likely (1/16), but in many cases some possibilities will be 2/16 or 4/16. In other words, nothing about the “graphic” shows the user how likely each outcome is. Secondly, it would be great if outcomes that have something in common (i.e., contain the same genes) could be visually related.
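A minimal sketch of how those likelihoods could be computed, assuming each gene a parent carries is passed on independently with a 50% chance and that outcomes containing the same genes are merged. The function name `offspring_distribution` and the gene name "mojave" are made up for illustration; this is not necessarily how the application does it.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def offspring_distribution(parent_a, parent_b):
    """Map each distinct offspring outcome to its probability.

    Assumption: every gene slot is inherited or not with equal chance,
    independently of the others.
    """
    slots = list(parent_a) + list(parent_b)  # one slot per parent gene
    total = 2 ** len(slots)                  # number of raw possibilities
    counts = Counter()
    for picks in product((False, True), repeat=len(slots)):
        # Sort the inherited genes so equivalent outcomes share one key.
        inherited = tuple(sorted(g for g, keep in zip(slots, picks) if keep))
        counts[inherited] += 1
    return {outcome: Fraction(n, total) for outcome, n in counts.items()}


# Hypothetical parents: 2 genes each, with "pastel" shared.
dist = offspring_distribution(["fire", "pastel"], ["pastel", "mojave"])
for outcome, p in sorted(dist.items(), key=lambda item: (len(item[0]), item[0])):
    print(outcome, p)
```

With these two hypothetical parents, the 16 raw possibilities collapse into 12 distinct outcomes, and the ones with a single "pastel" come out at 2/16 each because the gene can come from either parent.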

My idea is something like a diamond-shaped graph, or layered network, where at the top is the outcome where all genes are chosen, below that a row of nodes with N-1 genes, and so forth until the bottom row has 0 selected. Edges would connect nodes in adjacent layers that share genes. The size of a node could indicate its probability. Something like this graph (but ignore the data).

[Image: example of a layered graph]
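The layered structure described above could be built roughly like this. It is only a sketch: it ignores duplicated genes (each gene is either present or absent), and `layered_lattice` and the gene name "mojave" are hypothetical.

```python
from itertools import combinations


def layered_lattice(genes):
    """Return (layers, edges) for every subset of the given genes.

    layers[k] holds the outcomes containing exactly k genes; an edge joins
    two outcomes in adjacent layers when they differ by a single gene.
    """
    genes = sorted(genes)
    layers = {
        k: [frozenset(combo) for combo in combinations(genes, k)]
        for k in range(len(genes) + 1)
    }
    edges = [
        (lower, upper)
        for k in range(len(genes))
        for lower in layers[k]
        for upper in layers[k + 1]
        if lower < upper  # proper subset: upper is lower plus one gene
    ]
    return layers, edges


layers, edges = layered_lattice(["fire", "pastel", "mojave"])
# Draw from the top layer (all genes) down to the bottom layer (none).
for k in sorted(layers, reverse=True):
    print(k, [sorted(node) for node in layers[k]])
```

For N genes the middle layers are the widest, which is what gives the graph its diamond shape.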

I’m aware of Punnett Squares, but I’m not sure they’re the best fit for combinations of this order (for one, they don’t combine equivalent outcomes).


The number of results depends on how many genes the parents have combined (2^N). I expect that most of the time each parent will have 1-4 genes, so not more than 7 genes total, or 2^7 = 128 possibilities. Also, if there is any duplication between the parents, the number is lower. For example, with 2 of the 7 being shared genes, that makes 54 possibilities. (See this example live). So most of the time I think there will be 10-50 results.


TL;DR? Scroll down for the examples.

This is an interesting question. It’s a very complex case, and without being on the inside and fully in the know, I can only really suggest general ideas.

It’s immediately clear that you’ll need to use many indicators in tandem to communicate all of the different levels and aspects of data in an easily consumable way.

The main indicators I’m thinking about for this case are:

  • Colour
  • Size
  • Hierarchy
  • Shapes

To summarise the aims I understand from the question:

  • You want to visually associate results that share common properties

  • You also need to indicate the statistical probability of each outcome

First off, I think you need to meaningfully separate the data to make it easier to consume, especially considering there can be up to 128 results.

To do this, I think you should break them down into groups by how many different types of genes they have. I can see some are ‘pure bloods’, some are ‘half-bloods’ and some are probably mixtures of 3 or more types. This is the broadest meaningful category available.
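As a rough illustration of that grouping, here is a minimal sketch. It assumes each outcome is represented as a tuple of gene names; the representation and the name `group_by_gene_types` are hypothetical.

```python
from collections import defaultdict


def group_by_gene_types(outcomes):
    """Group outcomes by how many different gene types each contains."""
    groups = defaultdict(list)
    for outcome in outcomes:
        # set() collapses two copies of the same gene into one type.
        groups[len(set(outcome))].append(outcome)
    return dict(sorted(groups.items()))


outcomes = [
    (),
    ("fire",),
    ("pastel",),
    ("pastel", "pastel"),
    ("fire", "pastel"),
    ("fire", "pastel", "pastel"),
]
groups = group_by_gene_types(outcomes)
for type_count, members in groups.items():
    print(type_count, members)
```

Here group 1 would hold the ‘pure bloods’, group 2 the ‘half-bloods’, and so on.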

Next, because you want to indicate the probability / chance of each outcome, I think a colour range of yellow to orange (could be anything, I just chose this because it’s often used) would be a good indicator.
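One way such a colour range could work is plain linear interpolation between two end colours. This is a sketch only: the RGB values are the CSS named colours yellow and orange, picked arbitrarily, and `probability_colour` is a hypothetical helper.

```python
def probability_colour(p, p_max, low=(255, 255, 0), high=(255, 165, 0)):
    """Return a hex colour between `low` (yellow) and `high` (orange).

    p is the outcome's probability; p_max is the largest probability on
    screen, so the most likely outcome gets the full `high` colour.
    """
    # Normalise to 0..1 and clamp, guarding against a zero maximum.
    t = 0.0 if p_max <= 0 else max(0.0, min(1.0, p / p_max))
    rgb = [round(a + (b - a) * t) for a, b in zip(low, high)]
    return "#{:02x}{:02x}{:02x}".format(*rgb)


for p in (1 / 16, 2 / 16, 4 / 16):
    print(p, probability_colour(p, p_max=4 / 16))
```

Scaling against the largest probability on screen rather than against 1 keeps the full range in use even when every outcome is fairly unlikely.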

Lastly, because there are so many different types, and you want to show associations between ones that share the same genes, I think it might be good to define a colour palette for the ‘pure bloods’. You can then use this colour palette to easily show the similarities across each that share types.

[Image: example of the proposed design]

There are other options, but this is designed based on the fact that it needs to be able to expand and contract significantly without breaking or becoming unreadable.

To demonstrate the potential for different cases using duplicated data:

[Image: the proposed design with different numbers of results]


I think a mapped out structure would be a cool third addition, but it would also be the one that requires the most work, while conveying the least easily consumable meaning to your audience. I hope this helps.

Source: Link, Question Author: John Lehmann, Answer Author: Dom
