CodeFights Quora bot

This isn’t any homework questions or assignments, only my practice. In the CodeFights Quora bot. My code cannot pass even the first hidden input because of the execution time. I’m only a first-month programmer in university.

Here is the detailed question:

For the purposes of this problem, suppose that Quora has n questions, and question i takes ti time to read. Some questions are related to each other. If we connect related questions by edges, we get an undirected graph such that there exists exactly one path from any question to another. In other words, the graph of related questions is a tree.

Every time Steve reads a question, he will see a list of questions that are directly related to it and will navigate at random to one that he hasn’t read yet (all related questions have an equal chance of being viewed). Steve will stop reading once there are no unread related questions left.

Given the number of related questions n, an array that contains the estimated reading time for each question t, and an array containing the pairs of related questions edges, which question should we show to Steve first so that we minimize his total expected reading time? It is guaranteed that there is one unique question that is optimal.

Here’s how the total expected time for question i with q related questions is calculated:

Take the time ti that it will take Steve to read this question;
Recursively calculate the expected time, Ej for each related question j without considering the ith question;
Add to ti the sum of Ej for each j, divided by q, i.e. the answer will be equal to ti+jEjq.


For n = 5, t = [2, 2, 1, 2, 2] and edges = [[0, 1], [1, 2], [2, 3], [3, 4]], the output should be
relatedQuestions(n, t, edges) = 2.

Let’s calculate the answers for each of the 5 vertices:

If Steve starts reading from question 0, then the expected reading time equals t0+E11=t0+t1+E211==t0+t1+t2+t3+t4+011111=t0+t1+t2+t3+t4=2+2+1+2+2=9
If Steve starts reading from question 1, then the expected reading time equals t1+E0+E22=t1+t0+01+t2+E312=t1+t0+t2+t3+t4+01112=t1+t0+t2+t3+t42=2+2+1+2+22=5.5.
If Steve starts reading from question 2, then the expected reading time equals t2+E1+E32=t2+t1+E01+t3+E412=t2+t1+t0+t3+t42=1+2+2+2+22=5
The expected reading time for vertex 3 is equal to the expected reading time for vertex 1, because they are symmetric in the tree. The same works for vertices 4 and 0.
So, as we can see, the optimal vertex to start with is vertex 2, since that gives us the smallest expected reading time.


[time limit] 4000ms (py3)

[input] integer n

The number of questions.

Guaranteed constraints:

[input] array.integer t

An array of positive integers. ti is the time Steve needs to read question i for each valid i.

Guaranteed constraints:

[input] array.array.integer edges

An array containing n1 pairs of related questions. It is guaranteed that the questions form a tree.

Guaranteed constraints:

[output] integer

The 0-based index of the best question to show Steve first in order to minimize his expected reading time.

This is my solution:

from itertools import repeat
def relatedQuestions(n, t, edges):
    def expected_time(start, previous=None, end=None):
            ct, tm, l= 0, 0, set()
            for _i in edges:
                    if start in _i and (not previous in _i) and not end in _i:
                            ct += 1
                            p = next(filter(lambda x: x != start, _i))
                            tm += t[p]
            if end == None:
                    end = start
            rp = repeat(start, ct)
            re = repeat(end, ct)
            if ct:
                    return (tm + sum(map(expected_time, l, rp, re)))/ct
                    return t[start]
    mi = {}
    for _i in range(n):
            mi[str(_i)] = expected_time(_i) + t[_i]
    return int(sorted([(v,k) for (k,v) in mi.items()])[0][1])


1. Analysis

There are n vertices and n1 edges in the tree.

The algorithm in the post iterates over the n vertices, and computes the expected time starting at each vertex. It does this by recursing over the tree (thus calling expected_time once for each of the n vertices), and at each step it searches the list of n1 edges to find the neighbours of the current vertex.

This means that the total runtime is Θ(n3). Since n might be as large as 1,000, the program might have to execute the block of code starting if start in _i ... as many as 999,000,000 times. This is why your program runs out of time.

(If you’re studying computer science at university, there ought to be a course where you learn to do this kind of analysis. Pay attention in class: it’s one of the most valuable skills you can learn!)

2. A better algorithm

I’m not to going to spoil the problem by solving it for you. But I’ll sketch a couple of improvements:

First, find a way avoid iterating over the edges at each step. The idea is to preprocess the edges list into a more convenient data structure, one that will let you efficiently enumerate all the neighbours of a vertex. The data structure you need here is known as an adjacency list: that is, a mapping from a vertex to a list (or set) of its neighbouring vertices.

Second, avoid duplicate work. The key observation here is that expected_time(v, w) is always the same for a given pair of vertices v and w, but the algorithm in the post has to evaluate this many times. If you store the results of these calls the first time you compute them and then look them up instead of re-computing them, then you will save a lot of work. (This technique is known as memoization or dynamic programming.)

Combining adjacency lists and memoization should allow you to reduce the runtime to Θ(n), and so easily come in under the time limit.

Source : Link , Question Author : Dogemore , Answer Author : Gareth Rees

Leave a Comment