The first function is implemented for you. Make sure you understand what every line does before moving on.

In [1]:
def read_subtitle(srtfile):
    """
    Reads an srt subtitle file and returns a list of dictionaries.
    Dictionary keys are: 'start_time', 'end_time' and 'subtitle'.
    """
    
    #open the file
    f = open(srtfile)
    #read the contents of the file into `text`
    text = f.read()
    #close the file
    f.close()
    
    #A note:
    #File reading can be written more elegantly with the `with` statement:
    #
    #with open(srtfile) as f:
    #    text = f.read()
    
    
    #`str.strip` removes whitespace characters (' ', '\n', '\t', ...) at the beginning and at the end
    text = text.strip()
    #split at the blank line (after the subtitle itself)
    entry_list = text.split('\n\n')
    
    dict_list = [] #this will be the output of the function
    for i in entry_list:
        lines = i.split('\n')
        #the first line is the index. We don't need it; the list is already indexed.
        #the second line contains the start time and the end time
        t_start, t_end = lines[1].split(' --> ')
        #this is shorthand for `t_start = lines[1].split(' --> ')[0]` and `t_end = lines[1].split(' --> ')[1]`
        
        #merge the rest
        sub = "\n".join(lines[2:])
        #create a dict...
        d = {'start_time': t_start, 'end_time': t_end, 'subtitle': sub}
        #...and append to the `dict_list`
        dict_list.append(d)
        
    
    return dict_list
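To see what each step of the parsing produces, here is the same split-based logic applied to an in-memory string instead of a file. The SRT text is a hypothetical sample: the indices and the second entry are made up, and the first entry reuses the subtitle from the test further down.

```python
# Hypothetical two-entry SRT text (indices and second entry are made up)
srt_text = (
    "1\n"
    "00:02:17,440 --> 00:02:20,375\n"
    "Senator, we're making\n"
    "our final approach into Coruscant.\n"
    "\n"
    "2\n"
    "00:02:21,000 --> 00:02:23,000\n"
    "Second line of dialogue.\n"
)

# Strip surrounding whitespace, then split at the blank lines between entries
entries = srt_text.strip().split('\n\n')
print(len(entries))  # 2

# Inside one entry: index, time range, then one or more lines of text
lines = entries[0].split('\n')
t_start, t_end = lines[1].split(' --> ')
sub = "\n".join(lines[2:])
print(t_start, t_end)  # 00:02:17,440 00:02:20,375
print(repr(sub))
```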

At this point you may wonder why we used a list of dicts instead of, for example, a list of lists. Indeed, instead of:
```python
    d = {'start_time': t_start, 'end_time': t_end, 'subtitle': sub}
```
we could have had:
```python
    d = [ t_start, t_end, sub ] 
```
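We would then have to remember that index 0 holds the start time, index 1 the end time and index 2 the subtitle text. Named keys make the code easier to read and harder to get wrong: `d['start_time']` says what it means, `d[0]` does not.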
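A short side-by-side of the two representations, using the sample subtitle from the test below as illustrative data:

```python
# The same subtitle stored as a dict and as a list
sub_dict = {'start_time': '00:02:17,440',
            'end_time': '00:02:20,375',
            'subtitle': "Senator, we're making\nour final approach into Coruscant."}
sub_list = ['00:02:17,440',
            '00:02:20,375',
            "Senator, we're making\nour final approach into Coruscant."]

# Same data, different access: the key documents itself, the index does not
print(sub_dict['start_time'])  # 00:02:17,440
print(sub_list[0])             # 00:02:17,440
```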

Naturally, there are many possible solutions to the task, so feel free to customize if you want.

We can test the function on a small sample provided on the *matroska* webpage:

In [2]:
a = read_subtitle('sample.txt')
a[0]

{'start_time': '00:02:17,440',
 'end_time': '00:02:20,375',
 'subtitle': "Senator, we're making\nour final approach into Coruscant."}

Now that we have the subtitles organized, we might want to define functions that operate on the structure we created. 
The first thing that comes to mind is a "find" function, which I implemented for you below:

In [6]:
def find_subtitle(subtitles, s):
    """
    Returns a list of indices where the subtitles contain the string `s`
    """
    indices = []
    for i,ii in enumerate(subtitles): #`i` is the index, `ii` the value, i.e. subtitle dict
        if s in ii['subtitle']:
            indices.append(i)
    
    return indices
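The same search can also be written as a single list comprehension. The sketch below shows that form and a case-insensitive variant based on `str.casefold()`. Both function names (`find_subtitle_compact`, `find_subtitle_ci`) are hypothetical, chosen so they do not overwrite `find_subtitle`, and the three subtitles are made-up test data.

```python
def find_subtitle_compact(subtitles, s):
    """List-comprehension equivalent of the loop-based find function."""
    return [i for i, entry in enumerate(subtitles) if s in entry['subtitle']]

def find_subtitle_ci(subtitles, s):
    """Hypothetical case-insensitive variant: compare casefolded strings."""
    s = s.casefold()
    return [i for i, entry in enumerate(subtitles)
            if s in entry['subtitle'].casefold()]

# Made-up subtitles for illustration
subs = [
    {'start_time': '00:00:01,000', 'end_time': '00:00:02,000', 'subtitle': 'A bird in the hand'},
    {'start_time': '00:00:03,000', 'end_time': '00:00:04,000', 'subtitle': 'Bird watching'},
    {'start_time': '00:00:05,000', 'end_time': '00:00:06,000', 'subtitle': 'A rabbit'},
]
print(find_subtitle_compact(subs, 'bird'))  # [0]
print(find_subtitle_ci(subs, 'bird'))       # [0, 1]
```

The plain `in` test is case-sensitive, which is why `'Bird watching'` only matches in the casefolded version.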

Test the find function. A subtitle file (only a small sample, to avoid copyright issues) is ready for you in this directory.
Some suggested search strings: 'lobster', 'bird', 'rabbit'.

In [None]:
### YOUR TESTS ###

In [3]:
def write_subtitle(filename,dict_list):
    """
    Write subtitles in `dict_list` to the file named `filename` in srt format.
    """
    
    ###############################
    #         MISSING CODE        #
    ###############################
    
    #open file, "w" for "write"
    f = open(filename,"w")
    f.write(YOUR_STRING)
    f.close()
    #Note: this can be written using the `with` statement as was noted in the reader function.
    
    return None #this line can be left out

As you might have guessed, you will implement the body of this function yourself. Good luck!

Test your function before moving on.
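For the file-writing step, the `with` form mentioned in the skeleton's comment closes the file automatically when the block ends, even if an error occurs. A minimal sketch, writing a placeholder string to a temporary file. The filename and contents are made up; building the actual SRT string is still up to you.

```python
import os
import tempfile

text = "placeholder contents\n"  # stands in for your SRT-formatted string
path = os.path.join(tempfile.mkdtemp(), "out.srt")  # hypothetical output file

# "w" opens the file for writing, replacing any existing contents;
# the file is closed when the `with` block ends
with open(path, "w") as f:
    f.write(text)

# Read it back to check the round trip
with open(path) as f:
    print(f.read() == text)  # True
```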

Next, we need to correct the timestamps. Before we do that, however, it is convenient to define two utility functions:
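As a worked example of the conversion `time_to_secs()` should perform, take the sample start time `00:02:17,440`: hours, minutes and seconds are scaled by their length in seconds, and the milliseconds are divided by 1000.

$$
\texttt{00:02:17,440} \;\longmapsto\; 0\cdot 3600 + 2\cdot 60 + 17 + \frac{440}{1000} = 137.44\ \text{s}
$$

`secs_to_time()` reverses this by peeling off whole hours, then whole minutes, then whole seconds. For the milliseconds, rounding is safer than truncating: in floating point, `(137.44 - 137) * 1000` evaluates to slightly less than 440, so `int()` would give 439.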

In [7]:
def time_to_secs(time):
    """
    Input: subtitle start or end time as a `str` with the format 'hours:minutes:seconds,milliseconds'
    Output: `float` time in seconds
    """
    ###############################
    #         MISSING CODE        #
    ###############################

In [8]:
def secs_to_time(secs):
    """
    Inverse of `time_to_secs()`: takes a time in seconds and returns a string in the same format as the input of `time_to_secs()`.
    """
    hours = int(secs/3600) #`int()` truncates toward zero, i.e. rounds down for non-negative `secs`
    ###############################
    #         MISSING CODE        #
    ###############################

Again, test your functions after implementing them.

To adjust the subtitle times, choose two subtitles and provide their correct times. 
Assuming that the wrong times are related to the correct times by a **linear** function (a constant offset plus a uniform stretch), two reference points are enough to determine that function.
We will use the `start_time` for the adjustment.
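Written out with $t_w$ for a wrong (current) time and $t_c$ for the correct one, the two reference subtitles give two equations for the two unknowns $a$ and $b$:

$$
t_{w,1} = a\,t_{c,1} + b, \qquad t_{w,2} = a\,t_{c,2} + b
\quad\Longrightarrow\quad
a = \frac{t_{w,2} - t_{w,1}}{t_{c,2} - t_{c,1}}, \qquad b = t_{w,2} - a\,t_{c,2}
$$

Every wrong time is then mapped back by inverting the relation, $t_c = (t_w - b)/a$. This requires $t_{c,1} \neq t_{c,2}$, i.e. two reference subtitles at different times.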

In [3]:
def adjust(subs, ind1, correct_time1, ind2, correct_time2):
    """
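    Adjusts subtitle times in place, assuming a linear relation between wrong and correct times.
    Input:
        subs -- list of subtitle dicts, as returned by `read_subtitle()`
        ind1, ind2 -- indices of the two reference subtitles in `subs`
        correct_time1, correct_time2 -- correct start times of those subtitles, either in seconds or in hh:mm:ss,milliseconds format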
    """
    #decide whether the times are in secs (float) or hh:mm:ss,milliseconds format based on the type
    if isinstance(correct_time1, str):
        correct_time1 = time_to_secs(correct_time1)
    if isinstance(correct_time2, str):
        correct_time2 = time_to_secs(correct_time2)
    
    #get the corresponding subtitle times
    wrong_time1 = time_to_secs(subs[ind1]['start_time'])
    wrong_time2 = time_to_secs(subs[ind2]['start_time'])
    
    #the following is based on a linear relation:
    #   `wrong_time = a*correct_time + b`
    a = (wrong_time2 - wrong_time1)/(correct_time2 - correct_time1)
    b = (wrong_time2 - a*correct_time2)
    
    def linear_correction(time):
        #convert to seconds, invert `wrong_time = a*correct_time + b`, convert back to a string
        return secs_to_time((time_to_secs(time) - b)/a)
    
    for i in subs:
        i['start_time'] = linear_correction(i['start_time'])
        i['end_time'] = linear_correction(i['end_time'])
    
    return None #this line can be left out
    #the function returns None because it modifies the dicts in `subs` in place;
    #lists and dicts are mutable, so the caller sees the changes
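To check the arithmetic of `adjust` without the string conversions, here is a minimal sketch of the same two-point linear fit applied to times that are already in seconds. The function name `linear_fit_correction` and all numbers are hypothetical: the "wrong" times are generated from a made-up relation `wrong = 1.04*correct + 2.5`, so the corrected values can be compared with the known correct ones.

```python
def linear_fit_correction(wrong1, correct1, wrong2, correct2):
    """Return a function mapping a wrong time (secs) to the corrected time (secs).

    Assumes wrong = a*correct + b and fits a, b through two reference points.
    """
    a = (wrong2 - wrong1) / (correct2 - correct1)
    b = wrong2 - a * correct2

    def correct(wrong):
        # invert wrong = a*correct + b
        return (wrong - b) / a

    return correct

# Made-up data: the wrong times follow wrong = 1.04*correct + 2.5
true_a, true_b = 1.04, 2.5
correct_times = [10.0, 137.44, 600.0, 1800.0]
wrong_times = [true_a * t + true_b for t in correct_times]

# Fit through the first and last subtitles, then correct all of them
fix = linear_fit_correction(wrong_times[0], correct_times[0],
                            wrong_times[3], correct_times[3])
for w, c in zip(wrong_times, correct_times):
    print(f"{w:9.3f} -> {fix(w):9.3f} (expected {c})")
```

Choosing reference subtitles far apart makes the estimate of `a` less sensitive to small errors in the reference times, since those errors are divided by the gap between the two correct times.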

If all went well, you now have a script that corrects subtitles :)