Tips and tricks from my Telegram-channel @pythonetc, March 2019
09.04.2019 16:33

This is a new selection of tips and tricks about Python and programming from my Telegram-channel @pythonetc.

Previous publications.

0_0 is a totally valid Python expression.
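
The reason is that, since Python 3.6, underscores are allowed as visual separators between digits in numeric literals, so 0_0 is just another spelling of 0. A quick check (the variable names are only for illustration):

```python
# Underscores between digits are ignored by the parser (Python 3.6+).
zero = 0_0
million = 1_000_000

print(zero)                  # 0
print(million == 1000000)    # True
```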

Sorting a list with None values can be challenging:

In [1]: data = [
   ...:     dict(a=1),
   ...:     None,
   ...:     dict(a=-3),
   ...:     dict(a=2),
   ...:     None,
   ...: ]

In [2]: sorted(data, key=lambda x: x['a'])
...
TypeError: 'NoneType' object is not subscriptable

You may try to remove Nones and put them back after sorting (to the end or the beginning of the list depending on your task):

In [3]: sorted(
   ...:     (d for d in data if d is not None),
   ...:     key=lambda x: x['a']
   ...: ) + [
   ...:     d for d in data if d is None
   ...: ]
Out[3]: [{'a': -3}, {'a': 1}, {'a': 2}, None, None]

That’s a mouthful. A better solution is to use a more complex key:

In [4]: sorted(data, key=lambda x: float('inf') if x is None else x['a'])
Out[4]: [{'a': -3}, {'a': 1}, {'a': 2}, None, None]

For types where no infinity is available you can sort tuples instead:

In [5]: sorted(data, key=lambda x: (1, None) if x is None else (0, x['a']))
Out[5]: [{'a': -3}, {'a': 1}, {'a': 2}, None, None]
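
The tuple trick can be wrapped into small reusable helpers. This is only a sketch: none_last and none_first are hypothetical names, not part of the standard library. The first tuple element decides whether None goes to the end or to the beginning, and the real key is compared only among non-None items.

```python
from operator import itemgetter


def none_last(key):
    """Wrap a key function so that None items sort after everything else."""
    return lambda x: (1, None) if x is None else (0, key(x))


def none_first(key):
    """Wrap a key function so that None items sort before everything else."""
    return lambda x: (0, None) if x is None else (1, key(x))


data = [dict(a=1), None, dict(a=-3), dict(a=2), None]

print(sorted(data, key=none_last(itemgetter('a'))))
# [{'a': -3}, {'a': 1}, {'a': 2}, None, None]
print(sorted(data, key=none_first(itemgetter('a'))))
# [None, None, {'a': -3}, {'a': 1}, {'a': 2}]
```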

When you fork your process, the state of the random number generator is copied to the child process. That may lead to processes producing the same «random» result.

To avoid this, you have to manually call random.seed() in every process.

However, that is not the case if you are using the multiprocessing module: it does exactly that for you.

Here is an example:

import multiprocessing              
import random                       
import os                           
import sys                          
                                    
def test(a):                        
    print(random.choice(a), end=' ')
                                    
a = [1, 2, 3, 4, 5]                 
                                    
for _ in range(5):                  
    test(a)                         
print()                             
                                    
                                    
for _ in range(5):                  
    p = multiprocessing.Process(    
        target=test, args=(a,)      
    )                               
    p.start()                       
    p.join()                        
print()                             
                                    
for _ in range(5):                  
    pid = os.fork()                 
    if pid == 0:                    
        test(a)                     
        sys.exit()                  
    else:                           
        os.wait()                   
print()

The result is something like:

4 4 4 5 5
1 4 1 3 3
2 2 2 2 2

Moreover, if you are using Python 3.7 or newer, os.fork reseeds the generator in the child as well, thanks to the new at-fork hooks (os.register_at_fork).

The output of the above code for Python 3.7 is:

1 2 2 1 5
4 4 4 5 5
2 4 1 3 1
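
For Python versions before 3.7, the manual reseeding mentioned above might look roughly like this. It is a minimal Unix-only sketch: the helper name random_values_from_children and the pipe-based reporting are assumptions made for the example, not something from the original code.

```python
import os
import random


def random_values_from_children(n):
    """Fork n children; each reseeds and reports one random number via a pipe."""
    values = []
    for _ in range(n):
        read_fd, write_fd = os.pipe()
        pid = os.fork()
        if pid == 0:
            # Child: reseed from OS entropy, send one number, exit immediately.
            try:
                os.close(read_fd)
                random.seed()
                number = random.randrange(10 ** 9)
                os.write(write_fd, str(number).encode())
            finally:
                os._exit(0)
        # Parent: close the unused write end, then read until the child exits.
        os.close(write_fd)
        with os.fdopen(read_fd) as reader:
            values.append(int(reader.read()))
        os.waitpid(pid, 0)
    return values


print(random_values_from_children(3))
```

Without the random.seed() call, every child on an older Python would start from the parent's generator state and report the same number.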

It looks like sum([a, b, c]) is equivalent to a + b + c, while in fact it’s 0 + a + b + c. That means that it can’t work with types that don’t support adding to 0:

class MyInt:
    def __init__(self, value):
        self.value = value
    def __add__(self, other):
        return type(self)(self.value + other.value)
    def __radd__(self, other):
        return self + other
    def __repr__(self):
        class_name = type(self).__name__
        return f'{class_name}({self.value})'

In : sum([MyInt(1), MyInt(2)])
...
AttributeError: 'int' object has no attribute 'value'

To fix that you can provide a custom start element that is used instead of 0:

In : sum([MyInt(1), MyInt(2)], MyInt(0))
Out: MyInt(3)
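
If a literal a + b + c is what you want, with no start element at all, one possible alternative is functools.reduce with operator.add. The sketch below repeats a trimmed-down MyInt so that it runs on its own:

```python
from functools import reduce
import operator


class MyInt:
    def __init__(self, value):
        self.value = value

    def __add__(self, other):
        return type(self)(self.value + other.value)

    def __repr__(self):
        return f'{type(self).__name__}({self.value})'


# reduce folds the list pairwise: (MyInt(1) + MyInt(2)) + MyInt(3); no 0 involved.
total = reduce(operator.add, [MyInt(1), MyInt(2), MyInt(3)])
print(total)  # MyInt(6)
```

Note that, unlike sum, reduce without an initial value raises TypeError on an empty list.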

sum is well-optimized for summation of float and int types but can handle any other custom type. However, it refuses to sum bytes, bytearray and str since join is well-optimized for this operation:

In : sum(['a', 'b'], '')
...
TypeError: sum() can't sum strings [use ''.join(seq) instead]

The difference in speed between int and a custom type is noticeable:

In : ints = [x for x in range(10_000)]
In : my_ints = [MyInt(x) for x in ints]
In : %timeit sum(ints)
68.3 µs ± 142 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
In : %timeit sum(my_ints, MyInt(0))
5.81 ms ± 20.5 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
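
For the types that sum rejects, the join method of the corresponding empty value does the concatenation. A short illustration (the sample data is made up for the example):

```python
words = ['a', 'b', 'c']
chunks = [b'\x00', b'\x01']

text = ''.join(words)            # str
blob = b''.join(chunks)          # bytes
buf = bytearray().join(chunks)   # bytearray

print(text)  # abc
print(blob)  # b'\x00\x01'
print(buf)   # bytearray(b'\x00\x01')
```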

You can customize index completions in Jupyter notebook by providing the _ipython_key_completions_ method. This way you can control what is displayed when you press Tab after something like d["x.

Note that the method doesn’t get the looked up string as an argument.
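
A minimal sketch of such a class might look like the following. Config is a hypothetical container invented for this example; only the _ipython_key_completions_ name comes from IPython. Since the method receives no argument, it simply returns every candidate key, and narrowing them down to what has been typed so far is left to the completer.

```python
class Config:
    """Hypothetical dict-like container, used only for illustration."""

    def __init__(self, **options):
        self._options = options

    def __getitem__(self, key):
        return self._options[key]

    def _ipython_key_completions_(self):
        # Called with no arguments: return all keys that may be suggested.
        return list(self._options)


d = Config(x_min=0, x_max=10, y=5)
print(d._ipython_key_completions_())  # ['x_min', 'x_max', 'y']
```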