Python etc

Channel address:

Categories: Technologies

Language: English

Subscribers: 6.28K

Description from channel

Regular tips about Python and programming in general
Owner — @pushtaev
The current season is run by @orsinium
Tips are appreciated: https://ko-fi.com/pythonetc / https://sobe.ru/na/pythonetc
© CC BY-SA 4.0 — mention if repost

The latest Messages 6

2020-12-22 18:00:04 Always precompile regular expressions using re.compile if the expression is known in advance:

# generate a random string
import re
from string import printable
from random import choice
text = ''.join(choice(printable) for _ in range(10 * 8))

# let's find numbers
pat = r'\d(?:[\d\.]+\d)*'
rex = re.compile(pat)

%timeit re.findall(pat, text)
# 2.08 µs ± 1.89 ns per loop

# the pre-compiled pattern is noticeably faster (about 1.6x here)
%timeit rex.findall(text)
# 1.3 µs ± 68.8 ns per loop

The secret is that module-level re functions just obtain the compiled pattern and call the corresponding method on it. The re module keeps its own cache of compiled patterns, but the lookup still costs time on every call:

def findall(pattern, string, flags=0):
    return _compile(pattern, flags).findall(string)

If the expression is not known in advance but can be used repeatedly, consider using functools.lru_cache:

from functools import lru_cache

cached_compile = lru_cache(maxsize=64)(re.compile)

def find_all(pattern, text):
    return cached_compile(pattern).findall(text)
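
A minimal sketch of how the cached variant behaves, reusing the cached_compile and find_all helpers from above (they are illustrative names from this post, not part of re). Repeated calls with the same pattern should hand back the same compiled object, and cache_info() shows how often the cache was hit:

```python
import re
from functools import lru_cache

# Wrap re.compile so that each distinct pattern is compiled only once.
cached_compile = lru_cache(maxsize=64)(re.compile)


def find_all(pattern, text):
    return cached_compile(pattern).findall(text)


first = cached_compile(r'\d+')
second = cached_compile(r'\d+')
same_object = first is second  # both names point at one compiled pattern

numbers = find_all(r'\d+', 'a1b22c333')
info = cached_compile.cache_info()  # one miss, then two hits
```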

2020-12-17 18:00:04 The decorator functools.lru_cache is named after its underlying cache replacement policy. When the cache size limit is reached, the Least Recently Used records are removed first:

from functools import lru_cache

@lru_cache(maxsize=2)
def say(phrase):
    print(phrase)

say('1')
# 1

say('2')
# 2

say('1')

# push a record out of the cache
say('3')
# 3

# '1' is still cached since it was used recently
say('1')

# but '2' was removed from cache
say('2')
# 2
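
To make the eviction order more tangible, here is a toy LRU cache built on OrderedDict. It is only a sketch of the policy: the LRUCache class and its get method are made up for illustration, and the real functools.lru_cache is implemented differently.

```python
from collections import OrderedDict


class LRUCache:
    """Toy least-recently-used cache (illustration only)."""

    def __init__(self, maxsize):
        self.maxsize = maxsize
        self.data = OrderedDict()

    def get(self, key, compute):
        if key in self.data:
            # A hit makes the key the most recently used one.
            self.data.move_to_end(key)
            return self.data[key]
        value = compute(key)
        self.data[key] = value
        if len(self.data) > self.maxsize:
            # Drop the least recently used key (the oldest entry).
            self.data.popitem(last=False)
        return value


cache = LRUCache(maxsize=2)
for key in ['1', '2', '1', '3']:
    cache.get(key, compute=len)

remaining = list(cache.data)  # '2' was evicted, '1' and '3' stay
```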

To avoid the limit, you can pass maxsize=None:

@lru_cache(maxsize=None)
def fib(n):
    if n <= 2:
        return 1
    return fib(n-1) + fib(n-2)

fib(30)
# 832040

fib.cache_info()
# CacheInfo(hits=27, misses=30, maxsize=None, currsize=30)

Python 3.9 introduced functools.cache, which is the same as lru_cache(maxsize=None). Because an unbounded cache never has to evict old values, it is a little faster than lru_cache with a size limit, which has to do all that LRU-related bookkeeping:

from functools import cache

@cache
def fib_cache(n):
    if n <= 2:
        return 1
    return fib_cache(n-1) + fib_cache(n-2)

fib_cache(30)
# 832040

%timeit fib(30)
# 63 ns ± 0.574 ns per loop

%timeit fib_cache(30)
# 61.8 ns ± 0.409 ns per loop

2020-12-15 18:00:05 The functools.lru_cache decorator caches the function result based on the given arguments:

from functools import lru_cache

@lru_cache(maxsize=32)
def say(phrase):
    print(phrase)
    return len(phrase)

say('hello')
# hello
# 5

say('pythonetc')
# pythonetc
# 9

# the function is not called, the result is cached
say('hello')
# 5

The only limitation is that all arguments must be hashable:

say({})
# TypeError: unhashable type: 'dict'
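
One common workaround, sketched here under the assumption that the data can be cheaply converted, is to pass a hashable equivalent (a tuple instead of a list). The total function is a made-up example:

```python
from functools import lru_cache


@lru_cache(maxsize=32)
def total(numbers):
    # numbers must be hashable, so callers pass a tuple
    return sum(numbers)


data = [1, 2, 3]
result = total(tuple(data))        # convert the list before the call
result_again = total(tuple(data))  # an equal tuple hits the same cache entry
hits = total.cache_info().hits
```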

The decorator is useful for recursive algorithms and costly operations:

@lru_cache(maxsize=32)
def fib(n):
    if n <= 2:
        return 1
    return fib(n-1) + fib(n-2)

fib(30)
# 832040

Also, the decorator provides a few helpful methods:

fib.cache_info()
# CacheInfo(hits=27, misses=30, maxsize=32, currsize=30)

fib.cache_clear()
fib.cache_info()
# CacheInfo(hits=0, misses=0, maxsize=32, currsize=0)

# Introduced in Python 3.9:
fib.cache_parameters()
# {'maxsize': 32, 'typed': False}

And one last thing for today. You may be surprised how fast lru_cache is:

def nop():
    return None

@lru_cache(maxsize=1)
def nop_cached():
    return None

%timeit nop()
# 49 ns ± 0.348 ns per loop

# the cached version is faster!
%timeit nop_cached()
# 39.3 ns ± 0.118 ns per loop

2020-12-10 18:00:06 The module array is helpful if you want to be memory efficient or interoperate with C. However, working with array can be slower than with list:

import random
import array
lst = [random.randint(0, 1000) for _ in range(100000)]
arr = array.array('i', lst)

%timeit for i in lst: pass
# 1.05 ms ± 1.61 µs per loop

%timeit for i in arr: pass
# 2.63 ms ± 60.2 µs per loop

%timeit for i in range(len(lst)): lst[i]
# 5.42 ms ± 7.56 µs per loop

%timeit for i in range(len(arr)): arr[i]
# 7.8 ms ± 449 µs per loop

The reason is that int in Python is a boxed object: array stores raw integer values, and wrapping each one into a Python int on access takes some time.
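
The memory side of the trade-off can be roughly estimated with sys.getsizeof. This is only a sketch: exact numbers depend on the platform and the Python build, and the list estimate counts every int object even though the interpreter shares small ints.

```python
import array
import sys

values = list(range(100_000))
lst = list(values)
arr = array.array('i', values)

# A list stores pointers; every element is a separate boxed int object.
# Small ints are shared by the interpreter, so this is an upper estimate.
list_bytes = sys.getsizeof(lst) + sum(sys.getsizeof(v) for v in lst)

# An array stores the raw C values in one contiguous buffer.
array_bytes = sys.getsizeof(arr)

print(arr.itemsize, list_bytes, array_bytes)
```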

2020-12-08 18:00:05 IPython is an alternative interactive shell for Python. It has syntax highlighting, powerful introspection and autocomplete, searchable cross-session history, and much more. Run %quickref in IPython to get a quick reference on useful commands and shortcuts. Some of our favorite ones:

+ obj? - print short object info, including signature and docstring.
+ obj?? - same as above but also shows the object source code if available.
+ !cd my_project/ - execute a shell command.
+ %timeit list(range(1000)) - run a statement many times and show the execution time statistics.
+ %hist - show the history for the current session.
+ %run - run a file in the current session.
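
Outside of IPython, the standard timeit module gives similar measurements. A small sketch (the numbers of loops and repeats are arbitrary choices, and the output is not formatted the way %timeit does it):

```python
import timeit

# Run the statement 1000 times per measurement, take 5 measurements.
timings = timeit.repeat('list(range(1000))', number=1000, repeat=5)

# Best time per single execution, in seconds.
best = min(timings) / 1000
print(f'{best * 1e6:.2f} µs per loop (best of 5)')
```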

2020-12-03 18:00:04 json.dumps can serialize every built-in type that has a corresponding JSON type (int as number, None as null, list as array, etc.) but fails for every other type. The most common case where you will run into this is serializing a datetime object:

import json
from datetime import datetime

json.dumps([123, 'hello'])
# '[123, "hello"]'

json.dumps(datetime.now())
# TypeError: Object of type datetime is not JSON serializable

The fastest way to fix it is to provide a custom default serializer:

json.dumps(datetime.now(), default=str)
# '"2020-12-03 18:00:10.592496"'

However, that means every unknown object will be serialized into a string, which can lead to unexpected results:

class C: pass
json.dumps(C(), default=str)
# '"<__main__.C object at 0x7f330ec801d0>"'

So, if you want to serialize only datetime and nothing else, it's better to define a custom encoder:

class DateTimeEncoder(json.JSONEncoder):
    def default(self, obj) -> str:
        if isinstance(obj, datetime):
            return obj.isoformat()
        return super().default(obj)

json.dumps(datetime.now(), cls=DateTimeEncoder)
# '"2020-12-03T18:01:19.609648"'

json.dumps(C(), cls=DateTimeEncoder)
# TypeError: Object of type C is not JSON serializable
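
If a whole encoder class feels heavy, a strict default function can do a similar job. In this sketch, encode_datetime is a made-up helper that handles only datetime and raises TypeError for everything else, so unknown objects still fail loudly:

```python
import json
from datetime import datetime


def encode_datetime(obj):
    # json.dumps calls this only for objects it cannot serialize itself.
    if isinstance(obj, datetime):
        return obj.isoformat()
    raise TypeError(f'Object of type {type(obj).__name__} is not JSON serializable')


moment = datetime(2020, 12, 3, 18, 0, 10)
encoded = json.dumps({'at': moment}, default=encode_datetime)
print(encoded)
```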

2020-12-01 18:00:09 Python has rich support for Unicode, including referencing glyphs (including emojis, of course) by name.

Get glyph name:

'🤣'.encode('ascii', 'namereplace')
# b'\\N{ROLLING ON THE FLOOR LAUGHING}'

Convert name to a glyph:

'\N{ROLLING ON THE FLOOR LAUGHING}'
# '🤣'

# case doesn't matter:
'\N{Rolling on the Floor Laughing}'
# '🤣'

A good thing is that f-strings are not confused by named Unicode glyphs either:

fire = 'hello'
f'{fire} \N{fire}'
# 'hello 🔥'
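
The same name lookups are available at runtime through the standard unicodedata module, which helps when the name is not known until the program runs. A short sketch:

```python
import unicodedata

fire = unicodedata.lookup('FIRE')  # name -> character
name = unicodedata.name(fire)      # character -> name
same = fire == '\N{FIRE}'          # identical to the literal escape

print(fire, name, same)
```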

2020-11-26 18:00:06 We use Arabic digits to record numbers. However, there are many more numeral systems: Chinese (and Suzhou), Chakma, Persian, Hebrew, and so on. And Python supports them when detecting numbers:

int('٤٢')
# 42

'٤٢'.isdigit()
# True

import re
re.compile(r'\d+').match('٤٢')
# <re.Match object; span=(0, 2), match='٤٢'>

If you want to match only ASCII Arabic numerals (0-9), make an explicit check for it:

n = '٤٢'
n.isdigit() and n.isascii()
# False

re.compile('[0-9]+').match(n)
# None
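
Another option is the re.ASCII flag, which restricts \d to the ASCII digits. A small sketch (the sample text is made up, and the Arabic-Indic digits are written as escapes to keep the source readable):

```python
import re

# '\u0664\u0662' is the Arabic-Indic spelling of 42
text = '\u0664\u0662 and 42'

unicode_digits = re.findall(r'\d+', text)                # any Unicode decimal digits
ascii_digits = re.findall(r'\d+', text, flags=re.ASCII)  # only 0-9

print(unicode_digits, ascii_digits)
```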

Let's make the full list of supported numerals:

from collections import defaultdict
nums = defaultdict(str)
for i in range(0x110000):
    try:
        value = int(chr(i))
    except ValueError:
        # not a digit in any numeral system
        pass
    else:
        nums[value] += chr(i)
dict(nums)
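
To see which numeral system a given digit belongs to, the unicodedata module can help. A sketch that inspects the digit four in three scripts (the choice of characters is arbitrary):

```python
import unicodedata

# ASCII, Arabic-Indic and Devanagari digit four
digits = '4\u0664\u096a'

names = [unicodedata.name(char) for char in digits]
categories = [unicodedata.category(char) for char in digits]  # 'Nd' is "Number, decimal digit"
values = [unicodedata.decimal(char) for char in digits]

for row in zip(names, categories, values):
    print(row)
```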

2020-11-24 19:19:39 In Python 3.3, PEP-3155 introduced a new __qualname__ attribute for classes and functions which contains a full dotted path to the definition of the given object.

class A:
    class B:
        def f(self):
            def g():
                pass
            return g

A.B.f.__name__
# 'f'

A.B.f.__qualname__
# 'A.B.f'

g = A.B().f()
g.__name__
# 'g'

g.__qualname__
# 'A.B.f.<locals>.g'
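
A qualified name is a path that can be followed with getattr, except for the <locals> part, which marks objects that exist only while a function runs. A rough sketch, assuming the classes are defined at module level; resolve is a made-up helper:

```python
class A:
    class B:
        def f(self):
            def g():
                pass
            return g


def resolve(namespace, qualname):
    """Follow a dotted path starting from a dict of top-level names."""
    first, *rest = qualname.split('.')
    obj = namespace[first]
    for part in rest:
        obj = getattr(obj, part)
    return obj


namespace = {'A': A}
found = resolve(namespace, A.B.f.__qualname__)

g = A.B().f()
try:
    resolve(namespace, g.__qualname__)
except AttributeError:
    # there is no attribute called '<locals>' to follow
    resolvable = False
else:
    resolvable = True
```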

2020-11-19 18:00:05 In Python 2.5, PEP-357 allowed any object to be passed as index or slice into __getitem__:

class L:
    def __getitem__(self, value):
        return value

class C:
    pass

L()[C]
# <class '__main__.C'>

Also, it introduced the magic method __index__. In Python 2, its result was passed instead of the object in slices, and list and tuple use it to convert the given object to int:

class C:
    def __index__(self):
        return 1

# Python 2 and 3
L()[C()]
# <__main__.C ...>

L()[C():]
# Python 2:
# slice(1, 9223372036854775807, None)
# Python 3:
# slice(<__main__.C object ...>, None, None)

# Python 2 and 3
[1,2,3][C()]
# 2
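
The same conversion can be requested explicitly with operator.index, and a few built-ins rely on it too. A short Python 3 sketch reusing the C class from above:

```python
import operator


class C:
    def __index__(self):
        return 1


as_int = operator.index(C())  # calls C.__index__
item = ['a', 'b', 'c'][C()]   # list indexing uses __index__
binary = bin(C())             # bin() accepts any object with __index__

print(as_int, item, binary)
```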

The main motivation to add __index__ was to support slices in numpy with custom number types:

import numpy

two = numpy.int64(2)

type(two)
# numpy.int64

type(two.__index__())
# int

Now it is mostly useless. However, it is a good example of language changes to meet the needs of a particular third-party library.
