Reposted

Translation of "Writing Idiomatic Python" (Part 5): Classes, Context Managers, Generators

Original book reference: http://www.jeffknupp.com/blog/2012/10/04/writing-idiomatic-python/


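2.7 Classes

2.7.1 Use the isinstance function to check an object's type

Many newcomers to Python come away with the mistaken impression that "Python has no types". Python objects do have types, and type errors do occur. For example, applying the + operator to an int object and a str object raises a TypeError. If you are writing code that needs to behave differently depending on the type of some variable, the isinstance function is what you need.

isinstance(object, class-or-object-or-tuple) is a Python built-in function. It returns True if the first argument, object, is an instance of the second argument or of one of its subclasses. If the second argument is a tuple, it returns True when the type of the first argument is one of the types in the tuple or a subclass of one of them. Note that although in most cases the second argument you see is a built-in type, the function works with any type, including user-defined classes.

// Translator's note: the original book does write class-or-object-or-tuple, which is easy to misread; class-or-type-or-tuple would probably be more accurate. Also, compared with checking type() directly, neither approach is generally better than the other, and the author is somewhat subjective here. The main difference is that isinstance also returns True for subclasses. In Python 2, for example, isinstance against basestring returns True for both ordinary strings (str) and unicode strings, whereas type can tell them apart. Which of isinstance or type to use depends on what the situation requires.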
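The difference described in the note can be seen in a minimal sketch. Since basestring no longer exists in Python 3, the example assumes Python 3 and uses two hypothetical classes, Animal and Dog, plus the fact that bool is a subclass of int. None of these names come from the book.

```python
class Animal(object):
    """Hypothetical base class, used only for illustration."""


class Dog(Animal):
    """Hypothetical subclass of Animal."""


def describe(obj):
    """Return (isinstance check, exact type check) of obj against Animal."""
    return isinstance(obj, Animal), type(obj) is Animal


dog_result = describe(Dog())        # instance of a subclass
animal_result = describe(Animal())  # instance of the exact type

# bool is a subclass of int, so isinstance treats True as an int
bool_is_int = isinstance(True, int)
bool_type_is_int = type(True) is int

# A tuple as the second argument matches any of the listed types
matches_tuple = isinstance(3.5, (int, float))
```

Here dog_result is (True, False): isinstance accepts the subclass, while the exact type comparison rejects it. Which behavior is wanted depends on the calling code.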

Bad style:

    def get_size(some_object):
        """Return the "size" of *some_object*, where size = len(some_object) for
        sequences, size = some_object for integers and floats, and size = 1 for
        True, False, or None."""
        try:
            return len(some_object)
        except TypeError:
            # Compare against None itself; type(None) would never equal None
            if some_object in (True, False, None):
                return 1
            else:
                return int(some_object)

    print(get_size('hello'))
    print(get_size([1, 2, 3, 4, 5]))
    print(get_size(10.0))

Idiomatic Python:

    def get_size(some_object):
        if isinstance(some_object, (list, dict, str, tuple)):
            return len(some_object)
        elif isinstance(some_object, (bool, type(None))):
            return 1
        elif isinstance(some_object, (int, float)):
            return int(some_object)

    print(get_size('hello'))
    print(get_size([1, 2, 3, 4, 5]))
    print(get_size(10.0))

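2.7.2 Prefix variable and function names with underscores to indicate privacy

In a Python class, everything is public, whether variable or function. Users are free to add new attributes after a class has been defined. Moreover, because of this freedom, a user who inherits from a class may unintentionally change the base class's attributes. Finally, even though every variable/attribute can be accessed, it is still very useful to indicate logically which ones are public and which are private or protected.

For this reason, Python has some widely used naming conventions that signal the class author's intent regarding privacy, such as the two described next. Although both are generally regarded as mere conventions, they do in fact change how the interpreter behaves.

First, a name that begins with a single underscore marks a protected attribute that users should not access directly. Second, a name that begins with two consecutive underscores marks a private attribute that even subclasses should not access. Of course, this does not actually prevent users from accessing these attributes the way some other languages do, but both conventions are widely followed throughout the Python community, and in a sense they reflect Python's philosophy of having one way to do one thing.

As mentioned above, naming with one or two leading underscores is more than convention, and some developers are aware that these prefixes have a real effect. Module-level names beginning with a single underscore are not imported by import * (unless the module lists them in __all__). Names beginning with a double underscore trigger Python's name mangling: if Foo is a class, a name such as __bar defined inside Foo is expanded to _Foo__bar, following the pattern _classname__attributename.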
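Name mangling can be observed directly by inspecting an instance. The following is a minimal sketch with hypothetical classes Foo and Bar and hypothetical attribute names; it is separate from the book's own example below.

```python
class Foo(object):
    def __init__(self):
        self._protected = 1   # single underscore: a convention only
        self.__private = 2    # double underscore: stored as _Foo__private


class Bar(Foo):
    def __init__(self):
        super(Bar, self).__init__()
        self.__private = 99   # stored as _Bar__private, so it does not clash


bar = Bar()
attribute_names = sorted(vars(bar))  # the names actually stored on the instance
foo_private = bar._Foo__private      # still reachable through the mangled name
bar_private = bar._Bar__private
has_unmangled_name = hasattr(bar, '__private')
```

The instance ends up with both _Foo__private and _Bar__private, and no attribute literally named __private. The mangled names remain accessible, which shows that mangling guards against accidental clashes rather than enforcing privacy.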

Bad style:

    class Foo(object):
        def __init__(self):
            self.id = 8
            self.value = self.get_value()

        def get_value(self):
            pass

        def should_destroy_earth(self):
            return self.id == 42

    class Baz(Foo):
        def get_value(self, some_new_parameter):
            """Since 'get_value' is called from the base class's
            __init__ method and the base class definition doesn't
            take a parameter, trying to create a Baz instance will
            fail
            """
            pass

    class Qux(Foo):
        """We aren't aware of Foo's internals, and we innocently
        create an instance attribute named 'id' and set it to 42.
        This overwrites Foo's id attribute and we inadvertently
        blow up the earth.
        """
        def __init__(self):
            super(Qux, self).__init__()
            self.id = 42
            # No relation to Foo's id, purely coincidental

    q = Qux()
    b = Baz() # Raises 'TypeError'
    q.should_destroy_earth() # returns True
    q.id == 42 # returns True

Idiomatic Python:

    class Foo(object):
        def __init__(self):
            """Since 'id' is of vital importance to us, we don't
            want a derived class accidentally overwriting it. We'll
            prepend with double underscores to introduce name
            mangling.
            """
            self.__id = 8
            self.value = self.__get_value() # Call our 'private copy'

        def get_value(self):
            pass

        def should_destroy_earth(self):
            return self.__id == 42

        # Here, we're storing a 'private copy' of get_value,
        # and assigning it to '__get_value'. Even if a derived
        # class overrides get_value in a way incompatible with
        # ours, we're fine
        __get_value = get_value

    class Baz(Foo):
        def get_value(self, some_new_parameter):
            pass

    class Qux(Foo):
        def __init__(self):
            """Now when we set 'id' to 42, it's not the same 'id'
            that 'should_destroy_earth' is concerned with. In fact,
            if you inspect a Qux object, you'll find it doesn't
            have an __id attribute. So we can't mistakenly change
            Foo's __id attribute even if we wanted to.
            """
            self.id = 42
            # No relation to Foo's id, purely coincidental
            super(Qux, self).__init__()

    q = Qux()
    b = Baz() # Works fine now
    q.should_destroy_earth() # returns False
    q.id == 42 # returns True

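2.7.3 Use properties for better compatibility

In many cases a class is more convenient to use if it exposes its data as directly accessible attributes. A Point class, for example, is nicer to use with plain x and y attributes than with 'getter' and 'setter' functions. Yet 'getters' and 'setters' do not exist without reason: you cannot be sure that an attribute will never need to be replaced by a computation (in a subclass, for instance). Suppose we have a Product class that is initialized with a product's name and price. We could simply set the name and price member variables directly. However, if a later requirement asks us to calculate the tax automatically and add it to the price, we would have to modify every use of the price variable. The way to avoid this is to make the price a property.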

Bad style:

    class Product(object):
        def __init__(self, name, price):
            self.name = name
            # We could try to apply the tax rate here, but the object's price
            # may be modified later, which erases the tax
            self.price = price

Idiomatic Python:

    # TAX_RATE is not defined in the original listing; this hypothetical
    # value is added here only so that the example runs
    TAX_RATE = 1.08

    class Product(object):
        def __init__(self, name, price):
            self.name = name
            self._price = price

        @property
        def price(self):
            # now if we need to change how price is calculated, we can do it
            # here (or in the "setter" and __init__)
            return self._price * TAX_RATE

        @price.setter
        def price(self, value):
            # The "setter" function must have the same name as the property
            self._price = value
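From the caller's side, a property looks exactly like a plain attribute, which is why existing client code keeps working after the change. The following usage sketch repeats the class so that it is self-contained; the TAX_RATE value of 1.25 and the product data are assumptions chosen only to keep the arithmetic easy to follow.

```python
TAX_RATE = 1.25  # hypothetical multiplier, not a value from the book


class Product(object):
    def __init__(self, name, price):
        self.name = name
        self._price = price

    @property
    def price(self):
        # Computed on every read, so the tax is always applied
        return self._price * TAX_RATE

    @price.setter
    def price(self, value):
        # Stores the pre-tax price
        self._price = value


product = Product('widget', 100)
price_before = product.price   # read through the property, no parentheses
product.price = 200            # assignment goes through the setter
price_after = product.price
```

Reading product.price and assigning to it use the same syntax as in the bad-style version, but the tax can no longer be erased by a later assignment.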

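2.7.4 Use __repr__ to generate a machine-readable representation of a class

In a class, __str__ produces a string that is readable for humans, while __repr__ produces a string that a machine can evaluate. Python's default __repr__ for a class is not very useful, and implementing a default __repr__ that works for every Python class is difficult. A __repr__ should contain all the information needed to rebuild the object and should distinguish two different instances as far as possible. A simple rule of thumb is that, where possible, eval(repr(instance)) == instance. __repr__ is especially important for logging, because the information printed in logs mostly comes from __repr__ rather than __str__.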

Bad style:

    class Foo(object):
        def __init__(self, bar=10, baz=12, cache=None):
            self.bar = bar
            self.baz = baz
            self._cache = cache or {}

        def __str__(self):
            return 'Bar is {}, Baz is {}'.format(self.bar, self.baz)

    def log_to_console(instance):
        print(instance)

    log_to_console([Foo(), Foo(cache={'x': 'y'})])

Idiomatic Python:

    class Foo(object):
        def __init__(self, bar=10, baz=12, cache=None):
            self.bar = bar
            self.baz = baz
            self._cache = cache or {}

        def __str__(self):
            return '{}, {}'.format(self.bar, self.baz)

        def __repr__(self):
            return 'Foo({}, {}, {})'.format(self.bar, self.baz, self._cache)

    def log_to_console(instance):
        print(instance)

    log_to_console([Foo(), Foo(cache={'x': 'y'})])
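The rule of thumb eval(repr(instance)) == instance only holds if the class also defines equality. The sketch below checks the round trip on a hypothetical Point class; the __eq__ method and the use of {!r} in the format string are additions made for this illustration and are not part of the book's listing.

```python
class Point(object):
    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __repr__(self):
        # {!r} inserts the repr of each field, so string fields keep their quotes
        return 'Point({!r}, {!r})'.format(self.x, self.y)

    def __str__(self):
        return '{}, {}'.format(self.x, self.y)

    def __eq__(self, other):
        # Without __eq__, two distinct instances never compare equal
        return isinstance(other, Point) and (self.x, self.y) == (other.x, other.y)


p = Point(1, 2)
round_tripped = eval(repr(p))     # rebuilds an equal object from the repr string
as_list = str([p, Point(3, 4)])   # containers use repr() for their elements
```

Printing a list of instances shows each element's __repr__, not its __str__, which is why the bad-style version produces unhelpful output when a list is logged.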

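2.7.5 Use __str__ to generate a human-readable representation of a class

When you define a class that is likely to be passed to print(), Python's default representation is not very helpful. Defining a __str__ method lets print() output the information you want.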

Bad style:

    class Point(object):
        def __init__(self, x, y):
            self.x = x
            self.y = y

    p = Point(1, 2)
    print(p)

    # Prints '<__main__.Point object at 0x91ebd0>'

Idiomatic Python:

    class Point(object):
        def __init__(self, x, y):
            self.x = x
            self.y = y

        def __str__(self):
            return '{0}, {1}'.format(self.x, self.y)

    p = Point(1, 2)
    print(p)

    # Prints '1, 2'

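2.8 Context Managers

2.8.1 Use context managers to ensure resources are properly managed

Similar to the RAII (Resource Acquisition Is Initialization) principle in C++, context managers (used together with the with statement) make resource management safer and clearer. A typical example is file I/O.

Look first at the bad-style code below: what happens if an exception is raised? Because the example does not catch the exception, it propagates upward, and the code exits without closing the file it opened.

Many classes in the standard library support or use context managers. In addition, user-defined classes can support context managers by defining the __enter__ and __exit__ methods. Functions can be wrapped using contextlib.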

Bad style:

    file_handle = open(path_to_file, 'r')
    for line in file_handle.readlines():
        if raise_exception(line):
            print('No! An Exception!')

Idiomatic Python:

    with open(path_to_file, 'r') as file_handle:
        for line in file_handle:
            if raise_exception(line):
                print('No! An Exception!')
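The two ways of writing your own context manager mentioned in this section, a class with __enter__ and __exit__ and a function wrapped with contextlib, can be sketched as follows. ManagedResource, managed_resource and the events list are hypothetical names used only to record the order of operations; contextlib.contextmanager is the standard library decorator.

```python
from contextlib import contextmanager

events = []  # records the order of operations, for illustration only


class ManagedResource(object):
    """Hypothetical resource that must always be released."""

    def __enter__(self):
        events.append('acquire')
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        # Runs whether or not the with block raised an exception
        events.append('release')
        return False  # do not suppress the exception


@contextmanager
def managed_resource():
    """The same idea written as a generator function."""
    events.append('acquire (function)')
    try:
        yield
    finally:
        events.append('release (function)')


try:
    with ManagedResource():
        raise ValueError('something went wrong')
except ValueError:
    events.append('handled')

with managed_resource():
    events.append('working')
```

In the first with block, 'release' is recorded before 'handled': the resource is released even though the body raised, and the exception still reaches the caller.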

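2.9 Generators

2.9.1 Prefer generator expressions to list comprehensions for simple iteration

When working with a sequence, it is very common to iterate over a slightly modified version of it. For example, you may need to print the names of all users in uppercase.

The first instinct is to build the modified sequence with an inline expression, and a list comprehension naturally comes to mind. Python, however, has a better built-in way to do this: the generator expression.

So what is the main difference between the two? A list comprehension creates a list object and produces every element of the list immediately. For large lists, this usually incurs an expensive or even unacceptable cost. A generator expression instead returns a generator, which produces each element only when it is requested. For the example above a list comprehension may still be acceptable, but if we were printing not uppercase names but the titles of every book in the Library of Congress, building the list might already exhaust memory, whereas the generator expression would not.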

Bad style:

    for uppercase_name in [name.upper() for name in get_all_usernames()]:
        process_normalized_username(uppercase_name)

Idiomatic Python:

    for uppercase_name in (name.upper() for name in get_all_usernames()):
        process_normalized_username(uppercase_name)
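The memory difference between the two forms can be made visible with sys.getsizeof. This is a rough sketch that assumes CPython and uses a hypothetical sequence of 100000 numbers in place of the user names; getsizeof reports only the container's own size, which is enough to show the contrast.

```python
import sys

numbers = range(100000)

squares_list = [n * n for n in numbers]   # builds all 100000 elements now
squares_gen = (n * n for n in numbers)    # builds nothing yet

list_size = sys.getsizeof(squares_list)   # grows with the number of elements
gen_size = sys.getsizeof(squares_gen)     # small, independent of the input size

# Elements are produced one at a time, only when requested
first_three = [next(squares_gen) for _ in range(3)]

# When a generator expression is the sole argument, the extra parentheses can be dropped
total = sum(n * n for n in numbers)
```

The generator object stays small no matter how long the input is, because each element is computed only when the consumer asks for it.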

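2.9.2 Use generators to lazily load infinite sequences

In many cases it is very useful to offer a way to iterate over an infinitely long sequence. Without one, you would have to provide an interface that is extremely expensive to compute, and users would have to wait a long time for the list to be built before they could iterate over it.

In these situations, a generator is the ideal choice. Consider the following example: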

Bad style:

    def get_twitter_stream_for_keyword(keyword):
        """Gets the 'live stream', but only at the moment
        the function is initially called. To get more entries,
        the client code needs to keep calling
        'get_twitter_stream_for_keyword'. Not ideal.
        """

        imaginary_twitter_api = ImaginaryTwitterAPI()
        if imaginary_twitter_api.can_get_stream_data(keyword):
            return imaginary_twitter_api.get_stream(keyword)

    current_stream = get_twitter_stream_for_keyword('#jeffknupp')
    for tweet in current_stream:
        process_tweet(tweet)

    # Uh, I want to keep showing tweets until the program is quit.
    # What do I do now? Just keep calling
    # get_twitter_stream_for_keyword? That seems stupid.

    def get_list_of_incredibly_complex_calculation_results(data):
        return [first_incredibly_long_calculation(data),
                second_incredibly_long_calculation(data),
                third_incredibly_long_calculation(data),
                ]

Idiomatic Python:

    def get_twitter_stream_for_keyword(keyword):
        """Now, 'get_twitter_stream_for_keyword' is a generator
        and will continue to generate Iterable pieces of data
        one at a time until 'can_get_stream_data(keyword)' is
        False (which may be never).
        """

        imaginary_twitter_api = ImaginaryTwitterAPI()
        while imaginary_twitter_api.can_get_stream_data(keyword):
            yield imaginary_twitter_api.get_stream(keyword)

    # Because it's a generator, I can sit in this loop until
    # the client wants to break out
    for tweet in get_twitter_stream_for_keyword('#jeffknupp'):
        if got_stop_signal:
            break
        process_tweet(tweet)

    def get_list_of_incredibly_complex_calculation_results(data):
        """A simple example to be sure, but now when the client
        code iterates over the call to
        'get_list_of_incredibly_complex_calculation_results',
        we only do as much work as necessary to generate the
        current item.
        """

        yield first_incredibly_long_calculation(data)
        yield second_incredibly_long_calculation(data)
        yield third_incredibly_long_calculation(data)
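Because the Twitter API in the listing above is imaginary, that code cannot be run as written. The same pattern can be tried with a runnable stand-in: the hypothetical fibonacci generator below never terminates on its own, and the caller decides how much of it to consume, either with itertools.islice or by breaking out of the loop.

```python
from itertools import islice


def fibonacci():
    """Yield Fibonacci numbers forever; the caller decides when to stop."""
    a, b = 0, 1
    while True:
        yield a
        a, b = b, a + b


# Take a fixed number of items from the infinite sequence
first_ten = list(islice(fibonacci(), 10))

# Or stop on a condition of the consumer's choosing
below_fifty = []
for value in fibonacci():
    if value >= 50:
        break
    below_fifty.append(value)
```

No list of "all" Fibonacci numbers is ever built; each value is computed only when the loop asks for the next one.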