Change history

Edit by samson_b, 01.04.20 12:17 (current version):

Actually, if a function performs more than two independent operations, it is good practice to split it into several functions, no matter how many lines those operations take.

Here is the function I was talking about. Do you think the right thing would be to split it into several functions and spread them across different files?

    import hashlib
    import re

    def words_hash(text):
        # Remove all HTML tags
        text = re.sub(r'<.*?>', ' ', text)
        # Replace every non-letter character with a space
        text = re.sub(r'[^a-z]', ' ', text.lower())
        # Keep only words of 3+ letters (split() also drops the extra spaces)
        words = [wrd for wrd in text.split() if len(wrd) > 2]
        hashes = []
        # Hash every run of three consecutive words
        for n in range(len(words) - 2):
            st = words[n] + words[n + 1] + words[n + 2]
            digest = hashlib.sha256(st.encode('utf-8')).hexdigest()
            hashes.append(int(digest, 16) % 10**19)
        # Return the hashes sorted, with duplicates removed
        return sorted(set(hashes))
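To illustrate the advice quoted above, here is one possible way to split `words_hash` into single-purpose helpers while keeping them in the same module. It is only a sketch, not the poster's code: the helper names `strip_html`, `extract_words`, `hash_shingle`, `shingle_hashes` and the `min_len`/`size` parameters are hypothetical. The intent is to keep the original behaviour: the same regexes, the same concatenation of three words, the same `% 10**19`, and a sorted result without duplicates.

```python
import hashlib
import re

# Hypothetical helper names: each one does a single step of the original words_hash.

def strip_html(text):
    """Replace every HTML tag with a space."""
    return re.sub(r'<.*?>', ' ', text)

def extract_words(text, min_len=3):
    """Lowercase, turn non-letters into spaces, keep words of min_len+ letters."""
    text = re.sub(r'[^a-z]', ' ', text.lower())
    return [w for w in text.split() if len(w) >= min_len]

def hash_shingle(words):
    """Hash one group of words exactly as the original loop does (plain concatenation)."""
    digest = hashlib.sha256(''.join(words).encode('utf-8')).hexdigest()
    return int(digest, 16) % 10**19

def shingle_hashes(words, size=3):
    """Hash every run of `size` consecutive words; fewer than `size` words gives []."""
    return [hash_shingle(words[i:i + size]) for i in range(len(words) - size + 1)]

def words_hash(text):
    """Same result as the original: sorted, de-duplicated hashes of 3-word runs."""
    return sorted(set(shingle_hashes(extract_words(strip_html(text)))))
```

In this sketch the helpers stay in one file next to `words_hash`. Whether to move them into separate files is a separate decision, and it is usually driven by whether other code reuses them.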

Original version by samson_b, 01.04.20 12:13.