LINUX.ORG.RU

Edit history

Revision by ei-grad (current version):

from urllib.parse import urlsplit

import pandas as pd


allurls = ["http://ya.ru/xxx?q=x#zz", "https://google.com/", "https://google.com/zzz"]
df = pd.DataFrame({"url": allurls})
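# urlsplit yields (scheme, netloc, path, query, fragment); zip(*...) transposes the rows into five columns
df['protocol'], df['domain'], df['path'], df['query'], df['fragment'] = zip(*df['url'].map(urlsplit))
# the ten most frequent domains, most common first
df.domain.value_counts().sort_values(ascending=False)[:10]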
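
For anyone who wants to check the result without pandas, roughly the same split-and-count can be sketched with the standard library alone. This is only an illustration, not part of the original post: `top_domains` is a hypothetical helper name, and `Counter.most_common` stands in for `value_counts()`.

```python
from collections import Counter
from urllib.parse import urlsplit

allurls = ["http://ya.ru/xxx?q=x#zz", "https://google.com/", "https://google.com/zzz"]


def top_domains(urls, n=10):
    """Return the n most common network locations (host[:port]) among urls."""
    # urlsplit returns a 5-field SplitResult: scheme, netloc, path, query, fragment
    return Counter(urlsplit(u).netloc for u in urls).most_common(n)


# The five fields of the first URL, in the same order as the DataFrame columns
print(tuple(urlsplit(allurls[0])))  # ('http', 'ya.ru', '/xxx', 'q=x', 'zz')
print(top_domains(allurls))         # [('google.com', 2), ('ya.ru', 1)]
```

Note that `netloc` includes the port and any credentials if the URL carries them, so it may need further trimming before counting when such URLs appear in the list.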

Revision by ei-grad:

import pandas as pd
from urllib.parse import urlsplit

allurls = ["http://ya.ru/xxx?q=x#zz", "https://google.com/", "https://google.com/zzz"]
df = pd.DataFrame({"url": allurls})
df['protocol'], df['domain'], df['path'], df['query'], df['fragment'] = zip(*df['url'].map(urlsplit))
df.domain.value_counts().sort_values(ascending=False)[:10]

Original version by ei-grad:

import pandas as pd
from urllib.parse import urlparse  # the code below calls urlsplit, which this line does not import (NameError)

allurls = ["http://ya.ru/xxx?q=x#zz", "https://google.com/", "https://google.com/zzz"]
df = pd.DataFrame({"url": allurls})
df['protocol'], df['domain'], df['path'], df['query'], df['fragment'] = zip(*df['url'].map(urlsplit))
df.domain.value_counts().sort_values(ascending=False)[:10]