Change history
Edit by ei-grad (current version):
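from urllib.parse import urlsplit
import pandas as pd
allurls = ["http://ya.ru/xxx?q=x#zz", "https://google.com/", "https://google.com/zzz"]
df = pd.DataFrame({"url": allurls})
# urlsplit() returns a 5-tuple (scheme, netloc, path, query, fragment);
# zip(*...) transposes the per-URL tuples into one sequence per component
df['protocol'], df['domain'], df['path'], df['query'], df['fragment'] = zip(*df['url'].map(urlsplit))
# top 10 domains by number of URLs
df.domain.value_counts().sort_values(ascending=False)[:10]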
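If only the domain ranking is needed, a lighter variant is to extract just the host instead of all five components. This is a sketch under the assumption that ports and `user:password@` prefixes should not split one domain into several entries: it uses the `.hostname` attribute of the `urlsplit()` result, which (unlike `netloc`) drops them and lowercases the host. The name `top_domains` is illustrative.

```python
from urllib.parse import urlsplit

import pandas as pd

allurls = ["http://ya.ru/xxx?q=x#zz", "https://google.com/", "https://google.com/zzz"]
df = pd.DataFrame({"url": allurls})

# Extract only the host; .hostname strips any port and userinfo and lowercases it
df["domain"] = df["url"].map(lambda u: urlsplit(u).hostname)

# value_counts() already sorts by frequency in descending order
top_domains = df["domain"].value_counts().head(10)
print(top_domains)
```

Because `value_counts()` sorts by count by default, the extra `sort_values(ascending=False)` call is not needed here.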
Edit by ei-grad:
import pandas as pd
from urllib.parse import urlsplit
allurls = ["http://ya.ru/xxx?q=x#zz", "https://google.com/", "https://google.com/zzz"]
df = pd.DataFrame({"url": allurls})
df['protocol'], df['domain'], df['path'], df['query'], df['fragment'] = zip(*df['url'].map(urlsplit))
df.domain.value_counts().sort_values(ascending=False)[:10]
Original version by ei-grad:
import pandas as pd
from urllib.parse import urlsplit
allurls = ["http://ya.ru/xxx?q=x#zz", "https://google.com/", "https://google.com/zzz"]
df = pd.DataFrame({"url": allurls})
df['protocol'], df['domain'], df['path'], df['query'], df['fragment'] = zip(*df['url'].map(urlsplit))
df.domain.value_counts().sort_values(ascending=False)[:10]