Edit history
Revision by Nervous, (current version):
bolt mutability onto eager iteration
It really is slightly faster.
;; using manual looping and mutable hashmaps
(defn looping-stats-mutable!
  "Returns the map of station stats keyed by station names.
  Reads a semicolon-separated file with measurement records eagerly
  line by line, parses each line into a measurement (station name and
  measured value) and updates the corresponding station's stats using
  the value. Uses mutable hashmaps to store total and station stats."
  [filename]
  ;; with-open closes the reader once the loop finishes or throws
  (with-open [rdr (io/reader filename)]
    (loop [stats (java.util.HashMap.)]
      (if-let [line (.readLine ^java.io.BufferedReader rdr)]
        (let [[key string-value] (s/split line scsv-split-regex)
              value (parse-double string-value)
              ;; fetch the station's mutable stats, or seed them from the first value
              {:keys [count minimum maximum average] :as key-stats}
              (or (.get ^java.util.HashMap stats key)
                  (doto (java.util.HashMap.)
                    (.put :count 0)
                    (.put :maximum value)
                    (.put :minimum value)
                    (.put :average value)))
              updated-key-stats
              (doto ^java.util.HashMap key-stats
                (.put :count (inc count))
                (.put :minimum (min minimum value))
                (.put :maximum (max maximum value))
                (.put :average (moving-average average count value)))]
          (recur (doto stats (.put key updated-key-stats))))
        stats))))
(comment
  ;; ~50% better than baseline (and ~30% better than the transducing variant)
  ;; ~0.7 sec
  (time (format-stats (looping-stats-mutable! "dev/resources/measurements1M.txt")))
  ;; ~750 sec (~12 min)
  (time (format-stats (looping-stats-mutable! "dev/resources/measurements.txt")))
  )
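The function above calls helpers that this post does not show, namely `scsv-split-regex` and `moving-average`. The sketch below is only a guess at how they might be defined, assuming each record looks like `Station;12.3` and that the average is maintained as an incremental mean. Both definitions are assumptions made for illustration and are not taken from the original code.

```clojure
(require '[clojure.string :as s])

;; Assumption: records are "station;value", so a bare semicolon is the separator.
(def scsv-split-regex #";")

(defn moving-average
  "Assumed incremental mean: folds `value` into `average`, where `average`
  already summarises `count` earlier values."
  [average count value]
  (+ average (/ (- value average) (inc count))))

;; Splitting one record into station name and raw value.
(s/split "Hamburg;12.3" scsv-split-regex)

;; Folding 10.0, 20.0 and 30.0 one at a time, the way the loop does:
;; the first value seeds the average while the count is still 0.
(-> 10.0
    (moving-average 0 10.0)
    (moving-average 1 20.0)
    (moving-average 2 30.0))
```

With this definition, seeding a new station with `:count 0` and `:average value` works out: the first update leaves the average equal to the first value, and each later update moves it by the new value's share.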
From here, apparently, the only options left are inventing custom hashmaps and reading the file in chunks across several threads.
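One simplified way to approach the multi-threaded idea is sketched below. It is not the author's plan: the file is still read sequentially, and only the parsing and aggregation of line chunks run in parallel via `pmap`. The names `update-stats`, `chunk-stats`, `merge-station` and `parallel-stats`, as well as the `chunk-size` parameter, are hypothetical. The sketch keeps a running total instead of an average so that partial results can be merged by simple addition.

```clojure
(require '[clojure.java.io :as io]
         '[clojure.string :as s])

(defn update-stats
  "Folds one value into a station's stats map (nil for a new station)."
  [{:keys [n minimum maximum total]} value]
  (if n
    {:n (inc n)
     :minimum (min minimum value)
     :maximum (max maximum value)
     :total (+ total value)}
    {:n 1 :minimum value :maximum value :total value}))

(defn chunk-stats
  "Computes station stats for one chunk of lines, independently of other chunks."
  [lines]
  (reduce (fn [stats line]
            ;; Assumption: records look like "station;value".
            (let [[station string-value] (s/split line #";")]
              (update stats station update-stats (Double/parseDouble string-value))))
          {}
          lines))

(defn merge-station
  "Combines two partial results for the same station."
  [a b]
  {:n (+ (:n a) (:n b))
   :minimum (min (:minimum a) (:minimum b))
   :maximum (max (:maximum a) (:maximum b))
   :total (+ (:total a) (:total b))})

(defn parallel-stats
  "Reads lines sequentially, aggregates chunks of `chunk-size` lines in
  parallel, then merges the per-chunk maps. The average of a station is
  (/ total n)."
  [filename chunk-size]
  (with-open [rdr (io/reader filename)]
    (->> (line-seq rdr)
         (partition-all chunk-size)
         (pmap chunk-stats)
         ;; reduce forces the lazy pipeline before the reader is closed
         (reduce (partial merge-with merge-station) {}))))
```

A call such as `(parallel-stats "dev/resources/measurements1M.txt" 100000)` would return an immutable map keyed by station name. Whether this beats the mutable loop is untested here; splitting the file into byte ranges so that the reading itself is parallel would be a separate, more involved step.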
Revision by Nervous, :
bolt mutability onto eager iteration
It really is slightly faster.
;; using manual looping and mutable hashmaps
(defn looping-stats-mutable!
  "Returns the map of station stats keyed by station names.
  Reads a semicolon-separated file with measurement records eagerly
  line by line, parses each line into a measurement (station name and
  measured value) and updates the corresponding station's stats using
  the value. Uses mutable hashmaps to store total and station stats."
  [filename]
  (let [rdr (io/reader filename)]
    (loop [stats (java.util.HashMap.)]
      (if-let [line (.readLine ^java.io.BufferedReader rdr)]
        (let [[key string-value] (s/split line scsv-split-regex)
              value (parse-double string-value)
              {:keys [count minimum maximum average] :as key-stats}
              (or (.get ^java.util.HashMap stats key)
                  (doto (java.util.HashMap.)
                    (.put :count 0)
                    (.put :maximum value)
                    (.put :minimum value)
                    (.put :average value)))
              updated-key-stats
              (doto ^java.util.HashMap key-stats
                (.put :count (inc count))
                (.put :minimum (min minimum value))
                (.put :maximum (max maximum value))
                (.put :average (moving-average average count value)))]
          (recur (doto stats (.put key updated-key-stats))))
        stats))))
(comment
  ;; ~50% better than baseline (and ~30% better than the transducing variant)
  ;; ~0.7 sec
  (time (format-stats-mutable! (looping-stats "dev/resources/measurements1M.txt")))
  ;; ~750 sec (~12 min)
  (time (format-stats-mutable! (looping-stats "dev/resources/measurements.txt")))
  )
From here, apparently, the only options left are inventing custom hashmaps and reading the file in chunks across several threads.
Original version by Nervous, :
bolt mutability onto eager iteration
It really is slightly faster.
;; using manual looping and mutable hashmaps
(defn looping-stats
  "Returns the map of station stats keyed by station names.
  Reads a semicolon-separated file with measurement records eagerly
  line by line, parses each line into a measurement (station name and
  measured value) and updates the corresponding station's stats using
  the value."
  [filename]
  (let [rdr (io/reader filename)]
    (loop [stats (java.util.HashMap.)]
      (if-let [line (.readLine ^java.io.BufferedReader rdr)]
        (let [[key string-value] (s/split line scsv-split-regex)
              value (parse-double string-value)
              {:keys [count minimum maximum average] :as key-stats}
              (or (.get ^java.util.HashMap stats key)
                  (doto (java.util.HashMap.)
                    (.put :count 0)
                    (.put :maximum value)
                    (.put :minimum value)
                    (.put :average value)))
              updated-key-stats
              (doto ^java.util.HashMap key-stats
                (.put :count (inc count))
                (.put :minimum (min minimum value))
                (.put :maximum (max maximum value))
                (.put :average (moving-average average count value)))]
          (recur (doto stats (.put key updated-key-stats))))
        stats))))
(comment
  ;; ~50% better than baseline (and ~30% better than the transducing variant)
  ;; ~0.7 sec
  (time (format-stats (looping-stats "dev/resources/measurements1M.txt")))
  ;; ~750 sec (~12 min)
  (time (format-stats (looping-stats "dev/resources/measurements.txt")))
  )
From here, apparently, the only options left are inventing custom hashmaps and reading the file in chunks across several threads.