LINUX.ORG.RU

История изменений

Исправление Nervous, (текущая версия) :

натянуть мутабельность на жадную итерацию

Действительно слегка быстрее.

;; using manual looping and mutable hashmaps

(defn looping-stats-mutable!
  "Returns the map of station stats keyed by station names.
  Reads a semicolon-separated file with measurement records eagerly
  line by line, parses each line into a measurement (station name and
  measured value) and updates the corresponding station's stats using
  the value. Uses mutable hashmaps to store total and station stats."
  [filename]
  (let [rdr (io/reader filename)]
    (loop [stats (java.util.HashMap.)]
      (if-let [line (.readLine ^java.io.BufferedReader rdr)]
        (let [[key string-value] (s/split line scsv-split-regex)
              value              (parse-double string-value)
              {:keys [count minimum maximum average] :as key-stats}
              (or (.get ^java.util.HashMap stats key)
                  (doto (java.util.HashMap.)
                    (.put :count 0)
                    (.put :maximum value)
                    (.put :minimum value)
                    (.put :average value)))
              updated-key-stats
              (doto ^java.util.HashMap key-stats
                (.put :count (inc count))
                (.put :minimum (min minimum value))
                (.put :maximum (max maximum value))
                (.put :average (moving-average average count value)))]
          (recur (doto stats (.put key updated-key-stats))))
        stats))))

(comment

  ;; ~50% better than baseline (and ~30% better than the transducing variant)

  ;; ~0.7 sec
  (time (format-stats (looping-stats-mutable! "dev/resources/measurements1M.txt")))

  ;; ~750 sec (~12 min)
  (time (format-stats (looping-stats-mutable! "dev/resources/measurements.txt")))

)

Дальше, видимо, уже только изобретать кастомные хэшмапы и читать файл по кускам в несколько потоков.

Исправление Nervous, :

натянуть мутабельность на жадную итерацию

Действительно слегка быстрее.

;; using manual looping and mutable hashmaps

(defn looping-stats-mutable!
  "Returns the map of station stats keyed by station names.
  Reads a semicolon-separated file with measurement records eagerly
  line by line, parses each line into a measurement (station name and
  measured value) and updates the corresponding station's stats using
  the value. Uses mutable hashmaps to store total and station stats."
  [filename]
  (let [rdr (io/reader filename)]
    (loop [stats (java.util.HashMap.)]
      (if-let [line (.readLine ^java.io.BufferedReader rdr)]
        (let [[key string-value] (s/split line scsv-split-regex)
              value              (parse-double string-value)
              {:keys [count minimum maximum average] :as key-stats}
              (or (.get ^java.util.HashMap stats key)
                  (doto (java.util.HashMap.)
                    (.put :count 0)
                    (.put :maximum value)
                    (.put :minimum value)
                    (.put :average value)))
              updated-key-stats
              (doto ^java.util.HashMap key-stats
                (.put :count (inc count))
                (.put :minimum (min minimum value))
                (.put :maximum (max maximum value))
                (.put :average (moving-average average count value)))]
          (recur (doto stats (.put key updated-key-stats))))
        stats))))

(comment

  ;; ~50% better than baseline (and ~30% better than the transducing variant)

  ;; ~0.7 sec
  (time (format-stats-mutable! (looping-stats "dev/resources/measurements1M.txt")))

  ;; ~750 sec (~12 min)
  (time (format-stats-mutable! (looping-stats "dev/resources/measurements.txt")))

)

Дальше, видимо, уже только изобретать кастомные хэшмапы и читать файл по кускам в несколько потоков.

Исходная версия Nervous, :

натянуть мутабельность на жадную итерацию

Действительно слегка быстрее.

;; using manual looping and mutable hashmaps

(defn looping-stats
  "Returns the map of station stats keyed by station names.
  Reads a semicolon-separated file with measurement records eagerly
  line by line, parses each line into a measurement (station name and
  measured value) and updates the corresponding station's stats using
  the value."
  [filename]
  (let [rdr (io/reader filename)]
    (loop [stats (java.util.HashMap.)]
      (if-let [line (.readLine ^java.io.BufferedReader rdr)]
        (let [[key string-value] (s/split line scsv-split-regex)
              value              (parse-double string-value)
              {:keys [count minimum maximum average] :as key-stats}
              (or (.get ^java.util.HashMap stats key)
                  (doto (java.util.HashMap.)
                    (.put :count 0)
                    (.put :maximum value)
                    (.put :minimum value)
                    (.put :average value)))
              updated-key-stats
              (doto ^java.util.HashMap key-stats
                (.put :count (inc count))
                (.put :minimum (min minimum value))
                (.put :maximum (max maximum value))
                (.put :average (moving-average average count value)))]
          (recur (doto stats (.put key updated-key-stats))))
        stats))))

(comment

  ;; ~50% better than baseline (and ~30% better than the transducing variant)

  ;; ~0.7 sec
  (time (format-stats (looping-stats "dev/resources/measurements1M.txt")))

  ;; ~750 sec (~12 min)
  (time (format-stats (looping-stats "dev/resources/measurements.txt")))

)

Дальше, видимо, уже только изобретать кастомные хэшмапы и читать файл по кускам в несколько потоков.