
You could try setting -DGGML_BLAS_VENDOR= or -DGGML_BLAS_VENDOR_DEFAULT.

And which values are available for GGML_BLAS_VENDOR besides OpenBLAS?

ggml-blas.cpp mentions BLIS, MKL and NVPL as alternatives to cblas. I don't have NVPL, and if MKL means the Intel Math Kernel Library, Gentoo doesn't offer it as a BLAS option. So I tried OpenBLAS, LAPACK and BLIS.
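A minimal Python sketch of the configure step, under the assumption (which matches recent ggml CMake files) that GGML_BLAS_VENDOR is passed straight to CMake's FindBLAS as BLA_VENDOR. If so, the accepted values are FindBLAS vendor names: FLAME for BLIS, Intel10_64lp for 64-bit MKL, and Generic (the default outside Apple) for a plain reference libblas. The helper cmake_configure_args, the VENDORS table and the "build" directory are hypothetical; the script only prints the commands, it does not run cmake.

```python
import shlex

# Assumed mapping from the libraries named in ggml-blas.cpp to CMake FindBLAS
# vendor names (BLA_VENDOR), which GGML_BLAS_VENDOR is assumed to be forwarded to.
VENDORS = {
    "OpenBLAS": "OpenBLAS",
    "BLIS": "FLAME",         # FindBLAS calls the BLIS framework "FLAME"
    "MKL": "Intel10_64lp",   # 64-bit MKL, 32-bit integers (LP64), threaded
    "NVPL": "NVPL",          # may need a recent CMake release
    "reference": "Generic",  # plain libblas; ggml's default on non-Apple systems
}


def cmake_configure_args(library: str, build_dir: str = "build") -> list[str]:
    """Hypothetical helper: cmake arguments that enable the ggml BLAS backend."""
    if library not in VENDORS:
        raise ValueError(f"unknown BLAS library: {library!r}")
    return [
        "cmake",
        "-B", build_dir,
        "-DGGML_BLAS=ON",
        f"-DGGML_BLAS_VENDOR={VENDORS[library]}",
    ]


if __name__ == "__main__":
    # Print one configure command per candidate library.
    for lib in VENDORS:
        print(shlex.join(cmake_configure_args(lib)))
```

After configuring, the build itself is the usual cmake --build build -j --config Release.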

With LAPACK, which is considered the reference implementation, the build failed, and it failed with BLIS too. With OpenBLAS it built, and a 109-second English speech recording was transcribed in 305 seconds; on the CPU without BLAS the same recording took 394 seconds.

OpenBLAS:

whisper_print_timings:     load time =   689.73 ms
whisper_print_timings:     fallbacks =   0 p /   0 h
whisper_print_timings:      mel time =   301.09 ms
whisper_print_timings:   sample time =  2219.86 ms /  2370 runs (    0.94 ms per run)
whisper_print_timings:   encode time = 269812.03 ms /     6 runs (44968.67 ms per run)
whisper_print_timings:   decode time =   912.75 ms /    50 runs (   18.26 ms per run)
whisper_print_timings:   batchd time = 23721.02 ms /  2299 runs (   10.32 ms per run)
whisper_print_timings:   prompt time =  6974.69 ms /   801 runs (    8.71 ms per run)
whisper_print_timings:    total time = 304835.28 ms

Without BLAS:

whisper_print_timings:     load time =  4246.12 ms
whisper_print_timings:     fallbacks =   0 p /   0 h
whisper_print_timings:      mel time =   254.14 ms
whisper_print_timings:   sample time =  2184.74 ms /  2359 runs (    0.93 ms per run)
whisper_print_timings:   encode time = 356079.53 ms /     6 runs (59346.59 ms per run)
whisper_print_timings:   decode time =   390.63 ms /    22 runs (   17.76 ms per run)
whisper_print_timings:   batchd time = 24277.35 ms /  2316 runs (   10.48 ms per run)
whisper_print_timings:   prompt time =  6676.17 ms /   770 runs (    8.67 ms per run)
whisper_print_timings:    total time = 394345.44 ms

So the gain is roughly a quarter: total time fell from 394 s to 305 s, almost entirely in the encoder (356 s → 270 s).
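To make the "roughly a quarter" estimate concrete, here is a small Python sketch that parses the two whisper_print_timings outputs above and compares them. The regex, the function names and the derived metrics (speedup, real-time factor = seconds of compute per second of audio) are illustrative additions, not part of whisper.cpp.

```python
import re

# whisper_print_timings output from the two runs above (same 109 s recording).
OPENBLAS_LOG = """\
whisper_print_timings:     load time =   689.73 ms
whisper_print_timings:     fallbacks =   0 p /   0 h
whisper_print_timings:      mel time =   301.09 ms
whisper_print_timings:   sample time =  2219.86 ms /  2370 runs (    0.94 ms per run)
whisper_print_timings:   encode time = 269812.03 ms /     6 runs (44968.67 ms per run)
whisper_print_timings:   decode time =   912.75 ms /    50 runs (   18.26 ms per run)
whisper_print_timings:   batchd time = 23721.02 ms /  2299 runs (   10.32 ms per run)
whisper_print_timings:   prompt time =  6974.69 ms /   801 runs (    8.71 ms per run)
whisper_print_timings:    total time = 304835.28 ms
"""

NO_BLAS_LOG = """\
whisper_print_timings:     load time =  4246.12 ms
whisper_print_timings:     fallbacks =   0 p /   0 h
whisper_print_timings:      mel time =   254.14 ms
whisper_print_timings:   sample time =  2184.74 ms /  2359 runs (    0.93 ms per run)
whisper_print_timings:   encode time = 356079.53 ms /     6 runs (59346.59 ms per run)
whisper_print_timings:   decode time =   390.63 ms /    22 runs (   17.76 ms per run)
whisper_print_timings:   batchd time = 24277.35 ms /  2316 runs (   10.48 ms per run)
whisper_print_timings:   prompt time =  6676.17 ms /   770 runs (    8.67 ms per run)
whisper_print_timings:    total time = 394345.44 ms
"""

AUDIO_SECONDS = 109.0  # length of the recording, as stated in the post

# Captures the stage name and milliseconds from "<stage> time = <N> ms".
TIMING_RE = re.compile(r"(\w+) time =\s*([\d.]+) ms")


def parse_timings(log: str) -> dict[str, float]:
    """Return {stage: milliseconds}; the 'fallbacks' line has no time and is skipped."""
    return {m.group(1): float(m.group(2)) for m in TIMING_RE.finditer(log)}


def summarize(with_blas: dict[str, float], without: dict[str, float]) -> dict[str, float]:
    """Overall effect of BLAS, computed from the 'total' stage."""
    new, base = with_blas["total"] / 1000, without["total"] / 1000  # seconds
    return {
        "reduction": 1 - new / base,          # fraction of wall time saved
        "speedup": base / new,                # how many times faster
        "rtf_without": base / AUDIO_SECONDS,  # compute seconds per audio second
        "rtf_with": new / AUDIO_SECONDS,
    }


if __name__ == "__main__":
    ob, nb = parse_timings(OPENBLAS_LOG), parse_timings(NO_BLAS_LOG)
    # Per-stage comparison, in seconds.
    for stage in nb:
        print(f"{stage:>6}: {nb[stage] / 1000:7.1f} s -> {ob[stage] / 1000:7.1f} s")
    for key, value in summarize(ob, nb).items():
        print(f"{key}: {value:.3f}")
```

With these logs it reports a reduction of 0.227 and a speedup of 1.294, and the real-time factor drops from 3.618 to 2.797.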