FastUtil指南

1. 简介

在本教程中，我们将研究FastUtil库。

首先，我们将编写一些特定类型集合的示例。

然后，我们将分析FastUtil名称的由来。

最后，让我们看一下FastUtil的BigArray实用程序。

2. 特点

FastUtil库旨在扩展Java集合框架，它提供类型特定的Map、集合、列表和队列，占用的内存更少，访问和插入速度更快。FastUtil还提供了一组用于处理和操作大型(64位)数组、集合和列表的实用程序。

该库还包括大量用于二进制和文本文件的实用输入/输出类。

其最新版本FastUtil 8还发布了大量特定类型的函数，扩展了JDK的函数接口。

2.1 速度

在许多情况下，FastUtil实现是最快的。作者甚至提供了他们自己的深入基准测试报告，将其与包括HPPC和Trove在内的类似库进行了比较。

在本教程中，我们将使用Java Microbench Harness JMH)定义我们自己的基准。

3. 全大小依赖

除了常见的JUnit依赖之外，我们还将在本教程中使用FastUtils和JMH依赖。

我们的pom.xml文件中需要以下依赖：

<dependency>
    <groupId>it.unimi.dsi</groupId>
    <artifactId>fastutil</artifactId>
    <version>8.2.2</version>
</dependency>
<dependency>
    <groupId>org.openjdk.jmh</groupId>
    <artifactId>jmh-core</artifactId>
    <version>1.37</version>
    <scope>test</scope>
</dependency>
<dependency>
    <groupId>org.openjdk.jmh</groupId>
    <artifactId>jmh-generator-annprocess</artifactId>
    <version>1.37</version>
    <scope>test</scope>
</dependency>

或者对于Gradle用户：

testCompile group: 'org.openjdk.jmh', name: 'jmh-core', version: '1.37'
testCompile group: 'org.openjdk.jmh', name: 'jmh-generator-annprocess', version: '1.37'
compile group: 'it.unimi.dsi', name: 'fastutil', version: '8.2.2'

3.1 自定义Jar文件

由于缺少泛型，FastUtils会生成大量特定类型的类。不幸的是，这会导致jar文件非常大。

但是，幸运的是，FastUtils包含一个find-deps.sh脚本，该脚本允许生成更小、更集中的jar，其中仅包含我们想要在应用程序中使用的类。

4. 特定类型的集合

开始之前，我们先快速了解一下实例化特定类型集合的简单过程，我们选择一个使用double存储键和值的HashMap。

为此，FastUtils提供了一个Double2DoubleMap接口和一个Double2DoubleOpenHashMap实现：

Double2DoubleMap d2dMap = new Double2DoubleOpenHashMap();

现在我们已经实例化了我们的类，我们可以像使用Java集合API中的任何Map一样简单地填充数据：

d2dMap.put(2.0, 5.5);
d2dMap.put(3.0, 6.6);

最后，我们可以检查数据是否已正确添加：

assertEquals(5.5, d2dMap.get(2.0));

4.1 性能

FastUtils专注于其高性能实现，在本节中，我们将使用JMH来验证这一事实。让我们将Java Collections HashSet实现与FastUtil的[IntOpenHashSet](http://fastutil.di.unimi.it/docs/it/unimi/dsi/fastutil/ints/IntOpenHashSet.html)进行比较。

首先我们来看看如何实现IntOpenHashSet：

@Param({"100", "1000", "10000", "100000"})
public int setSize;

@Benchmark
public IntSet givenFastUtilsIntSetWithInitialSizeSet_whenPopulated_checkTimeTaken() {
    IntSet intSet = new IntOpenHashSet(setSize);
    for(int i = 0; i < setSize; i++) {
        intSet.add(i);
    }
    return intSet;
}

上面我们只是声明了IntSet接口的IntOpenHashSet实现，我们还用@Param注解声明了初始大小setSize。

简而言之，这些数字被输入到JMH中以产生一系列具有不同集合大小的基准测试。

接下来，让我们使用Java Collections实现做同样的事情：

@Benchmark
public Set<Integer> givenCollectionsHashSetWithInitialSizeSet_whenPopulated_checkTimeTaken() {
    Set<Integer> intSet = new HashSet<>(setSize);
    for(int i = 0; i < setSize; i++) {
        intSet.add(i);
    }
    return intSet;
}

最后，让我们运行基准测试并比较这两种实现：

Benchmark                                     (setSize)  Mode  Cnt     Score   Units
givenCollectionsHashSetWithInitialSizeSet...        100  avgt    2     1.460   us/op
givenCollectionsHashSetWithInitialSizeSet...       1000  avgt    2    12.740   us/op
givenCollectionsHashSetWithInitialSizeSet...      10000  avgt    2   109.803   us/op
givenCollectionsHashSetWithInitialSizeSet...     100000  avgt    2  1870.696   us/op
givenFastUtilsIntSetWithInitialSizeSet...           100  avgt    2     0.369   us/op
givenFastUtilsIntSetWithInitialSizeSet...          1000  avgt    2     2.351   us/op
givenFastUtilsIntSetWithInitialSizeSet...         10000  avgt    2    37.789   us/op
givenFastUtilsIntSetWithInitialSizeSet...        100000  avgt    2   896.467   us/op

这些结果清楚地表明FastUtils实现比Java Collections替代方案性能更高。

5. 大集合

FastUtils的另一个重要特性是能够使用64位数组。默认情况下，Java中的数组限制为32位。

首先，让我们看一下整数类型的BigArrays类。IntBigArrays提供了用于处理二维整数数组的静态方法，通过使用这些提供的方法，我们可以将数组包装成更方便用户的一维数组。

让我们看看它是如何工作的。

首先，我们先初始化一个一维数组，然后使用IntBigArray的wrap方法将其转换为二维数组：

int[] oneDArray = new int[] { 2, 1, 5, 2, 1, 7 };
int[][] twoDArray = IntBigArrays.wrap(oneDArray.clone());

我们应该确保使用clone方法来确保数组的深层复制。

现在，就像我们对List或Map所做的那样，我们可以使用get方法访问元素：

int firstIndex = IntBigArrays.get(twoDArray, 0);
int lastIndex = IntBigArrays.get(twoDArray, IntBigArrays.length(twoDArray)-1);

最后，让我们添加一些检查以确保我们的IntBigArray返回正确的值：

assertEquals(2, firstIndex);
assertEquals(7, lastIndex);

6. 总结

在本文中，我们深入探讨了FastUtils的核心功能。

在使用BigCollections之前，我们研究了FastUtil提供的一些特定类型的集合。

Show Disqus Comments