January 24, 2003
Q: What are the advantages and disadvantages of implementing deep cloning via Java serialization and a built-in Object.clone() method from a performance point of view?
A: Equipping classes in your application with a correct clone() implementation is essential to many defensive programming patterns. Common examples include defensive copying of method parameters, cloning internal fields before returning them in getters, implementing immutability patterns, implementing containers with deep cloning semantics, and so on.
Even though the question mentions just two possibilities, there are at least four distinct approaches to clone() implementation. In this Java Q&A installment, I consider design and performance tradeoffs involved in all of them.
Because cloning is so customizable, this article's examples will not necessarily translate directly to your own code; however, the general conclusions we will reach should provide useful guidelines in any application design.
Note the following disclaimer: What exactly constitutes a deep clone is debatable. Even though two objects can safely share the same String reference viewed as data, they cannot if the same field is used as an instance-scoped object monitor (such as when calling Object.wait()/notify() on it) or if field instance identity (such as when using the == operator) is significant to the design. In the end, whether or not a field is shareable depends on the class design. For simplicity, I assume below that all fields are used as pure data.
Performance measurements setup
Let's jump right into some code. I use the following simple hierarchy of classes as my cloning guinea pig:
import java.io.Serializable;

public class TestBaseClass
    implements Cloneable, Serializable
{
    public TestBaseClass (String dummy)
    {
        m_byte = (byte) 1;
        m_short = (short) 2;
        m_long = 3L;
        m_float = 4.0F;
        m_double = 5.0;
        m_char = '6';
        m_boolean = true;
        m_int = 16;
        m_string = "some string in TestBaseClass";
        m_ints = new int [m_int];
        for (int i = 0; i < m_ints.length; ++ i) m_ints [i] = m_int;
        m_strings = new String [m_int];
        m_strings [0] = m_string; // invariant: m_strings [0] == m_string
        for (int i = 1; i < m_strings.length; ++ i)
            m_strings [i] = new String (m_string);
    }

    public TestBaseClass (final TestBaseClass obj)
    {
        if (obj == null) throw new IllegalArgumentException ("null input: obj");
        // Copy all fields:
        m_byte = obj.m_byte;
        m_short = obj.m_short;
        m_long = obj.m_long;
        m_float = obj.m_float;
        m_double = obj.m_double;
        m_char = obj.m_char;
        m_boolean = obj.m_boolean;
        m_int = obj.m_int;
        m_string = obj.m_string;
        if (obj.m_ints != null) m_ints = (int []) obj.m_ints.clone ();
        if (obj.m_strings != null) m_strings = (String []) obj.m_strings.clone ();
    }

    // Cloneable:
    public Object clone ()
    {
        if (Main.OBJECT_CLONE)
        {
            try
            {
                // Chain shallow field work to Object.clone():
                final TestBaseClass clone = (TestBaseClass) super.clone ();
                // Set deep fields:
                if (m_ints != null)
                    clone.m_ints = (int []) m_ints.clone ();
                if (m_strings != null)
                    clone.m_strings = (String []) m_strings.clone ();
                return clone;
            }
            catch (CloneNotSupportedException e)
            {
                throw new InternalError (e.toString ());
            }
        }
        else if (Main.COPY_CONSTRUCTOR)
            return new TestBaseClass (this);
        else if (Main.SERIALIZATION)
            return SerializableClone.clone (this);
        else if (Main.REFLECTION)
            return ReflectiveClone.clone (this);
        else
            throw new RuntimeException ("select cloning method");
    }

    protected TestBaseClass () {} // accessible to subclasses only

    private byte m_byte;
    private short m_short;
    private long m_long;
    private float m_float;
    private double m_double;
    private char m_char;
    private boolean m_boolean;
    private int m_int;
    private int [] m_ints;
    private String m_string;
    private String [] m_strings; // invariant: m_strings [0] == m_string
} // end of class
public final class TestClass extends TestBaseClass
    implements Cloneable, Serializable
{
    public TestClass (String dummy)
    {
        super (dummy);
        m_int = 4;
        m_object1 = new TestBaseClass (dummy);
        m_object2 = m_object1; // invariant: m_object1 == m_object2
        m_objects = new Object [m_int];
        for (int i = 0; i < m_objects.length; ++ i)
            m_objects [i] = new TestBaseClass (dummy);
    }

    public TestClass (final TestClass obj)
    {
        // Chain to super copy constructor:
        super (obj);
        // Copy all fields declared by this class:
        m_int = obj.m_int;
        if (obj.m_object1 != null)
            m_object1 = ((TestBaseClass) obj.m_object1).clone ();
        m_object2 = m_object1; // preserve the invariant
        if (obj.m_objects != null)
        {
            m_objects = new Object [obj.m_objects.length];
            for (int i = 0; i < m_objects.length; ++ i)
                m_objects [i] = ((TestBaseClass) obj.m_objects [i]).clone ();
        }
    }

    // Cloneable:
    public Object clone ()
    {
        if (Main.OBJECT_CLONE)
        {
            // Chain shallow field work to Object.clone():
            final TestClass clone = (TestClass) super.clone ();
            // Set only deep fields declared by this class:
            if (m_object1 != null)
                clone.m_object1 = ((TestBaseClass) m_object1).clone ();
            clone.m_object2 = clone.m_object1; // preserve the invariant
            if (m_objects != null)
            {
                clone.m_objects = (Object []) m_objects.clone ();
                for (int i = 0; i < m_objects.length; ++ i)
                    clone.m_objects [i] = ((TestBaseClass) m_objects [i]).clone ();
            }
            return clone;
        }
        else if (Main.COPY_CONSTRUCTOR)
            return new TestClass (this);
        else if (Main.SERIALIZATION)
            return SerializableClone.clone (this);
        else if (Main.REFLECTION)
            return ReflectiveClone.clone (this);
        else
            throw new RuntimeException ("select cloning method");
    }

    protected TestClass () {} // accessible to subclasses only

    private int m_int;
    private Object m_object1, m_object2; // invariant: m_object1 == m_object2
    private Object [] m_objects;
} // End of class
TestBaseClass has several fields of primitive types as well as a String and a couple of array fields. TestClass both extends TestBaseClass and aggregates several instances of it. This setup allows us to see how inheritance, member object ownership, and data types can affect cloning design and performance.
In a previous Java Q&A article, I developed a simple timing library that comes in handy now. This code in class Main measures the cost of TestClass.clone():
// Create an ITimer:
final ITimer timer = TimerFactory.newTimer ();
// JIT/hotspot warmup:
// ...
TestClass obj = new TestClass ();
// Warm up clone():
// ...
final int repeats = 1000;
timer.start ();
// Note: the loop is unrolled 10 times
for (int i = 0; i < repeats / 10; ++ i)
{
    obj = (TestClass) obj.clone ();
    // ... repeated 10 times ...
}
timer.stop ();
final DecimalFormat format = new DecimalFormat ();
format.setMinimumFractionDigits (3);
format.setMaximumFractionDigits (3);
System.out.println ("method duration: " +
    format.format (timer.getDuration () / repeats) + " ms");
I use the high-resolution timer supplied by TimerFactory with a loop that creates a moderate number of cloned objects. The elapsed time reading is reliable, and there is little interference from the garbage collector. Note how the obj variable is continuously updated to avoid memory caching effects.
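The ITimer library itself is not reproduced in this article. As a rough stand-in, the same measurement pattern can be sketched with System.nanoTime(), which is an assumption on my part: it is only available in JDKs later than the 1.4.1 release used for the numbers quoted here. The Payload and CloneTimingSketch names are hypothetical and exist only for this illustration.

```java
// Hypothetical stand-in for TestClass: one shallow-copied object with one deep field.
final class Payload implements Cloneable {
    private int[] data = new int[16];

    @Override
    public Payload clone() {
        try {
            Payload copy = (Payload) super.clone();
            copy.data = data.clone(); // deep field copied by hand
            return copy;
        } catch (CloneNotSupportedException e) {
            throw new InternalError(e.toString());
        }
    }
}

final class CloneTimingSketch {
    /** Returns the average cost of one clone() call, in milliseconds. */
    static double averageCloneMillis(int repeats) {
        Payload obj = new Payload();
        // Warm-up, so the JIT has a chance to compile clone() before timing starts:
        for (int i = 0; i < 10000; ++i) obj = obj.clone();

        long start = System.nanoTime();
        for (int i = 0; i < repeats; ++i) {
            obj = obj.clone(); // reassign so every iteration clones a fresh object
        }
        long elapsedNanos = System.nanoTime() - start;
        return elapsedNanos / 1e6 / repeats;
    }

    public static void main(String[] args) {
        System.out.println("method duration: " + averageCloneMillis(1000) + " ms");
    }
}
```

The absolute numbers from such a sketch will differ from those reported in this article; only the relative ordering of the approaches is meant to be comparable.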
Also note how clone() is implemented in both classes. Each class in fact contains four implementations, selected one at a time using four conditional compilation constants in Main: OBJECT_CLONE, COPY_CONSTRUCTOR, SERIALIZATION, and REFLECTION. Recompile the entire project when changing the cloning approach.
Let's now examine each approach in detail.
Approach 1: Cloning by chaining to Object.clone()
This is perhaps the most classical approach. The steps involved are:
1. Implement the Cloneable marker interface.
2. Provide a clone override that always begins with a call to super.clone() followed by manual copying of all deep fields (i.e., mutable fields that are object references and cannot be shared between several instances of the parent class).
3. Declare the clone override not to throw any exceptions, including CloneNotSupportedException. To this effect, the clone() method in your hierarchy's first class that subclasses a non-Cloneable class will catch CloneNotSupportedException and wrap it into an InternalError.
Correct implementation of Cloneable easily deserves a separate article. Because my focus is on measuring performance, I will repeat the relevant points here and direct readers to existing references for further details (see Resources).
This traditional approach is particularly well suited to the presence of inheritance because the chain of super.clone() eventually calls the native java.lang.Object.clone() implementation. This is good for two reasons. First, this native method has the magic ability to always create an instance of the most derived class for the current object. That is, the result of super.clone() in TestBaseClass is an instance of TestClass when TestBaseClass.clone() is part of the chain of methods originating from TestClass.clone(). This makes it easy to implement the desirable x.clone().getClass() == x.getClass() invariant even in the presence of inheritance.
Second, if you examine the JVM sources, you will see that at the heart of java.lang.Object.clone() is the memcpy C function, usually implemented in very efficient assembly on a given platform; so I expect the method to act as a fast "bit-blasting" shallow clone implementation, replicating all shallow fields in one fell swoop. In many cases, the only remaining manual coding is done to deeply clone object reference fields that point to unshareable mutable objects.
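Both points can be seen in a much smaller example than the test classes. The following is a minimal sketch under my own assumptions: Holder, SubHolder, and ObjectCloneSketch are hypothetical names, not part of the article's download. It shows that super.clone() yields an instance of the most derived class, and that the one deep field still has to be copied by hand.

```java
// Hypothetical classes; a sketch of Approach 1, not the article's test code.
class Holder implements Cloneable {
    int count = 3;            // shallow field: copied by Object.clone()
    int[] values = {1, 2, 3}; // deep field: must be cloned by hand

    @Override
    public Object clone() {
        try {
            // Copies 'count' and the 'values' reference in one step:
            Holder copy = (Holder) super.clone();
            // Give the clone its own array so the two objects do not share state:
            copy.values = values.clone();
            return copy;
        } catch (CloneNotSupportedException e) {
            throw new InternalError(e.toString());
        }
    }
}

class SubHolder extends Holder {
    String label = "sub"; // immutable, so safe to share; no clone() override needed
}

final class ObjectCloneSketch {
    public static void main(String[] args) {
        SubHolder original = new SubHolder();
        Object copy = original.clone();

        // Object.clone() creates an instance of the most derived class:
        System.out.println(copy.getClass() == original.getClass()); // prints "true"

        // The deep field is independent of the original:
        ((Holder) copy).values[0] = 42;
        System.out.println(original.values[0]); // prints "1"
    }
}
```

Note that SubHolder inherits a correct clone() for free here, because it adds only a shareable field.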
Running the test with the OBJECT_CLONE variable set to true on a Windows 550-MHz machine with Sun Microsystems' JDK 1.4.1 produces:
clone implementation: Object.clone()
method duration: 0.033 ms
This is not bad for a class with multiple primitive and object reference fields. But for better insight, I must compare the result with other approaches below.
Despite its advantages, this approach is plagued with problems due to poor java.lang.Object.clone() design. It cannot be used for cloning final fields unless they can be copied shallowly. Creating smart, deeply cloning container classes is complicated by the fact that Cloneable is just a marker interface, and java.lang.Object.clone() is not public. Finally, cloning inner classes does not work due to problems with outer references. See articles by Mark Davis and Steve Ball in Resources for some of the earliest discussions about this topic.
Approach 2: Cloning via copy construction
This approach complements Approach 1. It involves these steps:
1. For every class X, provide a copy constructor with signature X(X x).
2. In subclasses, chain either to clone() or directly to the base copy constructor. The former choice is more polymorphic and works when the base copy constructor is private, and the latter sometimes avoids the small cost of casting clone()'s return value to a specific type.
Setting COPY_CONSTRUCTOR to true and rerunning the test produces:
clone implementation: copy construction
method duration: 0.024 ms
This beats Approach 1. The result might not be surprising because the overhead of native method calls has increased and the cost of new object creation has decreased with increasing JDK versions. If you rerun the same tests in Sun's JDK 1.2.2, the situation favors Approach 1. Of course, performance depends on the relative mix of shallow and deep fields in the class hierarchy. Classes with many primitive type fields benefit more from Approach 1. Classes with a few mostly immutable fields work very efficiently with Approach 2, with a speed advantage at least 10 times greater than Approach 1.
Approach 2 is more error prone than Approach 1 because it is easy to forget to override clone() and accidentally inherit a superclass's version that will return an object of the wrong type. If you make the same mistake in Approach 1, the result will be less disastrous. Additionally, it is harder to maintain the implementation in Approach 2 when fields are added and removed (compare the OBJECT_CLONE branch in TestBaseClass.clone() with similar code in the copy constructor). Also, Approach 1 requires less class cooperation in some cases: for a base class with only shallow fields, you don't need to implement Cloneable or even provide a clone() override if you do not intend to clone at the base class level.
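The "wrong type" pitfall is easy to reproduce. The following minimal sketch uses hypothetical Shape and Circle classes of my own (they are not part of the article's code): Circle provides a copy constructor but forgets to override clone(), so it silently inherits the copy-construction version from Shape.

```java
// Hypothetical classes illustrating the pitfall; not taken from the article's code.
class Shape implements Cloneable {
    int x;

    Shape() {}

    Shape(Shape other) { x = other.x; }

    @Override
    public Object clone() { return new Shape(this); } // Approach 2
}

class Circle extends Shape {
    int radius = 5;

    Circle() {}

    Circle(Circle other) { super(other); radius = other.radius; }

    // Oops: clone() is not overridden, so Shape.clone() is inherited.
}

final class ForgottenOverrideSketch {
    public static void main(String[] args) {
        Object copy = new Circle().clone();
        System.out.println(copy.getClass().getSimpleName()); // prints "Shape"
        System.out.println(copy instanceof Circle);          // prints "false"
    }
}
```

Had Shape.clone() chained to super.clone() instead, the same omission would still have produced a Circle, with the radius field copied shallowly.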
However, an undeniable advantage of cloning via copy construction is that it can handle both final fields and inner classes. But due to dangers present when inheritance is involved, I recommend using this sparingly and preferably simultaneously with making the relevant classes final.
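As an illustration of the final-field advantage, here is a minimal sketch with a hypothetical Samples class (my own name, not the article's). Because the deep field is final, it cannot be reassigned on the object returned by super.clone(), but a copy constructor can initialize it directly. The class is declared final, in line with the recommendation above.

```java
// Hypothetical class; shows a final deep field handled by a copy constructor.
final class Samples implements Cloneable {
    private final int[] values; // final: cannot be reassigned after super.clone()

    Samples(int[] values) { this.values = values.clone(); }

    // Copy constructor: the only place where the final field can receive a fresh array.
    Samples(Samples other) { this.values = other.values.clone(); }

    @Override
    public Object clone() { return new Samples(this); }

    int get(int i) { return values[i]; }

    void set(int i, int v) { values[i] = v; }
}

final class FinalFieldSketch {
    public static void main(String[] args) {
        Samples a = new Samples(new int[] {1, 2, 3});
        Samples b = (Samples) a.clone();
        b.set(0, 99);
        System.out.println(a.get(0)); // prints "1": the clone owns its own array
    }
}
```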
Approach 3: Cloning via Java serialization
Java serialization is convenient. Many classes are made serializable by simply declaring them to implement java.io.Serializable. Thus, a whole hierarchy of classes can be made cloneable by deriving them from a base Serializable class whose clone() is implemented as a simple, yet extremely generic method:
public Object clone (Object obj)
{
try
{
ByteArrayOutputStream out = new ByteArrayOutputStream ();
ObjectOutputStream oout = new ObjectOutputStream (out);
oout.writeObject (obj);
ObjectInputStream in = new ObjectInputStream (
new ByteArrayInputStream (out.toByteArray ()));
return in.readObject ();
}
catch (Exception e)
{
throw new RuntimeException ("cannot clone class [" +
obj.getClass ().getName () + "] via serialization: " +
e.toString ());
}
}
This is so generic it can be used for cloning classes that can be written and added to your application by someone else long after you provide the base classes. But this convenience comes at a price. After switching TestBaseClass.clone() and TestClass.clone() to the SERIALIZATION branch I get:
clone implementation: serialization
method duration: 2.724 ms
This is roughly 100 times slower than Approaches 1 and 2. You probably would not want this option for defensive cloning of parameters of otherwise fast intra-JVM methods. Even though this method can be used for generic containers with deep cloning semantics, cloning a few hundred objects would make you see times in the one-second range: a doubtful prospect.
There are several reasons why this approach is so slow. Serialization depends on reflective discovery of class metadata, known to be much slower than normal method calls. Furthermore, because a temporary input/output (I/O) stream is used to flatten the entire object, the process involves UTF-8 encoding and writing out every character of, say, TestBaseClass.m_string. Compared to that, Approaches 1 and 2 only copy String references; each copy step has the same small fixed cost.
What's even worse, ObjectOutputStream and ObjectInputStream perform a lot of unnecessary work. For example, writing out class metadata (class name, field names, metadata checksum, etc.) that may need to be reconciled with a different class version on the receiving end is pure overhead when you serialize a class within the same ClassLoader namespace.
On the plus side, serialization imposes fairly light constructor requirements (the first non-Serializable superclass must have an accessible no-arg constructor) and correctly handles final fields and inner classes. This is because native code constructs the clone and populates its fields without using any constructors (something that can't be done in pure Java).
One more interesting advantage of Approach 3 is that it can preserve the structure of the object graph rooted at the source object. Examine the dummy TestBaseClass constructor. It stores the m_string reference itself in m_strings[0] and fills the remaining slots with distinct copies. Without any special effort on our part, the invariant m_strings[0] == m_string is preserved in the cloned object. In Approaches 1 and 2, the same effect is either purely incidental (such as when immutable objects remain shared by reference) or requires explicit coding (as with m_object1 and m_object2 in TestClass). The latter could be hard to get right in general, especially when object identities are established at runtime and not compile time (as is the case with TestClass).
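The graph-preserving behavior can be checked in isolation. The sketch below is only an illustration under my own assumptions: the Pair and GraphPreservationSketch names are hypothetical, and serialClone() merely repeats the serialization round trip shown earlier. Two fields that refer to the same mutable object before cloning still refer to one (new) object afterwards.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Hypothetical class with two references that may point to the same object.
class Pair implements Serializable {
    private static final long serialVersionUID = 1L;
    StringBuilder first;
    StringBuilder second;
}

final class GraphPreservationSketch {
    /** Deep-clones 'obj' by writing it to a byte array and reading it back. */
    static Object serialClone(Object obj) {
        try {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            ObjectOutputStream oout = new ObjectOutputStream(out);
            oout.writeObject(obj);
            oout.flush();
            ObjectInputStream in = new ObjectInputStream(
                new ByteArrayInputStream(out.toByteArray()));
            return in.readObject();
        } catch (Exception e) {
            throw new RuntimeException("cannot clone via serialization: " + e);
        }
    }

    public static void main(String[] args) {
        Pair original = new Pair();
        original.first = new StringBuilder("shared");
        original.second = original.first; // invariant: first == second

        Pair copy = (Pair) serialClone(original);
        System.out.println(copy.first == copy.second);     // prints "true"
        System.out.println(copy.first == original.first);  // prints "false"
    }
}
```

The stream records each object once and writes back-references for repeats, which is why the sharing survives without any code in Pair.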
Approach 4: Cloning via Java reflection
Approach 4 draws inspiration from Approach 3. Anything that uses reflection can work on a variety of classes in a generic way. If I require the class in question to have a (not necessarily public) no-arg constructor, I can easily create an empty instance using reflection. It is especially efficient when the no-arg constructor doesn't do anything. Then it is a straightforward matter to walk the class's inheritance chain all the way to Object.class and set all (not just public) declared instance fields for each superclass in the chain. For each field, I check whether it contains a primitive value, an immutable object reference, or an object reference that needs to be cloned recursively. The idea is straightforward but getting it to work well requires handling a few details. My full demo implementation is in class ReflectiveClone, available as a separate download. Here is the pseudo-code of the full implementation, with some details and all error handling omitted for simplicity:
public abstract class ReflectiveClone
{
/**
* Makes a reflection-based deep clone of 'obj'. This method is mutually
* recursive with {@link #setFields}.
*
* @param obj current source object being cloned
* @return obj's deep clone [never null; can be == to 'obj']
*/
public static Object clone (final Object obj)
{
final Class objClass = obj.getClass ();
final Object result;
if (objClass.isArray ())
{
final int arrayLength = Array.getLength (obj);
if (arrayLength == 0) // empty arrays are immutable
return obj;
else
{
final Class componentType = objClass.getComponentType ();
// Even though arrays implicitly have a public clone(), it
// cannot be invoked reflectively, so need to do copy construction:
result = Array.newInstance (componentType, arrayLength);
if (componentType.isPrimitive () ||
FINAL_IMMUTABLE_CLASSES.contains (componentType))
{
System.arraycopy (obj, 0, result, 0, arrayLength);
}
else
{
for (int i = 0; i < arrayLength; ++ i)
{
// Recursively clone each array slot:
final Object slot = Array.get (obj, i);
if (slot != null)
{
final Object slotClone = clone (slot);
Array.set (result, i, slotClone);
}
}
}
return result;
}
}
else if (FINAL_IMMUTABLE_CLASSES.contains (objClass))
{
return obj;
}
// Fall through to reflectively populating an instance created
// via a no-arg constructor:
// clone = objClass.newInstance () can't handle private constructors:
Constructor noarg = objClass.getDeclaredConstructor (EMPTY_CLASS_ARRAY);
if ((Modifier.PUBLIC & noarg.getModifiers ()) == 0)
{
noarg.setAccessible (true);
}
result = noarg.newInstance (EMPTY_OBJECT_ARRAY);
for (Class c = objClass; c != Object.class; c = c.getSuperclass ())
{
setFields (obj, result, c.getDeclaredFields ());
}
return result;
}
/**
* This method copies all declared 'fields' from 'src' to 'dest'.
*
* @param src source object
* @param dest src's clone [not fully populated yet]
* @param fields fields to be populated
*/
private static void setFields (final Object src, final Object dest,
final Field [] fields)
{
for (int f = 0, fieldsLength = fields.length; f < fieldsLength; ++ f)
{
final Field field = fields [f];
final int modifiers = field.getModifiers ();
if ((Modifier.STATIC & modifiers) != 0) continue;
// Can also skip transient fields here if you want reflective
// cloning to be more like serialization.
if ((Modifier.FINAL & modifiers) != 0)
throw new RuntimeException ("cannot set final field " +
field.getName () + " of class " + src.getClass ().getName ());
if ((Modifier.PUBLIC & modifiers) == 0) field.setAccessible (true);
Object value = field.get (src);
if (value == null)
field.set (dest, null);
else
{
final Class valueType = value.getClass ();
if (! valueType.isPrimitive () &&
! FINAL_IMMUTABLE_CLASSES.contains (valueType))
{
// Value is an object reference, and it could be either an
// array or of some mutable type: try to clone it deeply
// to be on the safe side.
value = clone (value);
}
field.set (dest, value);
}
}
}
private static final Set FINAL_IMMUTABLE_CLASSES; // Set in <clinit>
private static final Object [] EMPTY_OBJECT_ARRAY = new Object [0];
private static final Class [] EMPTY_CLASS_ARRAY = new Class [0];
static
{
FINAL_IMMUTABLE_CLASSES = new HashSet (17);
// Add some common final/immutable classes:
FINAL_IMMUTABLE_CLASSES.add (String.class);
FINAL_IMMUTABLE_CLASSES.add (Byte.class);
...
FINAL_IMMUTABLE_CLASSES.add (Boolean.class);
}
} // End of class
Note the use of java.lang.reflect.AccessibleObject.setAccessible() to gain access to nonpublic fields. Of course, this requires sufficient security privileges.
Since the introduction of JDK 1.3, setting final fields via reflection is no longer possible (see Note 1 in Resources); so, this approach resembles Approach 1 because it can't handle final fields. Note also that inner classes cannot have no-arg constructors by definition (see Note 2 in Resources), so this approach will not work for them either.
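The inner-class restriction can be observed directly through reflection. In the following minimal sketch (Outer, Inner, Nested, and InnerConstructorSketch are hypothetical names of my own), the compiler gives the non-static inner class a constructor that takes the enclosing instance as a parameter, so a lookup for a no-arg constructor fails, while the static nested class behaves like an ordinary top-level class.

```java
// Hypothetical classes used only to inspect constructor signatures.
class Outer {
    class Inner {}         // non-static inner class: needs an enclosing Outer instance
    static class Nested {} // static nested class, shown for contrast
}

final class InnerConstructorSketch {
    /** Returns true if 'c' declares a constructor that takes no parameters. */
    static boolean hasNoArgConstructor(Class<?> c) {
        try {
            c.getDeclaredConstructor(); // looks up the constructor with an empty parameter list
            return true;
        } catch (NoSuchMethodException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(hasNoArgConstructor(Outer.Inner.class));  // prints "false"
        System.out.println(hasNoArgConstructor(Outer.Nested.class)); // prints "true"
    }
}
```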
Coupled with the no-arg constructor requirement, this approach restricts the type of classes it can handle. But you would be surprised how far it can go. The full implementation adds a few useful features. While traversing the object graph rooted at the source object, it keeps an internal objMap parameter that maps values in source object graphs to their respective clones in the cloned graphs. This restores the ability to preserve object graphs that I had in Approach 3. Also, the metadataMap parameter caches class metadata for all classes that it encounters while cloning an object and improves performance by avoiding slow reflection. The relevant data structures are scoped to a single call to clone(), and the overall idea is very similar to Java serialization revamped to just do object cloning. Similar to the previous section, a whole hierarchy of suitable classes can be made cloneable by equipping the base class with one generic method:
public Object clone ()
{
return ReflectiveClone.clone (this);
}
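The role of the objMap parameter described above can be illustrated without any reflection. The sketch below is not the ReflectiveClone implementation; it is a hand-written copy of a hypothetical Node class, under my own assumptions, that uses an identity-keyed map from source objects to their clones. Registering each clone before recursing is what keeps shared references shared and makes cycles terminate.

```java
import java.util.IdentityHashMap;
import java.util.Map;

// Hypothetical graph node; 'left' and 'right' may alias each other or form cycles.
class Node {
    String name;
    Node left, right;

    Node(String name) { this.name = name; }
}

final class GraphCopySketch {
    /** Copies the graph rooted at 'src', reusing clones already recorded in 'seen'. */
    static Node copy(Node src, Map<Node, Node> seen) {
        if (src == null) return null;
        Node done = seen.get(src);
        if (done != null) return done; // already cloned: preserve the sharing

        Node result = new Node(src.name);
        seen.put(src, result); // register before recursing so cycles terminate
        result.left = copy(src.left, seen);
        result.right = copy(src.right, seen);
        return result;
    }

    /** The map is scoped to a single top-level call, as in the text above. */
    static Node copy(Node root) {
        return copy(root, new IdentityHashMap<Node, Node>());
    }

    public static void main(String[] args) {
        Node shared = new Node("shared");
        Node root = new Node("root");
        root.left = shared;
        root.right = shared; // two references to one object
        shared.left = root;  // a cycle back to the root

        Node clone = copy(root);
        System.out.println(clone.left == clone.right); // prints "true"
        System.out.println(clone.left.left == clone);  // prints "true"
        System.out.println(clone.left == shared);      // prints "false"
    }
}
```

An IdentityHashMap is used rather than a HashMap because the lookup must be by object identity, not by equals().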
What is this method's performance? Rerunning the test with the REFLECTION branch selected produces:
clone implementation: reflection
method duration: 0.537 ms
This is roughly five times faster than straightforward serialization—not too bad for another generic approach. In terms of its performance and capabilities, it represents a compromise between the other three solutions. It can work very well for JavaBean-like classes and other types that usually do not have final fields.
Resource considerations
Measuring memory overhead is more difficult than measuring performance. It should be obvious that the first two approaches shine in this area, as they instantiate only the data that will populate the cloned fields.
Cloning via serialization has an extra drawback that may have escaped your attention above. Even though serializing an object preserves the structure of the object graph rooted at that instance, immutable values will get duplicated across disjoint calls to clone(). As an example, you can verify for yourself that
TestClass obj = new TestClass ("dummy");
System.out.println (obj.m_string == ((TestClass) obj.clone ()).m_string);
will print false for Approach 3 only. Thus, cloning via serialization will have a tendency to pollute heap with redundant copies of immutable objects like Strings. Approaches 1 and 2 are completely free from this problem, and Approach 3 is mostly free from it.
A quick and dirty proof of these observations can be seen by changing the body of Main.main() to keep the clones in memory and track the object count when a given heap size is reached:
int count = 0;
List list = new LinkedList ();
try
{
    while (true)
    {
        list.add (obj.clone ());
        ++ count;
    }
}
catch (Throwable t)
{
    System.out.println ("count = " + count);
}
Run this in a JVM with a -Xmx8m setting and you will see something similar to this:
>java -Xmx8m Main
clone implementation: Object.clone()
count = 5978
Exception in thread "main" java.lang.OutOfMemoryError
...
clone implementation: copy construction
count = 5978
...
clone implementation: serialization
count = 747
...
clone implementation: reflection
count = 5952
Approach 3's overhead increases with the number of immutable fields in a class. Removing this overhead is nontrivial.
The recap
The following table recaps the properties of all cloning approaches in this article from several perspectives: speed, resource utilization, class design constraints, and object graph handling.
Object.clone()
    Speed: High
    Resource utilization: Low
    Class design constraints: Does not work with deep final fields; does not work with inner classes; must implement Cloneable; medium amount of manual class maintenance
    Object graphs: Does not handle object graphs transparently
Copy construction
    Speed: High
    Resource utilization: Low
    Class design constraints: Superclasses and subclasses must cooperate; copy constructor required; a lot of manual class maintenance
    Object graphs: Does not handle object graphs transparently
Serialization
    Speed: Low
    Resource utilization: High; creates redundant immutable fields
    Class design constraints: Must implement Serializable; first non-Serializable class needs an accessible no-arg constructor
    Object graphs: Handles object graphs
Reflection
    Speed: Medium
    Resource utilization: Medium
    Class design constraints: Does not work with final fields; does not work with inner classes; each class must provide no-arg constructor
    Object graphs: Handles object graphs
This article discussed implementing a single method, Object.clone(). It is amazing that a single method can have so many implementation choices and subtle points. I hope this article provided you with some food for thought and useful guidelines for your application class design.
About the author
Vladimir Roubtsov has programmed in a variety of languages for more than 12 years, including Java since 1995. Currently, he develops enterprise software as a senior developer for Trilogy in Austin, Texas.