Discussion on:

15
Comments

Join the conversation!

Follow via:
RSS
Email Alert
0 Votes
+ -
Great Article... just what I've been looking for.

Why hasn't a generic LinkedList been included just like ArrayList?

Also, in the Java Collections there is a TreeSet and a TreeMap - any inclusion of Trees in the .NET Collections Framework?

Thnx
Okay, a few comments:

1. I'll start with the subject of upcasts and downcasts.... Casting a specific type to a more general type is an UPCAST, not a downcast. Conversely, casting a general type to a more specific type is a downcast. Downcasts are runtime "expensive", because the CLR must explicitly check whether the referenced object is actually of the type it's being cast to: it may not be, and therefore the cast must be checked and may fail. Upcasts, on the other hand, are not expensive because the compiler knows what you're referencing a particular object as, and can check to see if the type you're upcasting to is in its ancestry. If not, you get a compiler error, and no checking need be done at runtime. This is why upcasts maybe implicit, but downcasts must always be explicit.

(to be continued in first reply to this post)
2. Following on to the previous point, you mention that adding objects to the System.Collections collections, which all store things internally as System.Object, will require a box operation if the object to be stored is a value type (true, but likely preferable to copying the data from the value object the caller owns onto the stack as a parameter to be passed in as the specific value type, then likely copied again into a specifically typed value object contained as a member variable of the internal CollectionNode class inside the collection), or you say a downcast, but really an upcast (per point 1) if the passed in object is a reference object. With respect to the box operation with value objects, the alternative would be that you would have to have a templated collection, so that you could have collection types for every individual value type you needed. Given the design of the CLR, and the fact that assemblies need to know at compile time all the types they will be explicitly referencing (which a templated collection would have to do at runtime for the parameter type specified), templates are not an available solution to the problem, hence the necessity of the box operation.

(continued in next post)
0 Votes
+ -
#2...
privately_owed 17th May 2002
"true, but likely preferable to copying the data ..." - I'm not so sure... isn't what you describe normal "pass by value" behavior? I've been told that boxing is more expensive than a pass-by-value, though I could be wrong here.

"you say a downcast, but really an upcast" - granted wink

"the alternative would be that you would have to have a templated collection" - Exactly what I was hinting at, to set up a later piece on how to do exactly that. Certainly a single generic collection is preferable to a large number of typed collections. I'm not questioning the wisdom of the design choice at all, just pointing out it's downside.
0 Votes
+ -
You are correct that what I described is normal pass-by-value behavior; which is generally (in the C/C++ world) considered a bad habit with large structures. Generally in C if you want to pass around a copy of a struct, and you don't want anybody to be able to modify the struct, you pass around a "const struct MyStruct *" or whatever, indicating that the pointer is to a constant MyStruct.

To take up the topic of a box operation versus pass-by-value specifically, let me say this: in the caseof a single pass-by-value versus a box operation, you are correct that the pass-by-value is probably cheaper, because in the case of the box operation, the CLR must first allocate memory from the managed heap, then copy the contents of the value type into that heap memory. As such, you've got to pay for the copy anyway, plus the cost of the heap allocation. However, in this case, I suspect that the pass-by-value semantics would be less desirable for all but the smallest value types, in that if we imagine for a moment an implementation of some collection in terms of an array for a moment (as MyValueType[] myArray), then calling add(MyValueType mvt) on this collection would at the very least involve two copy operations on the passed in value object, one to copy it onto the stack as a parameter to the add method, and another to copy it into the array itself. If we instead had a linked list, where we had a ListNode class that specifically contained a MyValueType member, we would still have the same problem. So we have a minimum of two copy operations in such an implementation, not just one. I would suspect that for large value types, the double copy operation might be more expensive than the single box + copy operation (what thatthreshold of size would be I couldn?t say without experimenting some).
0 Votes
+ -
There's also a garbage collection cycle to consider on the new object variable. Of course that would come after you had disposed of the collection item.
0 Votes
+ -
Well, let it never be said that I won't fess up when I've made a bogus statement of my own. After doing some speed tests this afternoon, I've concluded that those odious copy operations aren't nearly as odious as I anticipated, and, in fact, you are correct that the pass by value semantics are in fact significantly more efficient timewise than the alternative of performing the box operation. It would appear that modern processors are much more efficient at performing memory copy operations than I wads giving them credit for. So kudos to you for sticking to your guns with respect to the costliness of the box operation on a value type.

Cheers,
Scott
To follow up a little more on my previous comments using the collection implemented as an array as an example, if the collection is sorted, that would make the storing of the value types directly in the array even worse, as an insert operation that needed to go into the middle of the array (let?s say at index j) would require the copying of each element at index > j to be copied over one element. This would no doubt be very expensive, particularly in a large collection. The alternative, to avoid this, would be for the array to not directly contain the value type, but rather a reference to an instance of an ArrayNode, which would then contain a MyValueType. This would avoid the copy operation on each value type, and instead would only have to copy references. In this case, however, we have basically gone the route of boxing anyway, in terms of the allocation of ArrayNodes on the managed heap, plus we still have to pay for the two copy operations (one onto the stack for the add() call, and one into the instance of ArrayNode. In this case, the only thing we save is the downcast on retrieve.

Nonetheless, I think implementing all but the smallest sorted collection as an array would be a huge error from the outset, and I do question Microsoft?s wisdom at providing only array backed implementations of the IList interfaces, rather than linked-list type collections that would be much more efficient in the general case (particularly with large collections or ones that are sorted). Also, growing a linked list is a much less expensive operation than growing an array.
3. With respect to the upcast perfomed on a reference objectto do an insert, you mention that "That's a little inefficient and will create some performance issues on very large collections." As I have previously discussed, the upcast involved thereis by no means runtime inefficient, as it produces no IL code in the runtime. Furthermore, I find your mention of the size of the collection with respect to this casting operation curious. In no way is the size of the collection involved in the question of whether storing objects as Objects or as their specific type is more efficient. You are correct in theorizing that both stores and retrieves will become less efficient as the size of the collection increases; to what extent is dependent onthe actual data structure involved - inserts and removes in a Hashtable are O(1) and therefore the effect will only be noticeable on VERY large collections, where the number of items in each bucket becomes significant. On the other hand, inserts into a sorted List are O(n) and will become very inefficient as the size increases. The same can be said of course for retrieve operations.

4. Lastly, I'd like to point out that the standard collections provided in the java.util package as part of the Java runtime are very similar to the .Net collections in regards to their solution to the same problem. Absenting templates, the JDK designers also chose to implement their generic collections the only other way available to them; they refer to everything as a java.lang.Object.

Cheers,
Scott M. Litvinoff
0 Votes
+ -
#3 and 4
privately_owed 17th May 2002
"upcast involved there is by no means runtime inefficient" - As you mentioned in #1, that's true since there're no code involved. However you would get a hit during retrieval, correct?

"I find your mention of the size of the collection curious" -Ah, That was a vague statement and you've misinterpreted me... there's no direct relation between object type and set/get speed in a collection, and I didn't mean to imply that there is.

I was speaking in general terms of performance implications. For example, iterating through a large generic collection, is likely to be less efficient than iterating through a "templated," strongly-typed collection, because you've got to cast back to the correct type before you can do anything meaningful with the stored object.

In regards to 4, as I stated, my intention is not to question the wisdom of such a design decision (implementing collections generically instead of as type-specific templates), but instead simply to point out it's downside in practice.
0 Votes
+ -
Thanks for your feedback, nice to know someone out there is paying attention at least.

We're always looking for new contributors, by the way, so please drop us an email if you think you'd be interested in contributing a technical article or three.
0 Votes
+ -
Right.
privately_owed 17th May 2002
Yes, you're right, I seem to have gotten "upcast" and "downcast" turned around, thanks. I'll respond to your other comments seperately.

BTW, "bogus" is a little harsh, don't you think??
0 Votes
+ -
My apologies
litvinof@... 17th May 2002
My apologies there. You're right that bogus was a bit harsh, and I more rightly should have titled the comment "A Few Quibbles...". Thanks for your prompt response to my posting.

Cheers,
Scott
0 Votes
+ -
I see "User Deleted" in situations where the user was not deleted. I wonder if there's some other way the the discussion system could mark these messages, such as "Unknown User."

Then if a moderator does actually delete a user, "User Deleted" would have more meaning. There were several weeks that I was afraid of replying to a "deleted user" for fear of getting that wrath on myself. *chuckle* I fear other newbies could get the same problem.
0 Votes
+ -
No one's been deleted, it's just an unfortunate error message from the discussion system. "User Deleted" was actually a database problem, and it appears to have been recently fixed.

Just for the record, the editors can (and do) delete inappropriate posts, but we have a pretty liberal definition of appropriateness, and we've never ever deleted a user.
Keyboard Shortcuts:
Prev
Next
Toggle
Join the conversation
Formatting +
BB Codes - Note: HTML is not supported in forums
  • [b] Bold [/b]
  • [i] Italic [/i]
  • [u] Underline [/u]
  • [s] Strikethrough [/s]
  • [q] "Quote" [/q]
  • [ol][*] 1. Ordered List [/ol]
  • [ul][*] · Unordered List [/ul]
  • [pre] Preformat [/pre]
  • [quote] "Blockquote" [/quote]

Join the TechRepublic Community and join the conversation! Signing-up is free and quick, Do it now, we want to hear your opinion.