Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Generalize sliced streams #249

Merged
merged 24 commits into from
Feb 8, 2019
Merged

Generalize sliced streams #249

merged 24 commits into from
Feb 8, 2019

Conversation

DanRStevens
Copy link
Member

This lays some of the foundation for generalizing stream slicing. The FileSliceReader class doesn't really depend much on the specifics of the FileReader class. The class could eventually be refactored into a template that handles slices of arbitrary stream types.

This pull request aims to reduce some of the dependence of FileSliceReader on the specifics of FileReader. Most such dependencies were not particularly fundamental, but rather more in the wording of exception messages and names of variables and types.

This addresses some of the aspects brought up in Issue #72.

The more direct approach should be easier to read and understand. It
also generalizes to both signed and unsigned types. The old code would
be undefined, according to the C++ standard, if it had been used on
signed types (though would have still likely worked correctly in most
environments).
This separates some of the type specific wording in the exception
message, and uses formatting that is easier to update.
The `FileReader` class already has a copy constructor, so use that
instead of dealing with the specifics of constructing with a filename.

Note: The `WrappedStreamType` alias can eventually become a template
parameter.
This makes the phrasing a little more general, and reporting of wrapped
stream specific data (filename) a little more consistent.
The old code, with recent changes, was creating a temporary copy of the
`FileReader` using `GetFilename`. This copy was then passed to the
`FileSliceReader` constructor, where it was copied again. This makes
things a little more direct, with only a single copy.
The `GetFilename()` method is specific to file related streams. If
slicing is generalized, and a different stream type is wrapped, this
won't work. By extracting this now, it minimizes the number of places
code will need to be modified to support wrapping a different stream
type.
Note: The `Position()` method returned:
`wrappedStream.Position() - startingOffset`
Slice parameter values should be in the domain of the slice. Currently
some of them look to be in the domain of the wrapped stream. This makes
the checks and usage of those values inconsistent. This test should help
catch such errors.
The single parameter version of `Slice()` should pass an offset in the
domain of the slice to the two parameter version of `Slice()`. This test
should catch errors translating the single parameter calls to two
parameter calls.
The `Slice()` methods should use paraemters in the domain of the slice,
rather than the domain of the wrapped stream.
When copy constructing from an existing object, this bounds check will
have already run on the original source object. Hence there is no reason
to do it again in the copy.

Side note: The `startingOffset` and `sliceLength` are both `const`.
@DanRStevens
Copy link
Member Author

I would be curious to get some feedback on this section of code as it pertains to the call from the copy constructor:

void FileSliceReader::Initialize()
{
if (sliceLength > std::numeric_limits<decltype(startingOffset)>::max() - startingOffset) {
throw std::runtime_error(
"The stream slice would run past the maximum possible stream length."
+ IdentifySource()
);
}
if (startingOffset + sliceLength > wrappedStream.Length()) {
throw std::runtime_error(
"The stream slice would run past the end of the source stream."
+ IdentifySource()
);
}
wrappedStream.Seek(startingOffset);
}

When copy constructing a slice object from another slice, these checks will have already run in the primary constructor. Does it make sense to run them again from the copy constructor for the copied object?

In the case of the first check, all values are const (startingOffset, sliceLength, numeric_limit::max()), so I'd say no there. As such, I've moved that check out to the primary constructor.

In the second check, the wrappedStream.Length() value is not const. It's possible for a stream to change length at runtime. In particular, the wrapped stream may shrink such that the slice of it now extends beyond the original wrapped stream. Considering this, it may make sense to check the length again when making a copy of a slice. However, if the copy is considered no good because the slice extends beyond the bounds of the wrapped stream, that means the original slice also now extends beyond the bounds of the wrapped stream. There is no way to declare the original source slice as invalid, or have it throw in any sort of way. My question then is, does it make sense to throw and prevent the copy construction?

Perhaps a deeper question, knowing that streams may change length, is the original check even valid? I think I would like to keep the original check. It does prevent unexpected behaviors, such as when trying to take too big of a slice of a fixed length stream, or trying to read from a partially written file. Though once the slice is created, it doesn't seem worthwhile to try and ensure the underlying stream doesn't shrink beyond the slice. I'm tempted to say, once a slice has been created, it may be reasonable to assume that slice and any sub-slice of that slice remains valid, even if the wrapped stream shrinks.

To convert this to a template, all methods must be inline in the header.
@DanRStevens
Copy link
Member Author

Ok, I think I've hit a good stopping point for today.

The design is still a little rigid. In particular, the GetFilename method is still a little too specific for the template. That might be found on a few different streams, but not every stream. Perhaps that method can be removed, or only conditionally provided if the wrapped stream also provides it. Similarly, the IdentifySource() method used in exception messages could be specialized to provide info on other stream types, or as a default, just return an empty string if there is no specific info it knows how to return.

Another point is using BidirectionalSeekableReader as the base class. Not all streams will derive from this class. With a bit of work, and some template guards, it may be possible to derive from the same base class as the wrapped stream, and also only provide the appropriate method overrides.

That may be work for a separate branch though.

@Brett208
Copy link
Member

Brett208 commented Feb 3, 2019

I read through the commits. It all seems like positive changes to me. I'll carve out time later to download and play with the code and review in more depth.

Copy link
Member

@Brett208 Brett208 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a name change suggestion for now. If we perform the name change, we will want to update the name of the test file to match.

I pushed a PR for OP2Archive to update to the most recent version of OP2Utility. It had all the namespace changes in it so is riddled with small changes. Once that is settled, I'll use OP2Archive to do some more interesting checks on this.

It compiled fine without warnings on my CPU in x86 debug mode.

-Brett

// Access is bounds checked according to the slice parameters.
// The underlying stream must be copy constructible, so that an independent stream can be created.
template<class WrappedStreamType>
class StreamSlice : public BidirectionalSeekableReader
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do you think of renaming this to Stream::ReaderSlice? To me, Stream::StreamSlice would indicate a read and write capability (or just be vague on what it does?).

If this sounds good, I can update the filename and the Visual Studio project files.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The name probably does need some work. I was purposefully vague on the name since I haven't really decided if this will apply only to read streams, or if it will generalize to write streams as well. I suppose I could just have two templates, though there would be a lot of code in common.

In terms of supporting different levels of the stream interface (Reader, Seekable, BidirectionalSeekable), there may need to be a lot of conditional code added to the template. As such, it may be a simple matter to conditionalize Read/Write, and have the template work for both stream types.

With that said, you're welcome to rename things. We can always rename again later if we come up with a better name.

Copy link
Member

@Brett208 Brett208 Feb 4, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good. I'll rename to reflect read only for now and we can rename again if we ever progress.

Copy link
Member

@Brett208 Brett208 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just ran OP2Archive using the new Stream::ReaderSlice. OP2Archive successfully constructed the ReaderSlice when extracting the contents of art.vol. The contents extracted properly both when OP2Archive was compiled in x86 and x64 mode. Between this test and the unit tests, I think we are safe to merge.

A few comments on the questions brought up by @DanRStevens

Considering bounds checking read streams: While it is true that the stream size being read could change, I feel like for most of my use cases, I expect the memory or file stream to be a static source when reading. So I would consider it helpful to keep bounds checking. This way I know if I made a mistake on the slice size or if the source is changing size unexpectedly.

If we wanted to get more sophisticated, perhaps there could be 2 versions, a Stream::ForwardReaderStatic and a Stream::ForwardReaderDynamic. This would cause a lot of branching though because I guess then you would have a Stream::BiDirectionalSeekableReaderStatic, etc.

Maybe we could add a flag to the seekable classes indicating if the reader should be static or not? This would cause the branching to occur inside the class and reduce the number of files and classes.

Either of these approaches might be misleading though because we are only bounds checking. Nothing is preventing someone from going inside a manually changing a WORD inside the stream contents without changing the length?

I don't have a strong preference either way. Sorry if that isn't helpful...

@DanRStevens
Copy link
Member Author

Hmm, I probably should have put more thought into this earlier before asking you to rename. Instead of ReaderSlice, maybe we should have used the name SliceReader. I was just looking at the other class names, and I think SliceReader would be more consistent.

Another point of clarification, I think we renamed the wrong thing. The template is currently named StreamSlice, and I think that construct should get the name SliceReader:

	template<class WrappedStreamType>
	class SliceReader : public BidirectionalSeekableReader

Meanwhile, the specific instantiation of that template, which wraps a FileReader should probably still be named FileSliceReader.

	using FileSliceReader = SliceReader<FileReader>;

The name of the header should probably be named after the template, hence SliceReader.h.


The specific instance using FileSliceReader = SliceReader<FileReader>; doesn't really need to be in the template's header file. I just put it there for backwards compatibility reasons during the transition period. It could get it's own header file, or it could just be declared where it is used, namely in FileReader.h, where there is already a forward declare for it. Actually, I think that forward declaration is sufficient. We should probably just remove the using FileSliceReader = ... line from the template header file.


Sounds like we should just largely ignore the stream size concern. You're right in that stream sizes can always change outside the control of our API. Namely, the filesystem API can change file sizes which we have readers for. We just have to accept the possibility of this happening.

I say let's just stick with what we have for now. What we have now fits our needs. Even if stream resizing was a bigger concern, I don't think it's a show stopper for the current API.

@Brett208
Copy link
Member

Brett208 commented Feb 7, 2019

Okay, no problem with the renames. I can try to fix this hopefully today or tomorrow. I probably didn't put enough time into thinking through the template verses the class.

-Brett

Copy link
Member Author

@DanRStevens DanRStevens left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, the rename changes look good. I think I'll merge now. Any concerns about the template rigidity can be addresses in a future pull request.

@DanRStevens DanRStevens merged commit 5465299 into master Feb 8, 2019
@DanRStevens DanRStevens deleted the generalizeSlicedStreams branch February 8, 2019 02:44
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants