Allow models with layers wider than 128 functions to run without error, by not using a fixed number of CUDAStreamCompactionConfig.
Closes #727
Todo
CUDAScatter relies on MAX_STREAMS, so refactoring is required.
FLAMEGPUDeviceException relies on MAX_STREAMS, so refactoring is required.
CUDAScanCompaction::MAX_STREAMS, replacing with a member variable of the current number allocated.
CUDAScanCompaction::MAX_STREAMS is/was checked)
Notes
CUDAScanCompaction::MAX_STREAMS is hardcoded to 128, the upper limit that can run on a (<= SM75) device at once. This is a bad assumption.
Models can have more than 128 functions per layer, which requires that many streams.
CUDAScatter is initialised as a singleton member of CUDASimulation, so the fixed model properties are known at that point and a call can be added there to allocate enough data.
DeviceExceptionManager has an array of one device pointer to a DeviceExceptionBuffer per stream, and host memory to copy that back to.
DeviceExceptionManager is a member of cudaSimulation::singletons, so it can be allocated during singleton initialisation.
CUDAScanCompaction is a member variable of CUDAScatter, which is default initialised (rather than being manually constructed or mentioned in an initialiser list).
This will need to be changed to pass the number of streams to create during construction, or to allocate the required number of elements later.
This appears to be the only instantiation of CUDAScanCompaction, AFAIK.
Destruction / deletion will also be required.
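The "allocate the required number of elements later" option from the notes above could look roughly like the host-only sketch below. It is not the FLAME GPU implementation: StreamConfig, StreamConfigPool, ensure, release and widestLayer are hypothetical names invented for illustration, and the real per-stream config would own device memory rather than a flag.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for one per-stream compaction config.
// The real type would own device buffers; this one only carries a flag.
struct StreamConfig {
    bool allocated = false;
};

// Hypothetical replacement for a fixed array of MAX_STREAMS configs.
class StreamConfigPool {
 public:
    // Grow so that at least `required` configs exist. Never shrinks.
    void ensure(std::size_t required) {
        if (required > configs_.size()) {
            configs_.resize(required);
        }
    }
    // Current number allocated, used where MAX_STREAMS was checked before.
    std::size_t size() const { return configs_.size(); }
    // Access the config for one stream; the index must be in range.
    StreamConfig &at(std::size_t streamId) {
        assert(streamId < configs_.size());
        return configs_[streamId];
    }
    // Explicit teardown, standing in for the destruction step.
    void release() { configs_.clear(); }

 private:
    std::vector<StreamConfig> configs_;
};

// Assumed helper: the widest layer decides how many streams are needed.
std::size_t widestLayer(const std::vector<std::size_t> &functionsPerLayer) {
    std::size_t widest = 0;
    for (std::size_t n : functionsPerLayer) {
        widest = std::max(widest, n);
    }
    return widest;
}
```

Under these assumptions, singleton initialisation would call ensure(widestLayer(...)) once the model's layers are known, and any code that previously compared a stream index against MAX_STREAMS would compare against size() instead.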