Replies: 3 comments
-
Hi @ilikenwf. I have not yet tested it, but it may be possible to merge the spatial layers (from other 2D models into this one) while keeping the temporal ones intact. That way you could potentially preserve the temporal information while adding new data to the existing domain. You may, however, need to finetune the temporal layers as well, on data similar to what the merged-in model was trained on, but this hasn't really been explored fully yet.
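The merge idea above can be sketched roughly as follows. This is a minimal, untested illustration, assuming both checkpoints are flat state dicts and that temporal layers can be identified by a "temporal" substring in their key names; the function name, key convention, and blending scheme are all hypothetical, so check your actual model's parameter names before trying anything like this.

```python
def merge_spatial_keep_temporal(video_sd, image_sd, alpha=0.5):
    """Blend spatial weights from a 2D image model into a video model's
    state dict, leaving temporal layers untouched.

    video_sd: state dict of the text-to-video model
    image_sd: state dict of the 2D image model to merge in
    alpha:    blend factor for the incoming 2D weights (0 = keep original)

    Assumes temporal layers contain 'temporal' in their key name
    (a hypothetical convention; verify against your checkpoint).
    """
    merged = {}
    for key, weight in video_sd.items():
        if "temporal" in key or key not in image_sd:
            # Keep temporal (and video-only) weights exactly as they are.
            merged[key] = weight
        else:
            # Simple weighted average of matching spatial weights.
            merged[key] = [(1 - alpha) * a + alpha * b
                           for a, b in zip(weight, image_sd[key])]
    return merged
```

In practice you would run this over real tensors (e.g. loaded with `safetensors` or `torch.load`) rather than plain lists, but the key-filtering logic is the same: only keys present in both models and not marked temporal get blended.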
-
Ah, that would be pretty neat. My expertise isn't really in this field, so I'm not super familiar with the APIs involved here. It would be great to be able to branch out and use the 2D spatial layers from other models, though, and get some of the more diverse and cartoony models represented. The only other text-to-video model I've seen so far was one trained from the ground up on a dataset that, if memory serves, was also used in a normal Stable Diffusion model.
-
So, whether we do this or not, perhaps you could get in on this and train a few models? Maybe not NSFW unless that's really something people want, but some better non-Shutterstock stuff would be nice.
-
I was wondering if perhaps we could use something like https://github.com/NormXU/safetensors-to-Diffusers to convert to diffusers, then convert for video generation?
This would open up the use of most models, including those from civitai.
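As an alternative to the linked repo, the conversion step can also be done with the conversion script that ships in the diffusers repository. This is a sketch, not a tested recipe: the file paths are placeholders, and whether the resulting Diffusers folder can then be adapted for video generation is exactly the open question above.

```shell
# Install the usual dependencies for the conversion script.
pip install diffusers transformers safetensors omegaconf

# Convert a .safetensors checkpoint (e.g. downloaded from civitai)
# into the Diffusers folder layout. Paths here are placeholders.
python scripts/convert_original_stable_diffusion_to_diffusers.py \
  --checkpoint_path ./some_model.safetensors \
  --from_safetensors \
  --dump_path ./some_model_diffusers
```

From there, the 2D UNet's spatial weights could in principle be extracted from the converted folder and merged into the video model as discussed earlier, though that part remains unexplored.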