Skip to content

Implement staging buffer and use it for the material system #1675

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 4 commits into
base: master
Choose a base branch
from

Conversation

VReaperV
Copy link
Contributor

@VReaperV VReaperV commented Jun 8, 2025

Adds GLStagingBuffer and GLBufferCopy(). GLStagingBuffer allows setting buffer storage and map flags to 0 on the destination buffers, which pretty much guarantees that they will go into VRAM. The data is uploaded through mapping stagingBuffer and doing a copy of the data on the GPU.

The stagingBuffer itself should (still depends on the driver) go into the BAR memory (on Nvidia this shows up as DMA_CACHED when using r_glDebugProfile on; r_glDebugMode 7).

Slightly improves material system performance. Currently this is used for all material system buffers except the portal one (since it requires reading from the GPU), and for the 2 geometry cache buffers that are currently used.

Also fixed GLBuffer.mapped state being retained on map change/load.

@VReaperV VReaperV added T-Bug T-Improvement Improvement for an existing feature A-Renderer T-Performance labels Jun 8, 2025
@VReaperV VReaperV moved this to In Progress in Material system Jun 8, 2025
@VReaperV VReaperV force-pushed the staging-buffer branch 3 times, most recently from 96d8c0d to 88c7d08 Compare June 8, 2025 09:03
@VReaperV
Copy link
Contributor Author

VReaperV commented Jun 8, 2025

The CI error is caused by some unrelated bullshit.

@slipher
Copy link
Member

slipher commented Jun 8, 2025

FAILED: daemon 
: && /usr/bin/g++ -std=gnu++14 -fvisibility=hidden -fno-strict-aliasing -fno-gnu-unique -pthread -fPIC -march=x86-64 -mtune=generic -mcx16 -O3 -DNDEBUG -D_FORTIFY_SOURCE=2 -Wl,-O1 -Wl,--sort-common -Wl,--as-needed -Wl,--no-undefined -Wl,-z,relro -Wl,-z,now CMakeFiles/client-objects.dir/GeneratedSource/DaemonBuildInfo/Engine.cpp.o CMakeFiles/client-objects.dir/src/engine/qcommon/cmd.cpp.o CMakeFiles/client-objects.dir/src/engine/qcommon/common.cpp.o CMakeFiles/client-objects.dir/src/engine/qcommon/crypto.cpp.o CMakeFiles/client-objects.dir/src/engine/qcommon/cvar.cpp.o CMakeFiles/client-objects.dir/src/engine/qcommon/files.cpp.o CMakeFiles/client-objects.dir/src/engine/qcommon/huffman.cpp.o CMakeFiles/client-objects.dir/src/engine/qcommon/msg.cpp.o CMakeFiles/client-objects.dir/src/engine/qcommon/net_chan.cpp.o CMakeFiles/client-objects.dir/src/engine/qcommon/net_ip.cpp.o CMakeFiles/client-objects.dir/src/engine/qcommon/translation.cpp.o CMakeFiles/client-objects.dir/src/engine/server/sv_bot.cpp.o CMakeFiles/client-objects.dir/src/engine/server/sv_ccmds.cpp.o CMakeFiles/client-objects.dir/src/engine/server/sv_client.cpp.o CMakeFiles/client-objects.dir/src/engine/server/sv_init.cpp.o CMakeFiles/client-objects.dir/src/engine/server/sv_main.cpp.o CMakeFiles/client-objects.dir/src/engine/server/sv_net_chan.cpp.o CMakeFiles/client-objects.dir/src/engine/server/sv_sgame.cpp.o CMakeFiles/client-objects.dir/src/engine/server/sv_snapshot.cpp.o CMakeFiles/client-objects.dir/src/engine/server/CryptoChallenge.cpp.o CMakeFiles/client-objects.dir/src/engine/client/cl_avi.cpp.o CMakeFiles/client-objects.dir/src/engine/client/cl_cgame.cpp.o CMakeFiles/client-objects.dir/src/engine/client/cl_console.cpp.o CMakeFiles/client-objects.dir/src/engine/client/cl_download.cpp.o CMakeFiles/client-objects.dir/src/engine/client/cl_input.cpp.o CMakeFiles/client-objects.dir/src/engine/client/cl_main.cpp.o CMakeFiles/client-objects.dir/src/engine/client/cl_parse.cpp.o CMakeFiles/client-objects.dir/src/engine/client/cl_scrn.cpp.o CMakeFiles/client-objects.dir/src/engine/client/cl_serverlist.cpp.o CMakeFiles/client-objects.dir/src/engine/client/cl_serverstatus.cpp.o CMakeFiles/client-objects.dir/src/engine/client/dl_main.cpp.o CMakeFiles/client-objects.dir/src/engine/client/hunk_allocator.cpp.o CMakeFiles/client-objects.dir/src/engine/audio/ALObjects.cpp.o CMakeFiles/client-objects.dir/src/engine/audio/Audio.cpp.o CMakeFiles/client-objects.dir/src/engine/audio/Emitter.cpp.o CMakeFiles/client-objects.dir/src/engine/audio/OggCodec.cpp.o CMakeFiles/client-objects.dir/src/engine/audio/OpusCodec.cpp.o CMakeFiles/client-objects.dir/src/engine/audio/Sample.cpp.o CMakeFiles/client-objects.dir/src/engine/audio/Sound.cpp.o CMakeFiles/client-objects.dir/src/engine/audio/SoundCodec.cpp.o CMakeFiles/client-objects.dir/src/engine/audio/WavCodec.cpp.o CMakeFiles/client-objects.dir/src/engine/client/cl_keys.cpp.o CMakeFiles/client-objects.dir/src/engine/client/key_binding.cpp.o CMakeFiles/client-objects.dir/src/engine/client/key_identification.cpp.o CMakeFiles/client-objects.dir/src/engine/sys/sdl_input.cpp.o CMakeFiles/client-objects.dir/src/engine/renderer/DetectGLVendors.cpp.o CMakeFiles/client-objects.dir/src/engine/renderer/gl_shader.cpp.o CMakeFiles/client-objects.dir/src/engine/renderer/shaders.cpp.o CMakeFiles/client-objects.dir/src/engine/renderer/tr_animation.cpp.o CMakeFiles/client-objects.dir/src/engine/renderer/tr_backend.cpp.o CMakeFiles/client-objects.dir/src/engine/renderer/tr_bsp.cpp.o CMakeFiles/client-objects.dir/src/engine/renderer/tr_cmds.cpp.o CMakeFiles/client-objects.dir/src/engine/renderer/tr_curve.cpp.o CMakeFiles/client-objects.dir/src/engine/renderer/tr_fbo.cpp.o CMakeFiles/client-objects.dir/src/engine/renderer/tr_font.cpp.o CMakeFiles/client-objects.dir/src/engine/renderer/GeometryCache.cpp.o CMakeFiles/client-objects.dir/src/engine/renderer/GeometryOptimiser.cpp.o CMakeFiles/client-objects.dir/src/engine/renderer/GLMemory.cpp.o CMakeFiles/client-objects.dir/src/engine/renderer/InternalImage.cpp.o CMakeFiles/client-objects.dir/src/engine/renderer/Material.cpp.o CMakeFiles/client-objects.dir/src/engine/renderer/TextureManager.cpp.o CMakeFiles/client-objects.dir/src/engine/renderer/tr_image.cpp.o CMakeFiles/client-objects.dir/src/engine/renderer/tr_image_crn.cpp.o CMakeFiles/client-objects.dir/src/engine/renderer/tr_image_dds.cpp.o CMakeFiles/client-objects.dir/src/engine/renderer/tr_image_jpg.cpp.o CMakeFiles/client-objects.dir/src/engine/renderer/tr_image_ktx.cpp.o CMakeFiles/client-objects.dir/src/engine/renderer/tr_image_png.cpp.o CMakeFiles/client-objects.dir/src/engine/renderer/tr_image_tga.cpp.o CMakeFiles/client-objects.dir/src/engine/renderer/tr_image_webp.cpp.o CMakeFiles/client-objects.dir/src/engine/renderer/tr_init.cpp.o CMakeFiles/client-objects.dir/src/engine/renderer/tr_light.cpp.o CMakeFiles/client-objects.dir/src/engine/renderer/tr_main.cpp.o CMakeFiles/client-objects.dir/src/engine/renderer/tr_marks.cpp.o CMakeFiles/client-objects.dir/src/engine/renderer/tr_mesh.cpp.o CMakeFiles/client-objects.dir/src/engine/renderer/tr_model.cpp.o CMakeFiles/client-objects.dir/src/engine/renderer/tr_model_iqm.cpp.o CMakeFiles/client-objects.dir/src/engine/renderer/tr_model_md3.cpp.o CMakeFiles/client-objects.dir/src/engine/renderer/tr_model_md5.cpp.o CMakeFiles/client-objects.dir/src/engine/renderer/tr_model_skel.cpp.o CMakeFiles/client-objects.dir/src/engine/renderer/tr_noise.cpp.o CMakeFiles/client-objects.dir/src/engine/renderer/tr_scene.cpp.o CMakeFiles/client-objects.dir/src/engine/renderer/tr_shade.cpp.o CMakeFiles/client-objects.dir/src/engine/renderer/tr_shader.cpp.o CMakeFiles/client-objects.dir/src/engine/renderer/tr_shade_calc.cpp.o CMakeFiles/client-objects.dir/src/engine/renderer/tr_skin.cpp.o CMakeFiles/client-objects.dir/src/engine/renderer/tr_sky.cpp.o CMakeFiles/client-objects.dir/src/engine/renderer/tr_surface.cpp.o CMakeFiles/client-objects.dir/src/engine/renderer/tr_vbo.cpp.o CMakeFiles/client-objects.dir/src/engine/renderer/tr_video.cpp.o CMakeFiles/client-objects.dir/src/engine/renderer/tr_world.cpp.o CMakeFiles/client-objects.dir/src/engine/sys/sdl_glimp.cpp.o CMakeFiles/client.dir/src/engine/client/ClientApplication.cpp.o -o daemon  libengine-lib.a  -lncursesw  /usr/lib/x86_64-linux-gnu/libformw.so  /usr/lib/x86_64-linux-gnu/libGL.so  /usr/lib/x86_64-linux-gnu/libSDL2main.a  /usr/lib/x86_64-linux-gnu/libSDL2.so  /usr/lib/x86_64-linux-gnu/libogg.so  /usr/lib/x86_64-linux-gnu/libvorbisfile.so  /usr/lib/x86_64-linux-gnu/libvorbis.so  /usr/lib/libopusfile.so  /usr/lib/x86_64-linux-gnu/libopus.so  /usr/lib/x86_64-linux-gnu/libogg.so  /usr/lib/x86_64-linux-gnu/libvorbisfile.so  /usr/lib/x86_64-linux-gnu/libvorbis.so  /usr/lib/libopusfile.so  /usr/lib/x86_64-linux-gnu/libopus.so  /usr/lib/x86_64-linux-gnu/libwebp.so  /usr/lib/x86_64-linux-gnu/libjpeg.so  /usr/lib/x86_64-linux-gnu/libpng.so  /usr/lib/x86_64-linux-gnu/libfreetype.so  /usr/lib/x86_64-linux-gnu/libGLEW.so  /usr/lib/x86_64-linux-gnu/libopenal.so  /usr/lib/x86_64-linux-gnu/libcurl.so  libsrclibs-mumblelink.a  libsrclibs-findlocale.a  /usr/lib/x86_64-linux-gnu/libhogweed.so  /usr/lib/x86_64-linux-gnu/libnettle.so  /usr/lib/x86_64-linux-gnu/libgmp.so  libsrclibs-nacl-native.a  -lm  /usr/lib/x86_64-linux-gnu/librt.a  -ldl  /usr/lib/x86_64-linux-gnu/libz.so  libsrclibs-minizip.a && :
/usr/bin/ld: CMakeFiles/client-objects.dir/src/engine/renderer/GLMemory.cpp.o: in function `GLStagingBuffer::MapBuffer(long)':
GLMemory.cpp:(.text+0x555): undefined reference to `GLStagingBuffer::SIZE'

That sounds related

@VReaperV
Copy link
Contributor Author

VReaperV commented Jun 8, 2025

FAILED: daemon 
: && /usr/bin/g++ -std=gnu++14 -fvisibility=hidden -fno-strict-aliasing -fno-gnu-unique -pthread -fPIC -march=x86-64 -mtune=generic -mcx16 -O3 -DNDEBUG -D_FORTIFY_SOURCE=2 -Wl,-O1 -Wl,--sort-common -Wl,--as-needed -Wl,--no-undefined -Wl,-z,relro -Wl,-z,now CMakeFiles/client-objects.dir/GeneratedSource/DaemonBuildInfo/Engine.cpp.o CMakeFiles/client-objects.dir/src/engine/qcommon/cmd.cpp.o CMakeFiles/client-objects.dir/src/engine/qcommon/common.cpp.o CMakeFiles/client-objects.dir/src/engine/qcommon/crypto.cpp.o CMakeFiles/client-objects.dir/src/engine/qcommon/cvar.cpp.o CMakeFiles/client-objects.dir/src/engine/qcommon/files.cpp.o CMakeFiles/client-objects.dir/src/engine/qcommon/huffman.cpp.o CMakeFiles/client-objects.dir/src/engine/qcommon/msg.cpp.o CMakeFiles/client-objects.dir/src/engine/qcommon/net_chan.cpp.o CMakeFiles/client-objects.dir/src/engine/qcommon/net_ip.cpp.o CMakeFiles/client-objects.dir/src/engine/qcommon/translation.cpp.o CMakeFiles/client-objects.dir/src/engine/server/sv_bot.cpp.o CMakeFiles/client-objects.dir/src/engine/server/sv_ccmds.cpp.o CMakeFiles/client-objects.dir/src/engine/server/sv_client.cpp.o CMakeFiles/client-objects.dir/src/engine/server/sv_init.cpp.o CMakeFiles/client-objects.dir/src/engine/server/sv_main.cpp.o CMakeFiles/client-objects.dir/src/engine/server/sv_net_chan.cpp.o CMakeFiles/client-objects.dir/src/engine/server/sv_sgame.cpp.o CMakeFiles/client-objects.dir/src/engine/server/sv_snapshot.cpp.o CMakeFiles/client-objects.dir/src/engine/server/CryptoChallenge.cpp.o CMakeFiles/client-objects.dir/src/engine/client/cl_avi.cpp.o CMakeFiles/client-objects.dir/src/engine/client/cl_cgame.cpp.o CMakeFiles/client-objects.dir/src/engine/client/cl_console.cpp.o CMakeFiles/client-objects.dir/src/engine/client/cl_download.cpp.o CMakeFiles/client-objects.dir/src/engine/client/cl_input.cpp.o CMakeFiles/client-objects.dir/src/engine/client/cl_main.cpp.o CMakeFiles/client-objects.dir/src/engine/client/cl_parse.cpp.o CMakeFiles/client-objects.dir/src/engine/client/cl_scrn.cpp.o CMakeFiles/client-objects.dir/src/engine/client/cl_serverlist.cpp.o CMakeFiles/client-objects.dir/src/engine/client/cl_serverstatus.cpp.o CMakeFiles/client-objects.dir/src/engine/client/dl_main.cpp.o CMakeFiles/client-objects.dir/src/engine/client/hunk_allocator.cpp.o CMakeFiles/client-objects.dir/src/engine/audio/ALObjects.cpp.o CMakeFiles/client-objects.dir/src/engine/audio/Audio.cpp.o CMakeFiles/client-objects.dir/src/engine/audio/Emitter.cpp.o CMakeFiles/client-objects.dir/src/engine/audio/OggCodec.cpp.o CMakeFiles/client-objects.dir/src/engine/audio/OpusCodec.cpp.o CMakeFiles/client-objects.dir/src/engine/audio/Sample.cpp.o CMakeFiles/client-objects.dir/src/engine/audio/Sound.cpp.o CMakeFiles/client-objects.dir/src/engine/audio/SoundCodec.cpp.o CMakeFiles/client-objects.dir/src/engine/audio/WavCodec.cpp.o CMakeFiles/client-objects.dir/src/engine/client/cl_keys.cpp.o CMakeFiles/client-objects.dir/src/engine/client/key_binding.cpp.o CMakeFiles/client-objects.dir/src/engine/client/key_identification.cpp.o CMakeFiles/client-objects.dir/src/engine/sys/sdl_input.cpp.o CMakeFiles/client-objects.dir/src/engine/renderer/DetectGLVendors.cpp.o CMakeFiles/client-objects.dir/src/engine/renderer/gl_shader.cpp.o CMakeFiles/client-objects.dir/src/engine/renderer/shaders.cpp.o CMakeFiles/client-objects.dir/src/engine/renderer/tr_animation.cpp.o CMakeFiles/client-objects.dir/src/engine/renderer/tr_backend.cpp.o CMakeFiles/client-objects.dir/src/engine/renderer/tr_bsp.cpp.o CMakeFiles/client-objects.dir/src/engine/renderer/tr_cmds.cpp.o CMakeFiles/client-objects.dir/src/engine/renderer/tr_curve.cpp.o CMakeFiles/client-objects.dir/src/engine/renderer/tr_fbo.cpp.o CMakeFiles/client-objects.dir/src/engine/renderer/tr_font.cpp.o CMakeFiles/client-objects.dir/src/engine/renderer/GeometryCache.cpp.o CMakeFiles/client-objects.dir/src/engine/renderer/GeometryOptimiser.cpp.o CMakeFiles/client-objects.dir/src/engine/renderer/GLMemory.cpp.o CMakeFiles/client-objects.dir/src/engine/renderer/InternalImage.cpp.o CMakeFiles/client-objects.dir/src/engine/renderer/Material.cpp.o CMakeFiles/client-objects.dir/src/engine/renderer/TextureManager.cpp.o CMakeFiles/client-objects.dir/src/engine/renderer/tr_image.cpp.o CMakeFiles/client-objects.dir/src/engine/renderer/tr_image_crn.cpp.o CMakeFiles/client-objects.dir/src/engine/renderer/tr_image_dds.cpp.o CMakeFiles/client-objects.dir/src/engine/renderer/tr_image_jpg.cpp.o CMakeFiles/client-objects.dir/src/engine/renderer/tr_image_ktx.cpp.o CMakeFiles/client-objects.dir/src/engine/renderer/tr_image_png.cpp.o CMakeFiles/client-objects.dir/src/engine/renderer/tr_image_tga.cpp.o CMakeFiles/client-objects.dir/src/engine/renderer/tr_image_webp.cpp.o CMakeFiles/client-objects.dir/src/engine/renderer/tr_init.cpp.o CMakeFiles/client-objects.dir/src/engine/renderer/tr_light.cpp.o CMakeFiles/client-objects.dir/src/engine/renderer/tr_main.cpp.o CMakeFiles/client-objects.dir/src/engine/renderer/tr_marks.cpp.o CMakeFiles/client-objects.dir/src/engine/renderer/tr_mesh.cpp.o CMakeFiles/client-objects.dir/src/engine/renderer/tr_model.cpp.o CMakeFiles/client-objects.dir/src/engine/renderer/tr_model_iqm.cpp.o CMakeFiles/client-objects.dir/src/engine/renderer/tr_model_md3.cpp.o CMakeFiles/client-objects.dir/src/engine/renderer/tr_model_md5.cpp.o CMakeFiles/client-objects.dir/src/engine/renderer/tr_model_skel.cpp.o CMakeFiles/client-objects.dir/src/engine/renderer/tr_noise.cpp.o CMakeFiles/client-objects.dir/src/engine/renderer/tr_scene.cpp.o CMakeFiles/client-objects.dir/src/engine/renderer/tr_shade.cpp.o CMakeFiles/client-objects.dir/src/engine/renderer/tr_shader.cpp.o CMakeFiles/client-objects.dir/src/engine/renderer/tr_shade_calc.cpp.o CMakeFiles/client-objects.dir/src/engine/renderer/tr_skin.cpp.o CMakeFiles/client-objects.dir/src/engine/renderer/tr_sky.cpp.o CMakeFiles/client-objects.dir/src/engine/renderer/tr_surface.cpp.o CMakeFiles/client-objects.dir/src/engine/renderer/tr_vbo.cpp.o CMakeFiles/client-objects.dir/src/engine/renderer/tr_video.cpp.o CMakeFiles/client-objects.dir/src/engine/renderer/tr_world.cpp.o CMakeFiles/client-objects.dir/src/engine/sys/sdl_glimp.cpp.o CMakeFiles/client.dir/src/engine/client/ClientApplication.cpp.o -o daemon  libengine-lib.a  -lncursesw  /usr/lib/x86_64-linux-gnu/libformw.so  /usr/lib/x86_64-linux-gnu/libGL.so  /usr/lib/x86_64-linux-gnu/libSDL2main.a  /usr/lib/x86_64-linux-gnu/libSDL2.so  /usr/lib/x86_64-linux-gnu/libogg.so  /usr/lib/x86_64-linux-gnu/libvorbisfile.so  /usr/lib/x86_64-linux-gnu/libvorbis.so  /usr/lib/libopusfile.so  /usr/lib/x86_64-linux-gnu/libopus.so  /usr/lib/x86_64-linux-gnu/libogg.so  /usr/lib/x86_64-linux-gnu/libvorbisfile.so  /usr/lib/x86_64-linux-gnu/libvorbis.so  /usr/lib/libopusfile.so  /usr/lib/x86_64-linux-gnu/libopus.so  /usr/lib/x86_64-linux-gnu/libwebp.so  /usr/lib/x86_64-linux-gnu/libjpeg.so  /usr/lib/x86_64-linux-gnu/libpng.so  /usr/lib/x86_64-linux-gnu/libfreetype.so  /usr/lib/x86_64-linux-gnu/libGLEW.so  /usr/lib/x86_64-linux-gnu/libopenal.so  /usr/lib/x86_64-linux-gnu/libcurl.so  libsrclibs-mumblelink.a  libsrclibs-findlocale.a  /usr/lib/x86_64-linux-gnu/libhogweed.so  /usr/lib/x86_64-linux-gnu/libnettle.so  /usr/lib/x86_64-linux-gnu/libgmp.so  libsrclibs-nacl-native.a  -lm  /usr/lib/x86_64-linux-gnu/librt.a  -ldl  /usr/lib/x86_64-linux-gnu/libz.so  libsrclibs-minizip.a && :
/usr/bin/ld: CMakeFiles/client-objects.dir/src/engine/renderer/GLMemory.cpp.o: in function `GLStagingBuffer::MapBuffer(long)':
GLMemory.cpp:(.text+0x555): undefined reference to `GLStagingBuffer::SIZE'

That sounds related

Oh right, it's because of the moronic behaviour with static const in class declarations.

@VReaperV
Copy link
Contributor Author

VReaperV commented Jun 8, 2025

Well no, it's gotta be caused by something else too, because even with constexpr it still fails.

@slipher
Copy link
Member

slipher commented Jun 8, 2025

Yeah I don't think constexpr fixed anything for that case; it just behaves the same as const. In general you need C++17 inline constexpr for constants that truly behave as they ought to

@VReaperV
Copy link
Contributor Author

VReaperV commented Jun 8, 2025

Yeah I don't think constexpr fixed anything for that case; it just behaves the same as const. In general you need C++17 inline constexpr for constants that truly behave as they ought to

Yeah, it's defaulting to C++14 apparently.

@VReaperV
Copy link
Contributor Author

VReaperV commented Jun 8, 2025

It'll just have to do with the old redundant way.

Copy link
Member

@slipher slipher left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I get crashes in the Nvidia driver if I make the buffer small enough that it can't hold all the data at once. For example SIZE = 20 << 20 for Thunder or SIZE = 2 << 20 for plat23.

@VReaperV
Copy link
Contributor Author

I get crashes in the Nvidia driver if I make the buffer small enough that it can't hold all the data at once. For example SIZE = 20 << 20 for Thunder or SIZE = 2 << 20 for plat23.

So don't do that?

@slipher
Copy link
Member

slipher commented Jun 10, 2025

That means all the code to handle pointer + size > SIZE etc. doesn't work.

@VReaperV VReaperV force-pushed the staging-buffer branch 2 times, most recently from e9827af to bf91241 Compare June 10, 2025 18:17
@VReaperV
Copy link
Contributor Author

Fixed.

@slipher
Copy link
Member

slipher commented Jun 11, 2025

I still get crashes. For example SIZE = 900000 and load plat23

@VReaperV
Copy link
Contributor Author

I still get crashes. For example SIZE = 900000 and load plat23

I don't get any crashes with that.

I don't get why you keep trying to use it with a different size buffer that it's not supposed to work with, either. I don't care if it breaks due to someone randomly changing the code.

@sweet235
Copy link
Contributor

Even though what I say here might be censored again, let me bother this one time.

I still get crashes. For example SIZE = 900000 and load plat23

Am I understanding this correctly: you change a constant in the source code (to yield a buffer with less than 4MB), and observe a crash? And you believe the source code has to be crafted such that no changes in any compile time constant may result in a crash?

@slipher
Copy link
Member

slipher commented Jun 11, 2025

I don't get why you keep trying to use it with a different size buffer that it's not supposed to work with, either.

To test all the code paths, since I don't know how to use over 128MB of buffers with existing assets. If the code path for flushing out data to make room for incoming data is not intended to work, we can replace it with Sys::Drop instead.

I tried on AMD+Linux with the same SIZE = 900 * 1000 loading plat23 and it's broken there too. Locked up the system for several minutes and the OOM killer zapped most of the running applications. As on Nvidia (in the brief moment before the latter crashes) the map geometry is not rendered.

@VReaperV
Copy link
Contributor Author

To test all the code paths, since I don't know how to use over 128MB of buffers with existing assets. If the code path for flushing out data to make room for incoming data is not intended to work, we can replace it with Sys::Drop instead.

I tested with a lower size (not 3.6mb) and the wrap-around does work correctly.

I tried on AMD+Linux with the same SIZE = 900 * 1000 loading plat23 and it's broken there too. Locked up the system for several minutes and the OOM killer zapped most of the running applications. As on Nvidia (in the brief moment before the latter crashes) the map geometry is not rendered.

That is not enough to fit all the data being mapped at the end of map loading.

@slipher
Copy link
Member

slipher commented Jun 11, 2025

Could the maximum allocation be different based on some settings? I logged the sizes and the maximum one for me on plat23 had a count between 800k and 900k. If I put SIZE smaller than the largest single allocation, it calls Sys::Drop as expected. If I make it just slightly larger than the largest single allocation, the map finishes loading but things are broken.

@VReaperV
Copy link
Contributor Author

VReaperV commented Jun 11, 2025

Could the maximum allocation be different based on some settings?

Why?

I logged the sizes and the maximum one for me on plat23 had a count between 800k and 900k. If I put SIZE smaller than the largest single allocation, it calls Sys::Drop as expected. If I make it just slightly larger than the largest single allocation, the map finishes loading but things are broken.

The point of a staging buffer is to transfer and allocate memory on the GPU efficiently, it's not some memory store that you reallocate randomly.

@slipher
Copy link
Member

slipher commented Jun 12, 2025

Could the maximum allocation be different based on some settings?

Why?

I thought maybe you are implying that 900000 was not enough for the largest single mapping with the comment, "That is not enough to fit all the data being mapped at the end of map loading."

If I understood the API design of the staging buffer correctly: after allocating a block of memory, you must write its contents and request copying to another buffer before you allocate the next block. So only one allocation is available to use at a time.

@VReaperV
Copy link
Contributor Author

VReaperV commented Jun 12, 2025

Could the maximum allocation be different based on some settings?

Why?

I thought maybe you are implying that 900000 was not enough for the largest single mapping with the comment, "That is not enough to fit all the data being mapped at the end of map loading."

Not quite: it's two mappings. It wouldn't make sense to fill up those allocations separately, and doing so would be slower. But since they both can't fit a 900000 size buffer at the same time, the first allocation gets flushed before any data is written to it, which is why there's no geometry visible in that case (I cannot reproduce a crash there though).

If I understood the API design of the staging buffer correctly: after allocating a block of memory, you must write its contents and request copying to another buffer before you allocate the next block. So only one allocation is available to use at a time.

It does work with multiple allocations, that's the reason (or part of the reason, anyway) that it's 128mb in size.

@slipher
Copy link
Member

slipher commented Jun 12, 2025

If I remove the geometry cache commit, I still get the crashes. The geometry cache allocations aren't actually the biggest ones anyway, at least on plat23.

I don't get crashes with the first commit alone. Introducing the staging buffer plus the "cleanup" commit is necessary. If I put back the code that zeroes the culledCommandsSSBO which was removed in the "cleanup" commit, it works. Could filling with zeroes be important? Of course I may just be studying noise which depends on the allocation pattern.

@VReaperV
Copy link
Contributor Author

If I remove the geometry cache commit, I still get the crashes. The geometry cache allocations aren't actually the biggest ones anyway, at least on plat23.

I mean the surfaceDescriptorsSSBO and surfaceCommandsSSBO.

I don't get crashes with the first commit alone. Introducing the staging buffer plus the "cleanup" commit is necessary. If I put back the code that zeroes the culledCommandsSSBO which was removed in the "cleanup" commit, it works. Could filling with zeroes be important? Of course I may just be studying noise which depends on the allocation pattern.

It's most likely just a conesquence of surfaceDescriptorsSSBO/surfaceCommandsSSBO not being filled out correctly. culledCommandsSSBO is only actually written on the GPU, and none of the rendering commands access parts of the buffer that weren't written to. Since you're giving garbage data as input in surfaceDescriptorsSSBO/surfaceCommandsSSBO, it will access garbage in culledCommandsSSBO.

@slipher
Copy link
Member

slipher commented Jun 12, 2025

Thanks, now I get it. Writes to surfaceDescriptorsSSBO and surfaceCommandsSSBO are interleaved; the initial memset of surfaceCommands is not the final say on its contents. The interleaving causes potential problems since the staging buffer API only guarantees the most recent mapping to stay mapped.

The bug here seems fairly possible to hit in real life by making a huge map. I think we should make sure to prevent it since it has the potential to bring down the user's whole system. How about writing the surface commands to a temporary CPU buffer before writing them to the GL buffer? Besides fixing the bug, that would stop the following code from having to read back data from the staging buffer. I think reading back those buffers is something we should try to avoid?

for ( int i = 0; i < MAX_VIEWFRAMES; i++ ) {
	memcpy( surfaceCommands + surfaceCommandsCount * i, surfaceCommands, surfaceCommandsCount * sizeof( SurfaceCommand ) );
}

@VReaperV
Copy link
Contributor Author

VReaperV commented Jun 12, 2025

Thanks, now I get it. Writes to surfaceDescriptorsSSBO and surfaceCommandsSSBO are interleaved; the initial memset of surfaceCommands is not the final say on its contents. The interleaving causes potential problems since the staging buffer API only guarantees the most recent mapping to stay mapped.

That is correct.

The bug here seems fairly possible to hit in real life by making a huge map.

This would require approximately 2e6 surfaces or ~4.2e7 triangles. I believe daemon would fail elsewhere long before that point.

Besides fixing the bug, that would stop the following code from having to read back data from the staging buffer. I think reading back those buffers is something we should try to avoid?

Yes, but it's not a big deal on map load. I also have some plans for changes that will slightly change the pipeline and eliminate the need for interleaving these writes.

@slipher
Copy link
Member

slipher commented Jun 12, 2025

I don't know what the relation with triangle counts is but Thunder uses 35 MB for the pair of allocations so it would need to be just 3.7x bigger, doesn't seem out of the realm of possibility.

@slipher
Copy link
Member

slipher commented Jun 12, 2025

There is a cvar that affects the buffer sizes. r_subdivisions changes the sizes for the surface stuff and geometry cache. Could affect reproducibility.

VReaperV added 4 commits June 13, 2025 01:34
Adds `GLStagingBuffer` and `GLBufferCopy()`. `GLStagingBuffer` allows setting buffer storage and map flags to 0 on the destination buffers, which pretty much guarantees that they will go into VRAM. The data is uploaded through mapping `stagingBuffer` and doing a copy of the data on the GPU.
@VReaperV
Copy link
Contributor Author

I don't know what the relation with triangle counts is but Thunder uses 35 MB for the pair of allocations so it would need to be just 3.7x bigger, doesn't seem out of the realm of possibility.

The fact that it's directly correlated with the amount of surfaces in material system.

Anyway, I've changed it up a bit, so now it's only uploading and mapping ~3.13mb for these 2 buffers on thunder.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-Renderer T-Bug T-Improvement Improvement for an existing feature T-Performance
Projects
Status: In Progress
Development

Successfully merging this pull request may close these issues.

3 participants