Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement GPU block copies between buffers and from buffer to RAM. #6049

Merged
merged 18 commits into from
May 25, 2014

Conversation

hrydgard
Copy link
Owner

@hrydgard hrydgard commented May 8, 2014

(first one not very well tested, second one improves Burnout Legends sun a lot).

Implements parts of #3515 and #618.

Also hopes to be an alternative to #6030.

Intended for Buffered Rendering, not the Read Framebuffer modes.

I hope that with this we can minimize the use of the Read Framebuffer modes (they will still be required in Dangan Ronpa).

@daniel229
Copy link
Collaborator

causes slow down and missing texture in Tactics Ogre.

master build
01

test build
02

@hrydgard
Copy link
Owner Author

hrydgard commented May 8, 2014

Hmm, thanks, not good.

What if you comment out this line:

BlitFramebuffer_(dstBuffer, dstX, dstY, srcBuffer, srcX, srcY, width, height);

to

 // BlitFramebuffer_(dstBuffer, dstX, dstY, srcBuffer, srcX, srcY, width, height);

@unknownbrackets
Copy link
Collaborator

IIRC, it does downloads and uploads in a way that both need to work. Also, there's some commented out code.

IIRC, Tales of Phantasia X and God of War use block transfers a lot too.

-[Unknown]

@daniel229
Copy link
Collaborator

@hrydgard
comment out that not help anything.

@dbz400
Copy link
Contributor

dbz400 commented May 8, 2014

It need to comment out .

    ReadFramebufferToMemory(srcBuffer, true, srcX, srcY, width, height);

@unknownbrackets
Copy link
Collaborator

By the way, some games would be helped by pretending there is a framebuffer. Vempire (iirc) uses dmac, but essentially blits a full-size framebuffer to +(0x44000 * 2) and then uses that as a pre-rendered background, blitting it back before each frame. It never uses that as a framebuffer, but it also never reads from the memory.

Tactics Ogre, I think, was doing something similar, I'm not sure now if it was modifying or reading the RAM or not... I think it wasn't, becaues I remember thinking what it was doing seemed pretty strange.

-[Unknown]

@hrydgard
Copy link
Owner Author

hrydgard commented May 8, 2014

Hm, just realized I didn't finish Read properly, it only does the blit as a subblock but it then "packs" the whole framebuffer to RAM. Maybe it overwrites stuff.

@dbz400
Copy link
Contributor

dbz400 commented May 9, 2014

YS 7 runs pretty well with this branch (even faster than my alternative branch that doing the same thing)

} else if (srcBuffer && g_Config.iRenderingMode == FB_BUFFERED_MODE) {
WARN_LOG_ONCE(btd, G3D, "Block transfer download %08x -> %08x", srcBasePtr, dstBasePtr);
if (g_Config.iRenderingMode == FB_BUFFERED_MODE) {
ReadFramebufferToMemory(srcBuffer, true, srcX, srcY, width, height);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is going to mess up the current framebuffer, which causes at least some of the missing stuff in Tactics Ogre. Maybe ReadFramebufferToMemory() should null out currentRenderVfb_?

Alsoe note, the stride may not match the framebuffer. This is the case in Tactics Ogre, but i don't know what it's doing with the block transfer.

-[Unknown]

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, I'll change it to do that later.

Stride needs more work indeed.

@unknownbrackets
Copy link
Collaborator

Looking at our glReadPixels usage - how is it that we don't need to invert the y? This makes Adventures to Go! look really weird.

Tales of Phantasia X uses intra-same-buffer copies, at least in one place.

-[Unknown]

@solarmystic
Copy link
Contributor

Will do the usual round of testing once the commit(s) have been finalized, quite occupied (work and Dark Souls 2 PC :P ) at the moment.

The noticeable slowdown in Tactics Ogre (cutting speed to more than a third of the original performance numbers) as compared to master doesn't bode well however given that Buffered Rendering is the default option and this particular functionality will always be enabled in all builds and all games once this is merged to master. Reminds me of the Dual Source Alpha commit prior to refinement and selective application in games.

@hrydgard
Copy link
Owner Author

hrydgard commented May 9, 2014

@unknownbrackets , honestly I'm not 100% sure about what orientation we do the various operations in :P Need to clarify it in the code for sure...

Maybe we should switch to the approach where we simply render upside down to FBOs so that the coordinate systems match in all other details, instead of texturing from them upside down. It's the approach Epic uses so it should be ok :)

@dbz400
Copy link
Contributor

dbz400 commented May 9, 2014

I remember when first time implemented the read to framebuffer mode , it uses glBlitframebuffer instead of draw pixels routine and turn the upside down texture after glReadPixels call.

@dbz400
Copy link
Contributor

dbz400 commented May 9, 2014

If change to use glBlitFramebuffer for BlitFramebuffer_() , it is little bit faster but the image turn upside down.

fbo_bind_for_read(src->fbo);
glBlitFramebuffer(0, 0, src->renderWidth, src->renderHeight, 0, 0, dst->width, dst->height, GL_COLOR_BUFFER_BIT, GL_NEAREST);

ulus10551_00020

@dbz400
Copy link
Contributor

dbz400 commented May 9, 2014

Should be okay now . Swap the srcY0 and srcY1.
glBlitFramebuffer(0, src->renderHeight, src->renderWidth, 0, 0, 0, dst->width, dst->height, GL_COLOR_BUFFER_BIT, GL_NEAREST);

ulus10551_00021

@daniel229
Copy link
Collaborator

Great,it fixes boku no natsuyasumi,colour is wrong though.
01

@dbz400
Copy link
Contributor

dbz400 commented May 9, 2014

Great .The wrong color probably related to the BGRA stuff.

@dbz400
Copy link
Contributor

dbz400 commented May 9, 2014

Yep , probably something wrong in Sync mode (this is sync = false)

 #ifdef MAY_HAVE_GLES3
    if (UseBGRA8888())
        glfmt = GL_BGRA_EXT;
  #endif

ucjs10038_00001

@unknownbrackets
Copy link
Collaborator

Oh, looks like if UseBGRA8888() && format = 88888 it needs to call convert on the pixels.

-[Unknown]

@dbz400
Copy link
Contributor

dbz400 commented May 9, 2014

One worth noting that some games will take screenshot for save icon like Ys 7 are still not working yet in download block transfer.Probably something more to it ?

@hrydgard
Copy link
Owner Author

hrydgard commented May 9, 2014

Hm, that's surprising, I would expect that to work now...

@unknownbrackets
Copy link
Collaborator

Some games copy down vram either using their own memcpy (Trails in the Sky, for example), sceDmac (I forget which, maybe Ys Seven), or block transfers (Narikiri Dungeon X, iirc? not sure.) Some may also read it directly, I suppose, but I think they have to encode it to a png somehow.

-[Unknown]

@hrydgard
Copy link
Owner Author

hrydgard commented May 9, 2014

Hm, right, of course. Block transfers aren't really an advantage there as they don't stretch the image anyway.

@hrydgard
Copy link
Owner Author

hrydgard commented May 9, 2014

Hm again, nulling out currentRenderVfb_ "breaks" the Burnout fix (which I think doesn't look 100% right anyway)...

@unknownbrackets
Copy link
Collaborator

Well, that would've made it render mistakenly to the wrong framebuffer, I think. Basically, setting it to NULL just makes it recalculate.

So it was "fixed" by rendering directly to whatever the block transfer target was.

What are the block transfers and framebuffer targets?

-[Unknown]

@hrydgard
Copy link
Owner Author

hrydgard commented May 9, 2014

It block transfers a 64x64 chunk of the main framebuffer (0x04000000 or 0x04088000) to a 64x64 (stride is 64 pixels) image at 0x04170000 , then presumably modifies it somehow with the CPU, then transfers it back to exactly the same location in the framebuffer (that transfer back can't possibly work correctly right now). It also later uses the same 64x64 image as a texture multiple times to draw kind of a radial blur, creating a crude "god ray" style effect.

So it's no great mystery really, just need to implement the block copy back.

@dbz400
Copy link
Contributor

dbz400 commented May 9, 2014

Just quick update .Color issue fixed now by 7063d5a

@hrydgard
Copy link
Owner Author

hrydgard commented May 9, 2014

Thanks @raven02 .

@dbz400
Copy link
Contributor

dbz400 commented May 9, 2014

if i temporary relax the check here "Memory::IsVRAMAddress(src) && Memory::IsRAMAddress(dest)" for NotifyFramebufferCopy() .3rd birthday 1st screen and in-game menu can be detected "Memcpy fbo download" but interestingly "ReadFramebufferToMemory(srcBuffer, true, 0, 0, srcBuffer->width, srcBuffer->height);" still cannot make it showing correctly.

  • 1st screen

03:22:945 user_main W[G3D]: GLES\Framebuffer.cpp:1838 Memcpy fbo download 04000000 -> 04178000 with size 557056
03:22:946 user_main E[SCEGE]: GLES\Framebuffer.cpp:1611 Reading framebuffer to mem, bufSize = 557056, packed = 17819020, fb_address = 04000000

ules01513_00001

@solarmystic
Copy link
Contributor

@unknownbrackets I second the motion to have an option to enable/disable this function, at least until we've disseminated this update to the masses and get little no performance regression reports from our mobile users.

Tactics Ogre's slowdown is something even I have observed on my (admitted ancient) laptop. The impact would definitely be much more obvious on the weaker mobile platforms.

@Ritori
Copy link

Ritori commented May 25, 2014

Android build still broken though...

@daniel229
Copy link
Collaborator

digimon story is another much slowdown game
this branch
01

master
02

@hrydgard
Copy link
Owner Author

I'll make this an option that defaults to off and merge it shortly, so more can play with it.

@hrydgard
Copy link
Owner Author

Rebased on master and added a setting. Not sure about the naming of the setting though, currently it's "Simulate Block Transfer"...

hrydgard added a commit that referenced this pull request May 25, 2014
Implement GPU block copies between buffers and from buffer to RAM.
@hrydgard hrydgard merged commit 0b0e3ca into master May 25, 2014
@hrydgard hrydgard deleted the block-transfer-gpu branch May 25, 2014 08:51
@solarmystic
Copy link
Contributor

@hrydgard @unknownbrackets Danganronpa is broken again after this pull request was merged to master, even after disabling the Block Transfer option. I think the GLES3 exclusive lines may have been accidentally readded again.

EDIT:- It's just the Read Framebuffer to CPU option that is broken, my bad, even with the Block Transfer option off. Jeanne D'arc still works as expected, characters are present.

Current master after this was merged (Door scanning broken):-
capture

Previous master (Door scanning working):-
capture

@solarmystic
Copy link
Contributor

Done some bisecting, and apparently the first responsible commit of the lot from the pull request is 05be56a by @raven02

bisect

Issue report incoming.

@daniel229
Copy link
Collaborator

Looks like this also has not merged #6058

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants