Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

reasonClassSizeGTE_D is buggy #209

Closed
Trass3r opened this issue Feb 7, 2022 · 28 comments
Closed

reasonClassSizeGTE_D is buggy #209

Trass3r opened this issue Feb 7, 2022 · 28 comments

Comments

@Trass3r
Copy link
Contributor

Trass3r commented Feb 7, 2022

ooprolog --facts DK2.facts.pl --results DK2.results.pl <

Consistency checks failed.
insanityClassSizeInvalid failed: Class=0x66eca4 LTESize=0x84 GTESize=0xb0
Initial sanity check failed, indicating the OO rules are incorrect.
Please report this failure to the Pharos developers!
  [29] prolog_stack:get_prolog_backtrace(100,[frame(29,clause(<clause>(0x55fc298618b0),6),_24981518)|_24981506],[goal_term_depth(100)]) at /usr/local/lib/swipl/library/prolog_stack.pl:134
  [28] throw_with_backtrace(error(system_error(initialSanityChecks))) at /usr/local/share/pharos/prolog/oorules/util.pl:185
  [26] solve_internal at /usr/local/share/pharos/prolog/oorules/setup.pl:659

Facts

Even when using the purecall(0x634d60). hack from #139. Still the same testbed.

@sei-eschwartz sei-eschwartz self-assigned this Feb 7, 2022
@sei-eschwartz
Copy link
Collaborator

I was able to reproduce this, thanks.

@sei-eschwartz
Copy link
Collaborator

Adding consistency checks at each step during initial reasoning finds a slightly different problem:

insanityClassSizeInvalid failed: Class=0x670374 LTESize=0x18 GTESize=0x1c

reasonClassSizeGTE_D(0x670374, 0x1c).
reasonClassSizeGTE_D(0x670374, 0x18).

% The given class (associated with the constructor) is certain to be of this exact size.  The                                                                                                                                                                                                                                                         
% reasoning is that we're able to track an allocation site with a known size to the constructor                                                                                                                                                                                                                                                       
% associated with the class.  There's a small bit of ambiguity about what the compiler will                                                                                                                                                                                                                                                           
% generate for arrays of objects and other unusual cases, but this rule is a good start.     

So we decide that the object is exactly 0x18 bytes long and 0x1c bytes long. Yep, that sounds like a contradiction to me ;)

@sei-eschwartz
Copy link
Collaborator

Some additional logging:

reasonClassSizeGTE_D(0x670374, 0x18, 0x591820, sv_5533142198173699050, 0x591bf0).
reasonClassSizeGTE_D(0x670374, 0x1c, 0x591820, sv_10413347185328835298, 0x591bf0).

@sei-eschwartz
Copy link
Collaborator

[eschwartz@pd4 issue209]$ cat DK2.exe.facts | fgrep -e sv_5533142198173699050 -e sv_10413347185328835298                                                                   
insnCallsNew(0x591b0a, 0x591820, sv_10413347185328835298).
insnCallsNew(0x591b64, 0x591820, sv_5533142198173699050).
thisPtrAllocation(0x591b64, 0x591820, sv_5533142198173699050, type_Heap, 0x18).
thisPtrAllocation(0x591b0a, 0x591820, sv_10413347185328835298, type_Heap, 0x1c).
callParameter(0x591b37, 0x591820, ecx, sv_10413347185328835298).
callParameter(0x591b8f, 0x591820, ecx, sv_5533142198173699050).
callReturn(0x591b0a, 0x591820, eax, sv_10413347185328835298).
callReturn(0x591b37, 0x591820, eax, sv_10413347185328835298).
callReturn(0x591b64, 0x591820, eax, sv_5533142198173699050).
callReturn(0x591b8f, 0x591820, eax, sv_5533142198173699050).

@sei-eschwartz
Copy link
Collaborator

I think I know what is happening. The function 0x591820 is allocating either a CEngineSurface or a CEngineCompressedSurface. Both of these inherit from CEngineSurfaceBase, whose constructor is 0x591bf0.

So, this seems to be a counter-example for reasonClassSizeGTE_D. If a function allocates two classes that have a common base, but the classes have a different allocation size, it will cause GTE_D to fail.

But we already have a clause to deal with this: https://github.com/cmu-sei/pharos/blob/master/share/prolog/oorules/rules.pl#L3222

    % ejs 1/5/21: This rule only applies if we know there are no base classes using some of the
    % allocated object's space.
    factClassHasNoBase(Class),

factClassHasNoBase(Class) means that Class has no base classes. This is true for CEngineSurfaceBase. Unfortunately, I think we meant that there are no classes derived from Class. Unfortunately, we don't really have a fact for this. But maybe we need one. This specific example is an easier case because we have RTTI, so we can presumably look at that to determine there are no derived classes.

@sei-ccohen What do you think?

@sei-eschwartz
Copy link
Collaborator

@Trass3r As a workaround, you could try to remove one of or both of the thisPtrAllocation facts I posted above.

@Trass3r
Copy link
Contributor Author

Trass3r commented Feb 7, 2022

@sei-eschwartz doesn't seem to make a difference?

@sei-eschwartz
Copy link
Collaborator

You are right, there appears to be at least one other problem when those are commented out. I'll play with it some more today.

@sei-eschwartz
Copy link
Collaborator

Here is the error I get when I comment those out:

insanityClassSizeInvalid failed: Class=0x66ed14 LTESize=0x70 GTESize=0xb0

0x66ed14 is apparently the vftable for the CGadget class.

This is what IDA has to say about 0x66ed14:

.rdata:0066ED14 ; const CGadget::`vftable'
.rdata:0066ED14 ??_7CGadget@@6B@ dd offset ??_Gfacet@locale@std@@UAEPAXI@Z_2
.rdata:0066ED14                                         ; DATA XREF: sub_52B810+1C↑o
.rdata:0066ED14                                         ; std::locale::facet::`scalar deleting destructor'(uint)+9↑o ...
.rdata:0066ED14                                         ; std::locale::facet::`scalar deleting destructor'(uint)
reasonClassSizeGTE_D(0x66ed14, 0xb0, 0x52a0d0, sv_6768039204246569279, 0x52b810).
reasonClassSizeGTE_D(0x66ed14, 0xb0, 0x52b390, sv_2626095382409969213, 0x52b810).

0x52b810 is CGadget::CGadget. It looks like 0x52a0d0 constructs a CButton, which inherits from CGadget. So same problem.

@sei-eschwartz sei-eschwartz changed the title insanityClassSizeInvalid again reasonClassSizeGTE_D is buggy Feb 8, 2022
@sei-eschwartz
Copy link
Collaborator

@Trass3r For now you may just want to disable the problematic rules:

diff --git a/share/prolog/oorules/rules.pl b/share/prolog/oorules/rules.pl
index 3c02dffd..a9411d99 100644
--- a/share/prolog/oorules/rules.pl
+++ b/share/prolog/oorules/rules.pl
@@ -3228,7 +3228,7 @@ reasonClassSizeGTE(Class, Size) :-
     or([reasonClassSizeGTE_A(Class, Size),
         reasonClassSizeGTE_B(Class, Size),
         reasonClassSizeGTE_C(Class, Size),
-        reasonClassSizeGTE_D(Class, Size),
+        %reasonClassSizeGTE_D(Class, Size),
         reasonClassSizeGTE_E(Class, Size),
         reasonClassSizeGTE_F(Class, Size),
         reasonClassSizeGTE_G(Class, Size)
@@ -3281,7 +3281,7 @@ reasonClassSizeGTE_D(Class, Size) :-
     factClassHasNoBase(Class),
     % Debugging
     logtraceln('~@~Q.', [not((factClassSizeGTE(Class, ExistingSize), ExistingSize >= Size)),
-                         reasonClassSizeGTE_D(Class, Size)]).
+                         reasonClassSizeGTE_D(Class, Size, Function, ThisPtr, Constructor)]).
 
 % The given class is certain to be of this size or greater.  The reasoning for this rule is
 % that if we're certain that there's a member at a given offset and size, then the object must
@@ -3345,7 +3345,7 @@ reasonClassSizeLTE(Class, Size) :-
     %logwarnln('Recomputing reasonClassSizeLTE...'),
     or([reasonClassSizeLTE_A(Class, Size),
         reasonClassSizeLTE_B(Class, Size),
-        reasonClassSizeLTE_C(Class, Size),
+        %reasonClassSizeLTE_C(Class, Size),
         reasonClassSizeLTE_D(Class, Size)
       ]).
 

This does get past initial reasoning at least. It's still running for me.

@sei-eschwartz
Copy link
Collaborator

sei-eschwartz commented Feb 8, 2022

And it hit another sanity problem :(

Contradictory information about constructor: factNOTConstructor(0x5b88c0) but reasonConstructor(0x5b88c0)

0x5b88c0 is definitely not a constructor

@Trass3r
Copy link
Contributor Author

Trass3r commented Feb 8, 2022

.rdata:0066ED14 ; const CGadget::`vftable'
.rdata:0066ED14 ??_7CGadget@@6B@ dd offset ??_Gfacet@locale@std@@UAEPAXI@Z_2
.rdata:0066ED14                                         ; DATA XREF: sub_52B810+1C↑o
.rdata:0066ED14                                         ; std::locale::facet::`scalar deleting destructor'(uint)+9↑o ...
.rdata:0066ED14                                         ; std::locale::facet::`scalar deleting destructor'(uint)

Yeah the locale things should be due to ICF I think, a bit misleading.

@sei-eschwartz
Copy link
Collaborator

Neat, we knew about ICF but not exactly what it was called. @sei-ccohen https://docs.microsoft.com/en-us/cpp/build/reference/opt-optimizations?view=msvc-170

@sei-eschwartz
Copy link
Collaborator

reasonConstructor_A(0x5b88c0)

Concluding factMethod(0x5b88c0).
reasonNOTVFTable_A(0x5b88c0).
Concluding factNOTVFTable(0x5b88c0).
reasonNOTVFTable_A(0x5b88c0).
reasonNOTConstructor_F(0x5b88c0, 0x5d91e0).
Concluding factNOTConstructor(0x5b88c0).
reasonNOTDeletingDestructor_E(0x5b88c0, 0x5d90c0).
reasonNOTDeletingDestructor_E(0x5b88c0, 0x5d91e0).
reasonNOTDeletingDestructor_F(0x5b88c0).
.text:005D91E0 sub_5D91E0      proc near               ; CODE XREF: sub_5D8B10+133↑p
.text:005D91E0                                         ; sub_5D8B10+231↑p ...
.text:005D91E0
.text:005D91E0 arg_0           = byte ptr  4
.text:005D91E0
.text:005D91E0                 push    esi
.text:005D91E1                 mov     esi, ecx
.text:005D91E3                 call    j_problem_method
.text:005D91E8                 test    [esp+4+arg_0], 1
.text:005D91ED                 jz      short loc_5D91F8
.text:005D91EF                 push    esi             ; Block
.text:005D91F0                 call    ??3@YAXPAX@Z    ; operator delete(void *)
.text:005D91F5                 add     esp, 4
.text:005D91F8
.text:005D91F8 loc_5D91F8:                             ; CODE XREF: sub_5D91E0+D↑j
.text:005D91F8                 mov     eax, esi
.text:005D91FA                 pop     esi
.text:005D91FB                 retn    4
.text:005D91FB sub_5D91E0      endp

j_problem_method is a thunk to:

.text:005B88C0 destructor_calls_delete proc near       ; CODE XREF: sub_555CC0+244↑p
.text:005B88C0                                         ; sub_55C3C0+102↑p ...
.text:005B88C0                 mov     eax, [ecx+8]
.text:005B88C3                 mov     dword ptr [ecx], offset sure_looks_like_a_vtable
.text:005B88C9                 push    eax             ; Block
.text:005B88CA                 call    ??3@YAXPAX@Z    ; deletes ecx+8
.text:005B88CF                 add     esp, 4
.text:005B88D2                 retn
.text:005B88D2 destructor_calls_delete endp

0x5b88c0 and 0x5d91e0 are both kind of weird. 0x5d91e0 more so. It calls delete on the pointer at offset 8. So it's not a regular deleting destructor.

Anyway, let me summarize:

  • We decide that 0x5b88c0 has to be a constructor or destructor because it installs a vftable. (This is a bad assumption to start with, but it might be right.)
  • We decide 0x5d91e0 is a deleting destructor because it calls delete.
  • We decide 0x5b88c0 is not a constructor because 0x5d91e0 is not a constructor because it is a deleting destructor.

But we also conclude:

  • 0x5b88c0 is not a deleting destructor because it does not call delete on itself. It does call delete, but on a different pointer.
  • Therefore 0x5b88c0 is a constructor?

Why can't 0x5b88c0 be a real destructor? Trying to figure that out.

@sei-eschwartz
Copy link
Collaborator

Because pharos' static analysis decided it can't be

@sei-eschwartz
Copy link
Collaborator

I am running again here, but it would help @Trass3r if you have the facts log file and could look for the message that says:

OOAN[INFO ]: Method 0x005B88C0 is NOT a destructor because of the call at 0xXXXX

@sei-eschwartz
Copy link
Collaborator

Hmm. I just ran and I do have a noCallsAfter(0x5b88c0) fact, but you don't. What command line did you run to generate the facts file?

@sei-eschwartz
Copy link
Collaborator

I was able to run through with my .facts file and applying the patch I posted above.

DKII-files.zip

@Trass3r
Copy link
Contributor Author

Trass3r commented Feb 8, 2022

What command line did you run to generate the facts file?

Don't have it in front of me right now but should still be the same as in #139 (comment)
There are 2 new/delete pairs.

@sei-eschwartz ah actually 3 by now: --apidb share/contrib/winmm.json -n 634E80 --delete-method 6341B0 -n 57CEB0 --delete-method 57D0B0 -n 61ECA0 --delete-method 61ED10

@sei-eschwartz
Copy link
Collaborator

That indeed made a difference. I get:

OOAN[WHERE]: The call to constructor/destructor candidate at address 0x005FEEA4 was disproven by the earlier/later call at address 0x005FEEBB
OOAN[INFO ]: Method 0x005B88C0 is NOT a destructor because of the call at 0x005FEEA4

Looking at function 0x5fedd0, it seems that an object is constructed at stack offset 0x20, destructed, and then constructed and destructed again. Specifically:

  • Call 0x5B8770(var_20) to construct object
  • Call 0x5B88c0(var_20) to destruct object
  • Call 0x5B8770(var_20) to construct object
  • Call 0x5B88c0(var_20) to destruct object

When this happens on the heap, we can use a call to new or delete to say that these are two different object lifetimes. But on the stack we can't do that of course.

So right now the rationale is that 0x5b88c0 cannot be a destructor because 0x5b8770 is called after it on the same object. I wonder if we could change the logic to only look at the last two calls. If we think that 0x5b8770 might be a destructor, then it makes sense to only look at the last call of the destructor and see if any other method is invoked on that. The question is what side effects would that have?

@sei-eschwartz
Copy link
Collaborator

I have made some progress on #212. Testing it on DKII now. Hopefully with #212 fixed and ClassSizeGTE_D commented out we'll be able to get a complete run through.

@sei-eschwartz
Copy link
Collaborator

With those changes I was able to do a complete run. Note that the problem in ClassSizeGTE_D is still not fixed.

DKII.zip

@edmcman
Copy link
Contributor

edmcman commented Feb 14, 2022

Getting back to the GTE_D problem. We track a constructor call to offset 0 of a newly allocated object, and we basically want to know whether the constructor is for the newly allocated object.

The example we saw is that class C inherits from class B. Class C's constructor is inlined, but class B's is not. We can tell from manual examination because the inlined constructor has a VFTable installation, which is a sign that there are two constructors. So one observation is that if we see a call to a constructor at offset 0, and that constructor has a vftable installed at offset 0, then if a derived constructor was inlined we would also see a vftable installed at offset 0. If we see a call to a constructor with a vftable write, and the calling context does not have a vftable write, then I think the constructor is the outermost constructor. (Is this true for embedded objects?)

Alternatively, if we know that a class has no ancestors or descendants, the constructor call must be on the class itself, right? No... not quite because it could be on an embedded class at offset 0.

@edmcman
Copy link
Contributor

edmcman commented Feb 14, 2022

Is this true for embedded objects?

No: https://godbolt.org/z/nnGqvYP1h

@sei-mwd sei-mwd closed this as completed in 124a627 Jul 5, 2022
@Trass3r
Copy link
Contributor Author

Trass3r commented Jul 9, 2022

Not sure if this is still related to the original problem but it still doesn't run through:

Contradictory information about constructor: factConstructor(0x5de050) but reasonNOTConstructor(0x5de050)
Constraint checks failed, retracting guess!
failed.
Consistency checks failed.
Contradictory information about constructor: factConstructor(0x600400) but reasonNOTConstructor(0x600400)
Constraint checks failed, retracting guess!
tryBinarySearch completely failed on [0x5de050] and will now backtrack to fix an upstream problem.
Refusing to backtrack into reasoningLoop to fix an upstream problem because backtrackForUpstream/0 is not set.
This likely indicates that there is a problem with the OO rules.
Please report this failure to the Pharos developers!
 [150] prolog_stack:get_prolog_backtrace(100,[frame(150,clause(<clause>(0x5ce281eb4d00),6),_5626440)|_5626428],[goal_term_depth(100)]) at /usr/local/lib/swipl/library/prolog_stack.pl:137
 [149] throw_with_backtrace(error(system_error(upstreamProblem))) at /usr/local/share/pharos/prolog/oorules/util.pl:185
  [26] solve_internal at /usr/local/share/pharos/prolog/oorules/setup.pl:681
  [25] catch(user:solve_internal,_5626664,user:((_5626732=error(resource_error(private_table_space),_5626746)->complain_table_space(ooscript);_5626796=error(resource_error(stack),_5626810)->complain_stack_size(ooscript);true),throw(_5626842))) at /usr/local/lib/swipl/boot/init.pl:562
  [24] solve(ooscript) at /usr/local/share/pharos/prolog/oorules/setup.pl:617

@sei-eschwartz
Copy link
Collaborator

sei-eschwartz commented Jul 10, 2022 via email

@sei-eschwartz
Copy link
Collaborator

I might see what is going wrong...

reasonMergeVFTables_A(constructor, 0x672900, 0x6728f8, 0x672900, 0, factVFTableWrite(0x5dd7cf, 0x5dd760, 0, 0x672900)).
Concluding mergeVFTables(0x672900, 0x6728f8).

So we decide that this vftable belongs to the current constructor based on the vftable write. Here is the vftable write:

.text:005DD7C0 call ??2@YAPAXI@Z ; operator new(uint)
.text:005DD7C5 add esp, 4
.text:005DD7C8 cmp eax, edi
.text:005DD7CA jz short loc_5DD7DB
.text:005DD7CC mov [eax+8], edi
.text:005DD7CF mov dword ptr [eax], offset some_vftable_0

In other words, we install this vftable into a new heap-allocated object, not the "this" object this constructor is constructing.

I think this is a side-effect of a change we made recently that exported possibleVFTableFacts for non-this objects. The solution is to validate that the vftable write is in "this" object. I am trying this now.

@edmcman
Copy link
Contributor

edmcman commented Oct 11, 2022 via email

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants