508716 - Fluid dynamics simulator is slow

Reporter

Description

•

15 years ago

I get 12-13fps in t-m; Safari 4 gets closer to 40fps.

Stats:

recorder: started(116), aborted(104), completed(15), different header(1), trees trashed(0), slot promoted(0), unstable loop variable(0), breaks(0), returns(0), unstableInnerCalls(4), blacklisted(26)
monitor: triggered(27), exits(27), type mismatch(0), global mismatch(4)

Typical abort:

abort: 6918: fp->scopeChain is not global or active call object
Abort recording of tree http://nerget.com/fluidSim/pressure.js:290@137 at http://nerget.com/fluidSim/pressure.js:291@138: name.

Boris Zbarsky [:bzbarsky]

Reporter

Comment 1

•

15 years ago

Attached file First JS file — Details

Boris Zbarsky [:bzbarsky]

Reporter

Comment 2

•

15 years ago

Attached file Second js file — Details

Boris Zbarsky [:bzbarsky]

Reporter

Comment 3

•

15 years ago

Attached file Testcase HTML — Details

Boris Zbarsky [:bzbarsky]

Reporter

Comment 4

•

15 years ago

Oh, and a profile shows 55% of the time spent in (not under) js_Interpret.  Another 11% allocating doubles, another 15% on js_NativeGet, esp with Call objects.

That adds up to 80% or so.  17% is painting, and 3% is sort of here and there (2/3 of that is minor stuff under js_Interpret, the remaining 1/3 is event loop overhead, I think).

In any case, this is definitely in the right component, at least until we get this tracing...

Summary: Fluid dynamics simulators is slow → Fluid dynamics simulator is slow

Boris Zbarsky [:bzbarsky]

Reporter

Updated

•

15 years ago

Blocks: 467263

Boris Zbarsky [:bzbarsky]

Reporter

Comment 5

•

15 years ago

I also see a lot of:

Abort recording of tree https://bug508716.bugzilla.mozilla.org/attachment.cgi?id=392847:47@96 at https://bug508716.bugzilla.mozilla.org/attachment.cgi?id=392847:49@109: Inner tree is trying to grow, abort outer recording.

Line 49 is:

   var d = field.getDensity(x, y) * 255 / 5;

where |field| comes from this:

        displayFunc(new Field(dens, u, v));

and Field has a getDensity function.  I have no idea why that's causing us trace aborts; perhaps something akin to bug 497789?

Depends on: 497789

Boris Zbarsky [:bzbarsky]

Reporter

Comment 6

•

15 years ago

Or some sort of branding thing going on, perhaps?  Brendan?  The relevant guard seems to be :

    About to try emitting guard code for SideExit=0x17bb1ac exitType=BRANCH
    shape = ld map[32]
    #0x8756
    guard_kshape = eq shape, #0x8756
    xf1512: xf guard_kshape -> pc=0x160ded9 imacpc=0x0 sp+8 rp+0

The other branch exit I see some of is:

Abort recording of tree https://bug508716.bugzilla.mozilla.org/attachment.cgi?id=392846:62@120 at https://bug508716.bugzilla.mozilla.org/attachment.cgi?id=392846:64@138: Inner tree is trying to grow, abort outer recording.

Line 64 is:

  var lastRow = (j - 1) * rowSize;

Relevant guard:

    About to try emitting guard code for SideExit=0x17478e0 exitType=OVERFLOW
    eq134 = eq mul19, callDepth
    xt165: xt eq134 -> pc=0x16156d6 imacpc=0x0 sp+16 rp+0

Andreas Gal :gal

Comment 7

•

15 years ago

The 2nd guard looks like upvar access. Paging dmandelin.

David Mandelin [:dmandelin]

Comment 8

•

15 years ago

'Inner tree is trying to grow, abort outer recording.' is usually not a problem. In the normal case, we back off from tracing the outer loop for 32 iterations, by which time the inner tree hopefully has stabilized. We should be able to get the outer loop after one or a few rounds of that.

The abort in comment 0 looks like the returned closures abort.

Boris Zbarsky [:bzbarsky]

Reporter

Comment 9

•

15 years ago

Yes, the data in comment 6 is with the returned closures patch applied (which makes the abort in comment 0 go away).

In this case, we don't stabilize as far as I can see; I'm seeing the trying to grow aborts all through the log, and it's certainly running long enough that the outer tree ends up blacklisted as a result of all the aborts.

David Mandelin [:dmandelin]

Comment 10

•

15 years ago

Hmmm. One thing I thought of is that we might generate so much code that we end up OOMing the code buffer and throwing it all away, but it sounds like you are saying that's not happening. I wonder if the inner loop code is so branchy that it can't stabilize in the number of backoffs we give. Based on:

    About to try emitting guard code for SideExit=0x17bb1ac exitType=BRANCH
    shape = ld map[32]
    #0x8756
    guard_kshape = eq shape, #0x8756
    xf1512: xf guard_kshape -> pc=0x160ded9 imacpc=0x0 sp+8 rp+0

This looks like a guard from a property access. If a zillion shapes go through, then we would try to grow a lot of branch exits.

Boris Zbarsky [:bzbarsky]

Reporter

Comment 11

•

15 years ago

Yeah, that was my gut reaction too.  Hence wondering about bug 497789 or branding or some such...

Brendan Eich [:brendan]

Updated

•

15 years ago

Assignee: general → brendan

Boris Zbarsky [:bzbarsky]

Reporter

Comment 12

•

15 years ago

dvander, how does trace merging behave on this one?

Component: JavaScript Engine → jemalloc

Phil Ringnalda (:philor)

Updated

•

15 years ago

Component: jemalloc → JavaScript Engine

Boris Zbarsky [:bzbarsky]

Reporter

Updated

•

15 years ago

Component: JavaScript Engine → jemalloc

Depends on: 516264

Boris Zbarsky [:bzbarsky]

Reporter

Updated

•

15 years ago

Component: jemalloc → JavaScript Engine

Brendan Eich [:brendan]

Comment 13

•

15 years ago

The fix for bug 536564 seems to have helped quite a bit: with the latest tm nightly I get >30fps with black canvas and >26fps upon stirring up some green.

Boris, can you regen some stats? Or perhaps someone can make a shell testcase. wish (along with Oliver) we had TraceVis built in...

/be

Brendan Eich [:brendan]

Comment 14

•

15 years ago

(In reply to comment #11)
> Yeah, that was my gut reaction too.  Hence wondering about bug 497789

No .prototype in either .js file, so 497789 is not relevant.

> or branding or some such...

Indeed bug 536564 was all about unbranding to despecialize module pattern and power constructor (if that's what it is called) pattern code.

/be

Brendan Eich [:brendan]

Updated

•

15 years ago

Depends on: 536564
No longer depends on: 497789

Brendan Eich [:brendan]

Comment 15

•

15 years ago

I should probably give this bug to someone. Volunteers?

/be

OS: Mac OS X → All

Hardware: x86 → All

Boris Zbarsky [:bzbarsky]

Reporter

Comment 16

•

15 years ago

I don't see any impact from bug 536564's fix on this testcase in terms of fps.  I see about 22-30fps (depending on amount of green) both before and after that fix here.  I'll try to get some stats.

Boris Zbarsky [:bzbarsky]

Reporter

Comment 17

•

15 years ago

I'm still seeing the abort from comment 5 on t-m tip.

Boris Zbarsky [:bzbarsky]

Reporter

Comment 18

•

15 years ago

And in particular, it's not clear to me why bug 536564 would lead to less branding here, offhand...

Brendan Eich [:brendan]

Comment 19

•

15 years ago

(In reply to comment #18)
> And in particular, it's not clear to me why bug 536564 would lead to less
> branding here, offhand...

brendan: bz: we see this.foo = function... where the function is not a null closure, we unbrand this
brendan: first thing in the [Field] ctor


Comment 5 does not demonstrate shape regeneration per new Field, and I confirmed via dis('-r', FluidField) that Field unbrands |this|. Need to see what guards are exiting trace.

/be

Boris Zbarsky [:bzbarsky]

Reporter

Comment 20

•

15 years ago

OK.  So if I look at the trace exits, the vast majority are one of:

leaving trace at http://nerget.com/fluidSim/pressure-display.js:49@124, op=call, lr=0x16a439c, exitType=BRANCH, sp=4, calldepth=0, cycles=0
leaving trace at http://nerget.com/fluidSim/pressure-display.js:49@115, op=callprop, lr=0x1652224, exitType=BRANCH, sp=1, calldepth=0, cycles=0
leaving trace at http://nerget.com/fluidSim/pressure-display.js:49@124, op=call, lr=0x16b3c04, exitType=BRANCH, sp=4, calldepth=0, cycles=0
leaving trace at http://nerget.com/fluidSim/pressure-display.js:49@115, op=callprop, lr=0x169c4a4, exitType=BRANCH, sp=1, calldepth=0, cycles=0

The relevant side exits seem to be:

    About to try emitting guard code for SideExit=0x16a439c exitType=BRANCH
...
    callee_obj->getPrivate() = int 357565760
    ld1 = ld $stack14[16]
    eq1 = eq ld1, callee_obj->getPrivate()
    xf1: xf eq1 -> pc=0x1637ae8 imacpc=0x0 sp+32 rp+0 (GuardID=001)
    About to try emitting guard code for SideExit=0x16a439c exitType=BRANCH
    OBJ_GET_PARENT(cx, callee_obj) = int 357591200
    ld2 = ld $stack14[12]
    eq2 = eq ld2, OBJ_GET_PARENT(cx, callee_obj)
    xf2: xf eq2 -> pc=0x1637ae8 imacpc=0x0 sp+32 rp+0 (GuardID=002)


    About to try emitting guard code for SideExit=0x1652224 exitType=BRANCH
    shape = ld map[4]
    37285 = int 37285
    guard_kshape = eq shape, 37285
    xf3: xf guard_kshape -> pc=0x1637adf imacpc=0x0 sp+8 rp+0 (GuardID=003)

0x16b3c04 is the function-identity and callee parent guard again, and 0x169c4a4 is another kshape guard.

This is all on t-m tip.

Boris Zbarsky [:bzbarsky]

Reporter

Comment 21

•

15 years ago

When I say "vast majority", btw, of 135k total exits, about 91k are on the callee_obj guard and about 37k are on the kshape guard.

Boris Zbarsky [:bzbarsky]

Reporter

Comment 22

•

15 years ago

OK.  The callee guard that fails is the Call object identity guard.  See TraceRecorder::guardCallee the second guard it puts in.

If I comment out that code block I see a bit of a win on the testase (maybe from 30fps to 36fps) and BRANCH exits more or less disappear from the log (283 exits out of a total of 10500).

Boris Zbarsky [:bzbarsky]

Reporter

Comment 23

•

15 years ago

With that code block commented out, profile looks like:

30% painting
58% under js_Interpret, almost all of it on trace.  None inside js_Interpret.
4% vm_fault stuff


Of the time under js_Interpret, about half is actually jit-generated code.  The rest is js_DoubleToInt32 (10% of overall time), js_Array_dense_setelem_double (10% of overall time, about half creating new gcobjects), 6% js_UnboxDouble.  Everything else is minor.

With the code block left in, however, time in js_Interpret is 5.5%.  Time under js_Interpret is 62%.  So dropping this guard and not exiting on it would give us something like 10% faster performance (which is about reflected in the "just slightly higher" fps).

To win more than that we need to at least stop all that double-to-int-and-back thrashing.  And speed up the painting part (need separate bugs).

Boris Zbarsky [:bzbarsky]

Reporter

Comment 24

•

15 years ago

OK.  That parent guard is bug 510554.

Depends on: 510554

Brendan Eich [:brendan]

Comment 25

•

14 years ago

Attached patch patch to test (obsolete) — Details — Splinter Review

This improves the all-or-nothing nature of flat closures by making an escaping function (funarg) flat if any of its upvars is invariant with initialization that dominates the closure. Any upvars not flattened must be accessed via Call objects but this wins (big, I think) over bailing to the all-upvars-need-Call-homes state.

I will clean up and incorporate any testing feedback before soliciting review.

/be

Boris Zbarsky [:bzbarsky]

Reporter

Comment 26

•

14 years ago

That patch may help a bit.  Hard to tell; would need to profile; not a high priority right this second.  But it should definitely go into a bug that blocks this one, not in this tracking bug.

Brendan Eich [:brendan]

Comment 27

•

14 years ago

(In reply to comment #26)
> That patch may help a bit.

With the parent guard elimination that depends on the changes in this patch, it should help a lot. But I need to add the elimination logic ;-)).

> ... it should definitely go into a bug that blocks
> this one, not in this tracking bug.

Will file, thanks for reminding me.

/be

Brendan Eich [:brendan]

Updated

•

14 years ago

Depends on: 542002

Brendan Eich [:brendan]

Comment 28

•

14 years ago

Comment on attachment 423311 [details] [diff] [review]
patch to test

See bug 542002.

/be

Attachment #423311 - Attachment is obsolete: true

Brendan Eich [:brendan]

Comment 29

•

14 years ago

dmandelin got 48fps today (on tm tip, I think -- JaegerMonkey landed there earlier today). Want to get up to high 50s like the competition. ;-)

/be

Boris Zbarsky [:bzbarsky]

Reporter

Comment 30

•

14 years ago

Over here, after the merge, I see 38fps on TM, and about 49-50 in Chrome and Safari.  Opera latest is at 54.

Doing a profile now.

Boris Zbarsky [:bzbarsky]

Reporter

Comment 31

•

14 years ago

Sorry that took a bit; shark seems to need the frame pointer to produce useful calltree output, so I had to recompile first.

Looks like we're at about (on Mac; on Windows/Linux this might all be different because painting is so much of the profile):

55% painting (argb32_mark and argb32_image for the most part; mostly painting
    the canvas layer, but some painting of some thebes layer too, and a few
    percent in display list construction and such).
 5% Gecko event processing (I was mousing about and whatnot)
40% JS engine.

The JS engine time splits up as follows (all percentages that follow are percentages of JS engine time):

86.6% in jm-generated symbol-less code.
 5.2% under stubs::BitOr (half self time, half calling ValueToECMAInt32Slow)
   2% under PutImageData
 1.2% under stubs::SetElem (looks like for the image data)

Oh, all the above was without supervisor callstacks.  If I put those back I get the usual 3.5% of vm_fault and whatnot.

Upshot:  The only way to make this faster is to generate something different codewise on the JM end and/or make the painting faster.

Unfortunately, shark is not all that much use for the jit-generated stuff; it seems to have one entry per instruction or something, for the most part...  The most common ones are (all self times, obviously):

1.7% add byte ptr [eax], al
1.5% jmp 0x4c86e883
1.5% jmp 0x1886e987
1.4% jmp 0x7486e22f
1.4% jmp 0xb886ea4d
1.3% a 9.5KB chunk of code.  Looks like your typical code; lots of cmp, jmp,
     mov.
1.2% sldt dword ptr [eax]
1.1% sldt dword ptr [eax]
1.0% jmp 0xb086e4fa
0.7% mov dword ptr [eax], eax

Anyway, the upshot is that we have jmps, we presumably have context switches (the sldt), and that's about all I can tel from this tool.

Oliver Hunt

Comment 32

•

14 years ago

I suspect the bitor is '<expr> | 0' to convert a double to int, in which case simply exiting to the slowcase is probably very expensive relative to simply doing cvttsd2si (and associated checks)

Jan de Mooij [:jandem]

Comment 33

•

14 years ago

(In reply to comment #32)
> I suspect the bitor is '<expr> | 0' to convert a double to int, in which case
> simply exiting to the slowcase is probably very expensive relative to simply
> doing cvttsd2si (and associated checks)

Thanks, I will file a bug.

Jan de Mooij [:jandem]

Updated

•

14 years ago

Depends on: 592631

Boris Zbarsky [:bzbarsky]

Reporter

Comment 34

•

14 years ago

The patch in bug 592631 gets us to 42fps or so on my hardware (depending on how much I move the mouse; can go as low as 37 and as high as 44).

David Mandelin [:dmandelin]

Comment 35

•

14 years ago

I Sharked Safari on this and got 47% of time in JavaScriptCore. Using your 50fps measurement, that corresponds to a JS run time of 9.4ms/frame.

Our "old" (before the x|0 patch) score of 38 fps corresponds with 40% of time in JS gives a JS run time of 10.5ms/frame.

I estimate our JS run time to be 8.8-9.2ms/frame after the x|0 patch, depending on whether I (a) use the fact that we spent 5.2% there before and compute 10.5ms/frame*(1-5.2/40) or (b) use the new fps of 42 with an estimate new JS fraction of 37%.

Anyway, at this point it seems more likely that we can improve our score by making the non-JS parts faster. But we should still make a shell benchmark and do a more direct comparison of the JS times to see if there we are taking any direct losses there.

David Mandelin [:dmandelin]

Comment 36

•

14 years ago

Attached file Shell test case — Details

Here's a shell version. The timing loop is all the way at the bottom if you want to tweak it.

David Mandelin [:dmandelin]

Comment 37

•

14 years ago

Results from the shell benchmark, time taken to render each frame (in a run of 1000 frames), all 32-bit builds on OSX10.6:

  TM tip                6.8 ms
  TM tip + bug 592631   6.5 ms
  JSC                   7.0 ms

Jan de Mooij [:jandem]

Comment 38

•

14 years ago

One problem here is that x/y in JM and JSC does not convert the result to int32 if it's 0. V8 does division differently (they try idiv first), so they end up with ints everywhere which is obviously faster. I'm working on a patch to speedup JSOP_DIV by making our double-to-int logic work when the result is 0.

Boris Zbarsky [:bzbarsky]

Reporter

Comment 39

•

14 years ago

The problem is that there are two different 0s, right?  So you have to be a little careful with that....

In any case, dmandelin's numbers suggest that painting is the place we really need to improve here at this point.

Jan de Mooij [:jandem]

Comment 40

•

14 years ago

(In reply to comment #39)
> The problem is that there are two different 0s, right?  So you have to be a
> little careful with that....
Yes..

> In any case, dmandelin's numbers suggest that painting is the place we really
> need to improve here at this point.
Sure. I don't know how much this jsop_div thing helps. dmandelin's numbers show that SM is fast enough here. Maybe my patch is not worth it and I will just drop it then.

Brendan Eich [:brendan]

Comment 41

•

14 years ago

Jan: don't give up on account of just one benchmark! What bug are you using for your idiv work?

/be

David Mandelin [:dmandelin]

Comment 42

•

14 years ago

Yeah, I wouldn't give up on idiv yet, either. Although the problem does seem to be more on the gfx side, your x|0 patch makes a very noticeable difference here. Any JS wins we can find are worthwhile.

Boris Zbarsky [:bzbarsky]

Reporter

Comment 43

•

14 years ago

Right; I wasn't trying to discourage you.  The 0 thing has come up on other testcases too, so if we can fix it, great.  I just don't expect it to give us 10fps here.  ;)

Robert O'Callahan (:roc) (email my personal email if necessary)

Comment 44

•

14 years ago

On my Mac, we spend 78% of the time in and under JS_CallFunctionValue on trunk. I guess that's just lack of JM. We spend 16% of the time (i.e. most of the rest) in BasicCanvasLayer::Paint just copying the canvas surface to the window, upscaled by by 4.

Now the interesting thing is that BasicCanvasLayer::Paint uses EXTEND_PAD --- which is correct, but for cairo-quartz this always means taking a path involving CGPatterns which is quite likely to be ... not as fast as it could be.

Boris Zbarsky [:bzbarsky]

Reporter

Comment 45

•

14 years ago

> I guess that's just lack of JM. 

Yeah.

Is there a way to deal with the CGPattern/EXTEND_PAD thing?

Robert O'Callahan (:roc) (email my personal email if necessary)

Comment 46

•

14 years ago

Yes. By treating it as EXTEND_NONE (which is completely reasonable, because as it turns out the current code is treating it as EXTEND_REPEAT, which is much more wrong), things get a little bit better. From 16% of my profile I get down to 13.5%. That's taken off about 15% of the paint time, actually more since I get more FPS now. Not great, but worth having. Filed bug 593270.

It's not clear that we can do much better than that using Quartz, though. Now we're just doing a bog-standard CGContextDrawImage, which does scaling, and there's no caching available since the image is different every time.

The good news is that we have some bigger guns in our graphics arsenal now. With D2D, D3D9 or GL enabled, the canvas compositing happens on the GPU. Indeed, in my Mac build if I set MOZ_ACCELERATED=11, painting goes from 13.5% to 2% of the profile.

So I would really like to know how we perform in a JM build with MOZ_ACCELERATED=11 and/or D2D enabled!

Depends on: 593270

Robert O'Callahan (:roc) (email my personal email if necessary)

Updated

•

14 years ago

Depends on: 593342

Robert O'Callahan (:roc) (email my personal email if necessary)

Comment 47

•

14 years ago

(In reply to comment #46)
> The good news is that we have some bigger guns in our graphics arsenal now.
> With D2D, D3D9 or GL enabled, the canvas compositing happens on the GPU.
> Indeed, in my Mac build if I set MOZ_ACCELERATED=11, painting goes from 13.5%
> to 2% of the profile.

Actually that was overselling the benefit, because on trunk GL-layers, the main thread ends up blocked waiting for the GPU for a while when we call glFlush at the end of each paint. In the profile I did, Shark hadn't sampled the main thread while it was blocked. Fixing that shows that GL layers isn't really making any difference to this benchmarks on Mac.

We shouldn't actually have to call glFlush and block though. If I remove it, performance is great, although unfortunately nothing actually renders on the screen. I filed bug 593342 to look into that.

Boris Zbarsky [:bzbarsky]

Reporter

Comment 48

•

14 years ago

> So I would really like to know how we perform in a JM build with
> MOZ_ACCELERATED=11 and/or D2D enabled!

JM+TM + bug 592631, no moz_accel: 42fps, going down to 39 if I mouse
JM+TM + bug 592631, moz_accel: 55fps, going down to 53 if I mouse

This is without doing anything interesting with glFlush... ;)

"Let's ship it".

Boris Zbarsky [:bzbarsky]

Reporter

Comment 49

•

14 years ago

Note, btw, that CPU usage is about 60% in the latter test, and 97% in the former.  So we're definitely blocking on the flush; it's just _still_ faster than doing the composite ourselves.  At least on my hardware.

Just grinding out the numbers, without accel we take 24ms per frame, of which about half is painting.  With accel we take 18ms per frame.  So accel cut the painting time in half.  If not having to glFlush will further cut it by a factor of 3 (good approximation per comment 46), then we'll be at 14ms per frame or 71fps, with no changes to js engine.  At that point the idiv fix is likely to give a measurable improvement (since even a 2% change will be several fps).

Jan de Mooij [:jandem]

Comment 50

•

14 years ago

(In reply to comment #43)
> Right; I wasn't trying to discourage you.  
No problem.

> The 0 thing has come up on other
> testcases too, so if we can fix it, great.
Problem is that the most recent version no longer does division; it multiplies instead.. Do you remember one of these other benchmarks where this matters? 

Fortunately, my current WIP-patch is very small and hackish. I can polish it when I know it helps somewhere.

Boris Zbarsky [:bzbarsky]

Reporter

Comment 51

•

14 years ago

> Do you remember one of these other benchmarks where this matters? 

Sorry, not offhand....  :(

Jeff Muizelaar [:jrmuizel]

Updated

•

14 years ago

Whiteboard: [painting-perf]

Brendan Eich [:brendan]

Comment 52

•

14 years ago

Jeff: this bug is meta, although against JS Engine (for want of a "fluidsim" or _sui generis_ component). Please file a painting perf bug and make it block this one. Thanks,

/be

Keywords: meta

Robert O'Callahan (:roc) (email my personal email if necessary)

Comment 53

•

14 years ago

We don't need a new bug, I filed a bug for the only clear painting issue here.

Brendan Eich [:brendan]

Comment 54

•

14 years ago

(In reply to comment #53)
> We don't need a new bug, I filed a bug for the only clear painting issue here.

Ok, then [painting-perf] goes there, not here.

/be

Whiteboard: [painting-perf]

Konstantin Svist

Comment 55

•

14 years ago

Is this still being worked on? I've been checking back on the nighly builds since post @http://blog.mozilla.com/dmandelin/2010/09/08/presenting-jagermonkey/
The primary demo works great (~45-47FPS), but when the alternate drawing mode is used, frame rate takes a dive (~4-15FPS).

P.S. MOZ_ACCELERATED=11 doesn't work here yet (Linux)

Boris Zbarsky [:bzbarsky]

Reporter

Comment 56

•

14 years ago

I don't know that anyone's profiled the alternate drawing mode yet.

On Linux, if your drivers are good canvas is already accelerated via XRender.  OF course most drivers on Linux are not good....

Boris Zbarsky [:bzbarsky]

Reporter

Comment 57

•

14 years ago

On the other hand, I dunno whether Render accelerates the canvas upscale, which is the bottleneck here.  It might not.

Boris Zbarsky [:bzbarsky]

Reporter

Comment 58

•

14 years ago

For what it's worth, on a Mac I see the primary demo at 65fps and the alternate drawing mode at 55fps in a current trunk build.  Chrome 9 dev on the same hardware is 55fps and 45fps respectively, for comparison.

So I'd guess your Linux issues are all graphics.

Konstantin Svist

Comment 59

•

14 years ago

Just retested
Fx (Dec 10 build, from http://nightly.mozilla.org/js-preview.html):
* primary mode: 44-47 FPS
* alternate mode: 3-15 FPS

You're right, it finally works with MOZ_ACCELERATED=11! Unfortunately, it works slower :(
* primary mode: 41-44 FPS
* alternate mode: 2-10 FPS


Chromium 8.0.560.0 (from http://repos.fedorapeople.org/repos/spot/chromium/fedora-14/):
* primary mode: 35 FPS (no variation while moving the mouse)
* alternate mode: 15-24 FPS (animation appears smooth, even at 15)

I also tried Chromium with --disable-accelerated-compositing (it's apparently enabled by default), but no major variation in speed

Konstantin Svist

Comment 60

•

14 years ago

So, should I file another bug? Or is it a lost cause?

Boris Zbarsky [:bzbarsky]

Reporter

Comment 61

•

14 years ago

Probably worth filing one on the Linux graphics issue, yes.  Make sure to include information about your Xorg in the bug report!

Konstantin Svist

Comment 62

•

14 years ago

Added bug 620065

Jan de Mooij [:jandem]

Comment 63

•

13 years ago

This does very well with TI. 50+ fps in the browser with default resolution 64, trunk is somewhere between 40-45. With resolution 128 TI+JM gets 35+ fps, slightly faster than Chrome, trunk is at 25 fps.

For the shell test case:

-m -n    : 366 ms
d8       : 405 ms
-m -j -p : 512 ms 
-m       : 721 ms
-j       : 950 ms

Brendan Eich [:brendan]

Comment 64

•

13 years ago

Are we fast yet?

I'm going to throw this back into the pool. Jan, anyone: feel free to take and drive if there are short-term wins. TI is trying to land, so mark FIXED when you can -- no point keeping an unfixable bug around.

/be

Assignee: brendan → general

Jan de Mooij [:jandem]

Comment 65

•

13 years ago

(In reply to Brendan Eich [:brendan] from comment #64)
> TI is trying to land, so mark FIXED
> when you can -- no point keeping an unfixable bug around.

Just tested this on m-c: 60+ fps and a bit faster than Chrome. Nothing to do here.

Status: NEW → RESOLVED

Closed: 13 years ago

Resolution: --- → FIXED

Boris Zbarsky [:bzbarsky]

Reporter

Comment 66

•

13 years ago

But it's still only 6fps at "Resolution: 512".

Not sure it's worth worrying about, but if we can get _that_ fast, that would rock.  ;)

Robert O'Callahan (:roc) (email my personal email if necessary)

Comment 67

•

13 years ago

Plus ours looks better than Chrome's because we use bilinear scaling to scale up the canvas.

First JS file 15 years ago Boris Zbarsky [:bzbarsky] 9.84 KB, text/plain		Details
Second js file 15 years ago Boris Zbarsky [:bzbarsky] 3.71 KB, text/plain		Details
Testcase HTML 15 years ago Boris Zbarsky [:bzbarsky] 6.62 KB, text/html		Details
patch to test 14 years ago Brendan Eich [:brendan] 24.35 KB, patch		Details \| Diff \| Splinter Review
Shell test case 14 years ago David Mandelin [:dmandelin] 19.33 KB, text/plain		Details