Add MAUI Android inner loop deploy measurement scenario #5165
davidnguyen-tech wants to merge 19 commits into dotnet:main
/azp run
Azure Pipelines successfully started running 1 pipeline(s).
When this PR goes in, #5176, you will need to take the changes. I am changing how the auth works and it needs the update to
0418d92 to 54f7a75
Add a new C# parser that extracts build and deploy metrics from MSBuild binary logs (.binlog) for MAUI Android inner loop measurements. The parser captures:
- Overall build duration (Publish Time)
- Build task timings: Csc, XamlC, GenerateJavaStubs, D8, Javac, etc.
- Build target timings: CoreCompile, _GenerateJavaStubs, _CompileToDalvik, etc.
- Deploy task timings: FastDeploy, AndroidSignPackage, Aapt2Link
- Deploy target timings: _Sign, _Upload, _DeployApk, _BuildApkFastDev

Register the AndroidInnerLoop MetricType in Startup.cs so the parser can be selected via the measurement framework.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Extend the shared test runner with a new ANDROIDINNERLOOP scenario type that orchestrates first deploy and incremental build+deploy+startup measurements for MAUI Android apps.

Changes:
- const.py: Add ANDROIDINNERLOOP constant and scenario name mapping
- runner.py: Add argument parser and full ANDROIDINNERLOOP handler that performs first build+deploy, then N incremental iterations with source file toggling, binlog capture, startup time measurement via am start, and result upload to perflab
- androidhelper.py: Add skip_install, screen_timeout_ms, skip_uninstall, and other parameters to AndroidHelper for inner loop reuse
- startup.py: Fix copytree FileExistsError with dirs_exist_ok=True

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Add the scenario-specific scripts for MAUI Android inner loop measurements:
- pre.py: Bootstraps the .NET SDK, installs maui-android workload, restores NuGet packages, installs Android SDK dependencies (build tools, platform SDK, Java), and creates the MAUI test app with modified source files for incremental measurement
- setup_helix.py: Helix-specific environment setup that discovers dotnet/SDK/Android/Java paths, installs workloads, and prepares the build environment on Helix agents
- test.py: Entry point that invokes the shared test runner with ANDROIDINNERLOOP test type
- post.py: Cleanup script that disables device animations, restores screen settings, and uninstalls the test APK using ADB

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Add CI infrastructure to run inner loop measurements on Helix:
- maui_scenarios_android_innerloop.proj: MSBuild project that defines Helix work items with per-platform PreCommands for environment setup, SDK discovery, workload installation, and test invocation
- sdk-perf-jobs.yml: Add 6 inner loop job definitions covering Pixel 8, Galaxy A16, and Android 36 emulator queues, each with Mono and CoreCLR runtime configurations
- build-machine-matrix.yml: Add ubuntu-x64-android-emulator build machine mapping to Ubuntu.2204.Amd64.Android.36 queue
- run-performance-job.yml: Support androidinnerloop runtime flavor
- run_performance_job.py: Extend maui_scenarios_android run_kind matching to include innerloop variant; copy binlogs to artifacts

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
73918d8 to ff881bb
Add --screen-timeout-ms CLI argument (default 1800000 = 30 min) and ScreenTimeoutMs MSBuild property so the screen timeout can be tuned from the .proj file without code changes.

Add LaunchState validation to AndroidHelper.measure_cold_startup(): if am start reports anything other than COLD (e.g. UNKNOWN when the screen is off), the method now throws with a clear diagnostic message suggesting to increase --screen-timeout-ms.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
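
The LaunchState check described in this commit could look roughly like the sketch below. It is an illustration only, not the code in androidhelper.py: the function name `parse_am_start_output` and the sample output are assumptions, and the sketch relies only on `am start -W` printing `LaunchState:` and `TotalTime:` lines.

```python
import re


def parse_am_start_output(output: str) -> int:
    """Return TotalTime in ms from `am start -W` output, requiring a COLD launch.

    Hypothetical helper for illustration; not the actual AndroidHelper code.
    """
    state = re.search(r"^\s*LaunchState:\s*(\w+)", output, re.MULTILINE)
    total = re.search(r"^\s*TotalTime:\s*(\d+)", output, re.MULTILINE)
    launch_state = state.group(1) if state else "MISSING"
    if launch_state != "COLD":
        # Anything other than COLD (for example UNKNOWN) invalidates the sample.
        raise RuntimeError(
            f"Expected LaunchState COLD but got {launch_state}; the screen may "
            "have turned off, consider increasing --screen-timeout-ms"
        )
    if total is None:
        raise RuntimeError("am start output did not contain TotalTime")
    return int(total.group(1))


# Sample text shaped like `am start -W` output; the values are made up.
SAMPLE = """Status: ok
LaunchState: COLD
Activity: com.example.app/.MainActivity
TotalTime: 812
WaitTime: 830
Complete
"""
```

Failing loudly on a non-COLD state keeps a warm or unknown launch from being recorded as a cold startup sample.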
ff881bb to 746709a
Pull request overview
Adds a new Helix-driven scenario to measure MAUI Android “inner loop” performance by timing first build+deploy+startup and repeated incremental build+deploy+startup iterations, with binlog parsing for build/deploy breakdown and am start/logcat parsing for startup time.
Changes:
- Introduces a new `AndroidInnerLoop` parser in the ScenarioMeasurement startup tool to extract target/task durations from `.binlog`s.
- Adds a new `androidinnerloop` scenario flow in the Python runner, plus a dedicated `mauiandroidinnerloop` scenario directory (pre/setup/post scripts) and a new Helix project file.
- Updates pipelines/job config to run the scenario on Pixel/Galaxy (Windows) and an Android emulator (Ubuntu), and fixes repeated trace directory uploads by allowing `copytree` into existing dirs.
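
The `copytree` fix mentioned in the last bullet can be illustrated with a small sketch. The helper name `upload_traces` and the file names are hypothetical; the only thing taken from the PR is the use of `dirs_exist_ok=True` (available in Python 3.8 and later).

```python
import shutil
import tempfile
from pathlib import Path


def upload_traces(src: Path, dest: Path) -> None:
    """Copy a trace directory into dest, merging into it if it already exists."""
    # Without dirs_exist_ok=True, a second call raises FileExistsError.
    shutil.copytree(src, dest, dirs_exist_ok=True)


def demo() -> list:
    with tempfile.TemporaryDirectory() as tmp:
        src = Path(tmp) / "traces"
        dest = Path(tmp) / "upload"
        src.mkdir()
        (src / "first.binlog").write_text("a")
        upload_traces(src, dest)
        (src / "second.binlog").write_text("b")
        upload_traces(src, dest)  # dest already exists here; the copy still succeeds
        return sorted(p.name for p in dest.iterdir())
```

This matters for the inner loop scenario because each iteration may copy traces into the same upload directory.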
Reviewed changes
Copilot reviewed 15 out of 15 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| src/tools/ScenarioMeasurement/Util/Parsers/AndroidInnerLoopParser.cs | New binlog parser emitting build/deploy counters for the new scenario. |
| src/tools/ScenarioMeasurement/Startup/Startup.cs | Adds AndroidInnerLoop metric type wiring to the startup tool. |
| src/scenarios/shared/startup.py | Allows repeated trace uploads by using copytree(..., dirs_exist_ok=True). |
| src/scenarios/shared/runner.py | Adds androidinnerloop subcommand and orchestrates build+deploy+startup iterations + aggregation/upload. |
| src/scenarios/shared/const.py | Adds ANDROIDINNERLOOP constant and scenario-name mapping. |
| src/scenarios/shared/androidhelper.py | Extends device setup options and adds cold-start measurement helper. |
| src/scenarios/mauiandroidinnerloop/test.py | New scenario entrypoint using shared runner. |
| src/scenarios/mauiandroidinnerloop/setup_helix.py | New Helix setup script (workloads, Android deps, adb readiness, restore). |
| src/scenarios/mauiandroidinnerloop/pre.py | Creates MAUI template payload + prepares file-edit toggles and NuGet config. |
| src/scenarios/mauiandroidinnerloop/post.py | Cleanup/uninstall/build-server shutdown for the new scenario. |
| scripts/run_performance_job.py | Adds new run_kind handling and binlog artifact copying for this scenario. |
| eng/pipelines/templates/run-performance-job.yml | Enables passing --runtime-flavor for the new run kind. |
| eng/pipelines/templates/build-machine-matrix.yml | Adds Ubuntu Android emulator queue to the build matrix (private builds). |
| eng/pipelines/sdk-perf-jobs.yml | Schedules the new scenario across Pixel/Galaxy/emulator for mono+coreclr Debug. |
| eng/performance/maui_scenarios_android_innerloop.proj | New Helix project defining payload prep + work items for device/emulator tracks. |
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Pull request overview
Copilot reviewed 17 out of 17 changed files in this pull request and generated 2 comments.
LoopedBard3
left a comment
Looks good to me, and it seems the test run only failed due to the old upload/env setup. Just one small follow-up comment. Let me know and I will merge this 👍.
Pull request overview
Copilot reviewed 17 out of 17 changed files in this pull request and generated 2 comments.
ef42ad6 to 14db0bb
Inner loop measurement is intentionally Debug-only. Add a ValidateInnerLoopBuildConfig target that fails the build with a clear error if BuildConfig != Debug, so a future flip in sdk-perf-jobs.yml can't silently produce numbers that look like they came from a Debug run.

Switch the work-item Command to '-c $(BuildConfig)' to match the pattern in maui_scenarios_android.proj and maui_scenarios_ios.proj. The Error target is the gate; the property is now the single source of truth for the configuration the work item runs against.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
14db0bb to 27ef49a
Pull request overview
Copilot reviewed 17 out of 17 changed files in this pull request and generated 4 comments.
…colon-list args
Drop skip_package_verifier=True from setup_device. Every other Android
benchmark calls setup_device with the default (False), which disables the
device-global package verifier and was added specifically to make startup
testing less noisy. Our True flag was inherited from the pre-AndroidHelper
local setup_measurement_device() and had no scenario-specific reason.
Removing it brings inner-loop in line with the rest of the Android
startup suite and avoids verifier overhead on each per-iteration reinstall.
Update --edit-src / --edit-dest help strings to document the semicolon-
separated convention and the positional pairing (impl uses
args.editsrc.split(';') and pairs by index).
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
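
The semicolon-separated, positionally paired convention for --edit-src / --edit-dest can be sketched as follows. The function name `pair_edits`, the length check, and the example paths are assumptions for illustration; the commit only states that the implementation splits on ';' and pairs by index.

```python
def pair_edits(edit_src: str, edit_dest: str) -> list:
    """Split two semicolon-separated lists and pair their entries by position."""
    sources = edit_src.split(";")
    dests = edit_dest.split(";")
    # Assumption: a length mismatch is treated as a usage error here;
    # the real runner may handle it differently.
    if len(sources) != len(dests):
        raise ValueError(
            f"--edit-src has {len(sources)} entries but --edit-dest has {len(dests)}"
        )
    return list(zip(sources, dests))


# Hypothetical paths: the first source replaces the first destination, and so on.
PAIRS = pair_edits(
    "edits/MainPage.xaml.cs;edits/MainPage.xaml",
    "app/MainPage.xaml.cs;app/MainPage.xaml",
)
```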
Reverting the skip_package_verifier kwarg change from a807f347 after re-checking the history.

The True flag was added deliberately in 26c3f746 (the AndroidHelper migration) to preserve the existing device-mutation footprint of this scenario: the prior local setup_measurement_device() function only touched screen-wake, animations, and screen_off_timeout, and never mutated verifier_verify_adb_installs or package_verifier_enable. The green baseline run was produced under those conditions and silently flipping verifier state on the Helix device for this scenario would change measurement conditions without a deliberate decision.

The inner-loop scenario reinstalls the APK on every iteration, so a verifier-disable change here would compound across samples in a way that the cold-start scenarios (which install once and measure startup of the already-installed app) don't have to deal with. Worth a separate, intentional change with its own baseline if we want to make that switch later.

The --edit-src / --edit-dest help text update from a807f347 is kept.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Scheduled a new build with the latest changes:
Pull request overview
Copilot reviewed 17 out of 17 changed files in this pull request and generated 3 comments.
The ValidateInnerLoopBuildConfig target added in 8ffbbf4a is firing because $(BuildConfig) isn't what it looks like by the time the Test target runs. In the helix flow, send_to_helix.py seems to overwrite the BuildConfig env var with a long composite identifier used for log-directory naming (e.g. x64_main_maui_scenarios_android_innerloop). So the proj sees that string instead of Debug and the error fires.
I think sticking to what was working before and not having extra validation is fine in this case. If we wanted a guard against a future flip, it needs to use a property that doesn't get clobbered (e.g., a new pipeline variable mirrored from ${{ parameters.buildConfig }} in run-performance-job.yml), but I don't think that is worth it for this at the moment.
Reverts the proj-side changes from 8ffbbf4a after build 2967606 failed.

Per @LoopedBard3's investigation: in the Helix flow, send_to_helix.py overwrites the BuildConfig env var with a long composite identifier used for log-directory naming (e.g. x64_main_maui_scenarios_android_innerloop), so by the time the Test target runs $(BuildConfig) is no longer 'Debug' and the validation target fires. The same property is then used for the work-item Command's -c argument, which would also pass garbage to the inner runner.

Restoring the literal '-c Debug' that was working in build 2964110 and removing the ValidateInnerLoopBuildConfig target. A future guard would need a property that doesn't get clobbered by send_to_helix.py (e.g., a new pipeline variable mirrored from ${{ parameters.buildConfig }} in run-performance-job.yml), which is out of scope for this PR.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
kotlarmilos
left a comment
Overall looks good. Reusing the existing constructs should reduce the amount of code and make it easier to review and maintain.
@@ -0,0 +1,102 @@
<Project Sdk="Microsoft.DotNet.Helix.Sdk" DefaultTargets="Test">
This file is redundant with the existing eng/performance/maui_scenarios_android.proj. Could we just add the new HelixWorkItem to the maui_scenarios_android.proj and condition on the inner dev loop?
Yes, we could. I had a prototype for that approach, but the Inner Loop scenario seemed too different from the other ones, and at an ultra-high level it looked approximately like this:
- if inner_loop:
- ...
- else:
- ...
Don't have a strong opinion on this, happy to switch to using just one .proj file :)
var aapt2LinkTimes = new List<double>();

// Build targets
var coreCompileTargetTimes = new List<double>();
What is the difference between build tasks and build targets? Do we measure MSBuild tasks and MSBuild targets separately?
MSBuild Tasks are the individual steps of Targets. I chose to track both:
- Target timings enable us to track the main bottleneck areas
- Task timings give us granularity for more precise investigation
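
The two levels of granularity can be illustrated with a small aggregation sketch. It is not the C# parser from this PR: the records, the numbers, and the placeholder task `OtherTask` are made up, and summing task times per target is a simplification, since a real target's duration comes from its own events and can exceed the sum of its tasks.

```python
from collections import defaultdict

# Hypothetical (target, task, duration_ms) records. The numbers are made up and
# "OtherTask" is a placeholder; real values would come from the binlog.
RECORDS = [
    ("CoreCompile", "Csc", 4200.0),
    ("_GenerateJavaStubs", "GenerateJavaStubs", 1500.0),
    ("_CompileToDalvik", "D8", 2300.0),
    ("_CompileToDalvik", "OtherTask", 100.0),
]


def summarize(records):
    """Return (time per target, time per task), both in milliseconds."""
    by_target = defaultdict(float)
    by_task = defaultdict(float)
    for target, task, duration_ms in records:
        by_target[target] += duration_ms  # coarse view: which area is the bottleneck?
        by_task[task] += duration_ms      # fine view: which step inside that area?
    return dict(by_target), dict(by_task)
```

With data like this, a regression in `_CompileToDalvik` shows up at the target level, and the task-level numbers indicate whether `D8` or another step is responsible.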
public void EnableKernelProvider(ITraceSession kernel) { throw new NotImplementedException(); }
public void EnableUserProviders(ITraceSession user) { throw new NotImplementedException(); }

public IEnumerable<Counter> Parse(string binlogFile, string processName, IList<int> pids, string commandLine)
How do we plan to distinguish cold vs hot startup?
With my current understanding of Inner Loop = build --> deploy --> startup, I chose to always measure the cold startup. Just to be sure - @jonathanpeppers, is this correct?
Anyway, I think we can distinguish cold vs hot startup based on the output of am start. Currently, we use that output to throw on non-cold states.
As long as it is the "second run", we deployed a change and run the app again. That seems like what should be measured.
Cold vs hot is if the app has been launched more than once, but usually no code changes?
> Cold vs hot is if the app has been launched more than once, but usually no code changes?
Yes, that's what I expected. I'm not sure how HotReload affects this scenario when it comes to hot startups, if at all.
Summary
Adds a new scenario that measures MAUI Android developer inner loop performance:
.binlogs.
The scenario is the first one in the repo to build the app on Helix.
We send the app source code as the Helix payload because of the nature of this scenario.
The incremental change is simulated by:
Sequential incremental inner loop measurements are implemented by switching between the updated and the original versions of the files.
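
The switching between file versions can be sketched like this. The helper `apply_edit`, the file names, and the choice of odd iterations for the modified version are assumptions for illustration, not the runner's actual code.

```python
import shutil
import tempfile
from pathlib import Path


def apply_edit(iteration: int, target: Path, original: Path, modified: Path) -> str:
    """Overwrite target with the modified copy on odd iterations, the original on even ones."""
    source = modified if iteration % 2 == 1 else original
    shutil.copyfile(source, target)
    return source.name


def demo() -> list:
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        original = root / "MainPage.original.cs"
        modified = root / "MainPage.modified.cs"
        target = root / "MainPage.xaml.cs"
        original.write_text("original")
        modified.write_text("modified")
        shutil.copyfile(original, target)  # state after the first build+deploy
        seen = []
        for iteration in range(1, 5):
            apply_edit(iteration, target, original, modified)
            seen.append(target.read_text())  # an incremental build+deploy would run here
        return seen
```

Alternating the two versions means every iteration sees a real source change relative to the previous build, so each incremental build has work to do.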
Test targets
- Windows.11.Amd64.Pixel.Perf
- Windows.11.Amd64.Galaxy.Lowend.Perf
- Ubuntu.2204.Amd64.Android.36

AzDO Build
https://dev.azure.com/dnceng/internal/_build/results?buildId=2964307&view=results