This repository was archived by the owner on Mar 13, 2023. It is now read-only.

Create Pcluster Manager Fails #366

Open
sean-smith opened this issue Nov 14, 2022 · 7 comments · Fixed by #421
Labels
bug Something isn't working

Comments

@sean-smith
Contributor

Hi,

Using the following cluster template breaks the create interface after the Storage Tab with:

image

From the Chrome console I see:

framework-bb5c596eafb42b22.js:1 TypeError: Cannot read properties of undefined (reading 'length')
    at index-e0a78d528f2085ff.js:1:170318
    at Array.filter (<anonymous>)
    at index-e0a78d528f2085ff.js:1:170289
    at _a (index-e0a78d528f2085ff.js:1:170339)
    at oo (framework-bb5c596eafb42b22.js:1:59416)
    at Ku (framework-bb5c596eafb42b22.js:1:111716)
    at Li (framework-bb5c596eafb42b22.js:1:98957)
    at Ni (framework-bb5c596eafb42b22.js:1:98885)
    at Pi (framework-bb5c596eafb42b22.js:1:98748)
    at bi (framework-bb5c596eafb42b22.js:1:95714)
cu @ framework-bb5c596eafb42b22.js:1
main-1ae0bdeb4d020668.js:1 TypeError: Cannot read properties of undefined (reading 'length')
    at index-e0a78d528f2085ff.js:1:170318
    at Array.filter (<anonymous>)
    at index-e0a78d528f2085ff.js:1:170289
    at _a (index-e0a78d528f2085ff.js:1:170339)
    at oo (framework-bb5c596eafb42b22.js:1:59416)
    at Ku (framework-bb5c596eafb42b22.js:1:111716)
    at Li (framework-bb5c596eafb42b22.js:1:98957)
    at Ni (framework-bb5c596eafb42b22.js:1:98885)
    at Pi (framework-bb5c596eafb42b22.js:1:98748)
    at bi (framework-bb5c596eafb42b22.js:1:95714)
ee @ main-1ae0bdeb4d020668.js:1
main-1ae0bdeb4d020668.js:1 A client-side exception has occurred, see here for more info: https://nextjs.org/docs/messages/client-side-exception-occurred

Here's the template that broke it, with some params changed:

Region: us-east-2
Image:
  Os: alinux2
HeadNode:
  InstanceType: c6i.2xlarge
  Networking:
    SubnetId: subnet-123456789
  Ssh:
    KeyName: keypair
  DisableSimultaneousMultithreading: true
  LocalStorage:
    RootVolume:
      Size: 100
      VolumeType: gp3
  CustomActions:
    OnNodeConfigured:
      Script: >-
        https://bucket.us-east-2.amazonaws.com/headnode_install.sh
      Args:
        - HEAD
  Iam:
    AdditionalIamPolicies:
      - Policy: arn:aws:iam::aws:policy/AmazonSSMManagedInstanceCore
  Dcv:
    Enabled: true
Scheduling:
  Scheduler: slurm
  SlurmQueues:
    - Name: compute
      CapacityType: ONDEMAND
      ComputeResources:
        - Name: compute-hpc6a
          Efa:
            Enabled: true
            GdrSupport: true
          InstanceType: hpc6a.48xlarge
          MinCount: 0
          MaxCount: 101
          DisableSimultaneousMultithreading: true
      Networking:
        SubnetIds:
          - subnet-123456789
        PlacementGroup:
          Enabled: true
      CustomActions:
        OnNodeConfigured:
          Script: >-
            https://bucket.s3.us-east-2.amazonaws.com/compute_install.sh
      ComputeSettings:
        LocalStorage:
          RootVolume:
            VolumeType: gp3
SharedStorage:
  - MountDir: /opt/ncar
    Name: ncar
    StorageType: Ebs
    EbsSettings:
      Size: '35'
      VolumeType: gp3
      DeletionPolicy: Delete
  - MountDir: /scratch
    Name: scratch
    StorageType: FsxLustre
    FsxLustreSettings:
      FileSystemId: fs-0e709f43fbde2c3a2
@mendaomn
Contributor

Thank you for reaching out. This is a known issue: you are using a template created with 3.2.0 on PCM 3.3.0.

There is no fix available at the moment.

As a workaround, you can edit the template, replacing Scheduling > SlurmQueues > ComputeResources > InstanceType with the new Instances property introduced in PC 3.3.0.
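
As a rough sketch, this is how that workaround might look when applied to the compute resource from the template above. It assumes the 3.3.0 schema, in which Instances is a list whose entries each carry an InstanceType key. All other fields are copied unchanged from the original template, and the rest of the queue (Networking, CustomActions, ComputeSettings) is omitted here for brevity. Treat it as an illustration and check it against the ParallelCluster 3.3.0 configuration reference:

```yaml
Scheduling:
  Scheduler: slurm
  SlurmQueues:
    - Name: compute
      CapacityType: ONDEMAND
      ComputeResources:
        - Name: compute-hpc6a
          Efa:
            Enabled: true
            GdrSupport: true
          # 3.2.0 style (removed): InstanceType: hpc6a.48xlarge
          # 3.3.0 style: instance types go in a list under Instances
          Instances:
            - InstanceType: hpc6a.48xlarge
          MinCount: 0
          MaxCount: 101
          DisableSimultaneousMultithreading: true
```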

@mendaomn added the bug and wontfix labels on Nov 15, 2022
@mendaomn closed this as not planned on Nov 15, 2022
@mcb-silverlining

Is there a web page that lists issues like this? These issues are not “known issues” to the customer. Perhaps a web page of known bugs would help. As a customer, this caused me some churn. Particularly when the issue is the result of a lack of backwards config file compatibility for a relatively minor update (3.2->3.3), it would be helpful to let your customers know.

@sean-smith reopened this on Nov 15, 2022
@mendaomn
Contributor

@Silver-Linda

Thank you for voicing your concerns. We are working hard on adding comprehensive documentation for the product, but unfortunately we are not there yet.

We could have pointed this out in our changelog as a breaking change, and we will adopt this strategy in the future so that customers are informed of what is happening and how to work around it.

Regarding this specific issue, we are working on a fix, since we realize the app is not supposed to crash due to lack of backwards compatibility (you can follow this issue to be notified when the fix lands).

However, at the moment PCM does not support creating a cluster starting from a template created with previous versions. We are actively working on figuring out the best approach to take from here on out.

@mcb-silverlining

mcb-silverlining commented Nov 16, 2022

@mendaomn

Thanks for the transparent reply. Pcluster manager is an extremely important aspect of parallelcluster usability but a lack of backwards compatibility (a feature pcluster manager used to have) makes it very difficult to maintain clusters while updating to any new parallelcluster versions. Not only does this impact my hopes of version control of the yaml config file, it is hard to create a yaml config file from scratch, even with pcluster manager. The parallelcluster move to a complicated yaml config file necessitated some sort of tool that helped create that file. Backwards compatibility seems critical to me.

@sean-smith
Contributor Author

Re-opening as current release still has this issue.

@sean-smith reopened this on Jan 10, 2023
@natalie-white-aws

natalie-white-aws commented Jan 10, 2023

We definitely need to update the template used for this workshop to include the workaround, now that the pcluster manager template defaults to 3.3.0.

We also need to update the CLI instructions to install pcluster 3.3.0. Otherwise the CLI creates the cluster with 3.4.0 (currently the default, as the latest version), and you can't go back and view that CLI-created cluster in the 3.3.0 version of pcluster manager. You'll get the error "Cluster hpc belongs to an incompatible ParallelCluster major version".

@sean-smith
Contributor Author

@natalie-white-aws my PR updates the template to resolve this issue and we'll release 3.4.0 with ParallelCluster Manager shortly.


4 participants