@zonca
Last active September 30, 2025 23:02
[
{
"text": "Hi everybody. I'm Mary Thomas and",
"start": 0.8,
"duration": 2.71
},
{
"text": "I lead and coordinate a lot of the HPC",
"start": 3.52,
"duration": 2.72
},
{
"text": "training events and I'm really pleased",
"start": 5.84,
"duration": 2.32
},
{
"text": "to welcome you all to our advanced",
"start": 8.32,
"duration": 2.48
},
{
"text": "computing webinar series or our events",
"start": 11.2,
"duration": 2.88
},
{
"text": "from HPC. I want to introduce you to",
"start": 14.08,
"duration": 2.88
},
{
"text": "Andrea Zonca who has a background in",
"start": 17.04,
"duration": 2.96
},
{
"text": "cosmology. He earned his PhD analyzing",
"start": 19.44,
"duration": 2.4
},
{
"text": "cosmic microwave background data from",
"start": 22.48,
"duration": 3.04
},
{
"text": "the Planck satellite. He also leads our",
"start": 25.44,
"duration": 2.96
},
{
"text": "SDSC scientific computing applications",
"start": 28.64,
"duration": 3.2
},
{
"text": "group which focuses on helping research",
"start": 31.12,
"duration": 2.48
},
{
"text": "groups in any field of science to",
"start": 33.52,
"duration": 2.4
},
{
"text": "port, benchmark, and optimize their",
"start": 35.36,
"duration": 1.84
},
{
"text": "data pipelines to national",
"start": 37.68,
"duration": 2.32
},
{
"text": "supercomputers. Andrea has expertise",
"start": 40.64,
"duration": 2.96
},
{
"text": "in parallel computing, Python, C++",
"start": 43.68,
"duration": 3.04
},
{
"text": "distributed computing with Python data",
"start": 46.48,
"duration": 2.8
},
{
"text": "intensive computing, Kubernetes cluster",
"start": 49.04,
"duration": 2.56
},
{
"text": "administration and cosmic microwave",
"start": 51.84,
"duration": 2.8
},
{
"text": "background. So with that I'll turn",
"start": 54.32,
"duration": 2.48
},
{
"text": "this over and I want to thank Andrea for",
"start": 56.88,
"duration": 2.56
},
{
"text": "doing this presentation. I think you'll",
"start": 58.88,
"duration": 2.0
},
{
"text": "find it very in-depth and detailed",
"start": 61.04,
"duration": 2.16
},
{
"text": "and there will be a link given for",
"start": 63.68,
"duration": 2.64
},
{
"text": "the talk to a GitHub repository that",
"start": 66.08,
"duration": 2.4
},
{
"text": "you'll be able to access and work",
"start": 68.96,
"duration": 2.88
},
{
"text": "with later on. With that I'll turn it",
"start": 70.96,
"duration": 2.0
},
{
"text": "over to you Andrea.",
"start": 72.88,
"duration": 1.92
},
{
"text": "Thank you Mary. So I'm going to",
"start": 75.04,
"duration": 2.16
},
{
"text": "start",
"start": 78.24,
"duration": 3.2
},
{
"text": "screen sharing.",
"start": 80.96,
"duration": 2.72
},
{
"text": "Can someone confirm the screen",
"start": 87.92,
"duration": 6.96
},
{
"text": "sharing is working properly?",
"start": 91.12,
"duration": 3.2
},
{
"text": "Confirming.",
"start": 92.96,
"duration": 1.84
},
{
"text": "Very good. So let's start. So",
"start": 94.56,
"duration": 1.6
},
{
"text": "in this tutorial I'll be showing",
"start": 97.84,
"duration": 3.28
},
{
"text": "you some important aspects of",
"start": 102.56,
"duration": 4.72
},
{
"text": "using Python in a High Performance",
"start": 105.36,
"duration": 2.8
},
{
"text": "Computing environment on a supercomputer.",
"start": 108.48,
"duration": 3.12
},
{
"text": "I will be running on",
"start": 110.88,
"duration": 2.4
},
{
"text": "Expanse which is our local supercomputer",
"start": 115.84,
"duration": 4.96
},
{
"text": "here at the San Diego Supercomputer Center.",
"start": 118.8,
"duration": 2.96
},
{
"text": "However, this is very portable. These",
"start": 121.36,
"duration": 2.56
},
{
"text": "are all Python packages that run very",
"start": 124.96,
"duration": 3.6
},
{
"text": "effectively also on other",
"start": 129.76,
"duration": 4.8
},
{
"text": "supercomputers. So you can use very",
"start": 131.84,
"duration": 2.08
},
{
"text": "similar techniques to run on any",
"start": 135.44,
"duration": 3.6
},
{
"text": "other local or national supercomputer",
"start": 138.72,
"duration": 3.28
},
{
"text": "that you have access to and in",
"start": 141.52,
"duration": 2.8
},
{
"text": "particular if you are running on Expanse,",
"start": 145.12,
"duration": 3.6
},
{
"text": "I am using Galileo. Galileo is this",
"start": 148.4,
"duration": 3.28
},
{
"text": "feature. This is a package that",
"start": 153.92,
"duration": 5.52
},
{
"text": "allows you to basically execute a Jupyter",
"start": 156.72,
"duration": 2.8
},
{
"text": "Lab environment on a computing node. So",
"start": 160.4,
"duration": 3.68
},
{
"text": "every time of course that we are running",
"start": 163.2,
"duration": 2.8
},
{
"text": "on HPC, we are using",
"start": 165.36,
"duration": 2.16
},
{
"text": "a significant amount of data and",
"start": 168.8,
"duration": 3.44
},
{
"text": "computing we want to run on computing",
"start": 170.96,
"duration": 2.16
},
{
"text": "nodes. So first thing that you need",
"start": 173.76,
"duration": 2.8
},
{
"text": "to understand",
"start": 176.56,
"duration": 2.8
},
{
"text": "is look at the documentation of the",
"start": 178.48,
"duration": 1.92
},
{
"text": "supercomputer that you're using and",
"start": 180.72,
"duration": 2.24
},
{
"text": "understand what's the recommended",
"start": 182.72,
"duration": 2.0
},
{
"text": "technique for running a Jupyter",
"start": 185.2,
"duration": 2.48
},
{
"text": "notebook. And Jupyter notebook is",
"start": 187.6,
"duration": 2.4
},
{
"text": "the best environment at the beginning",
"start": 191.52,
"duration": 3.92
},
{
"text": "for exploring your data, doing some test",
"start": 194.16,
"duration": 2.64
},
{
"text": "runs, and developing pieces of your",
"start": 197.84,
"duration": 3.68
},
{
"text": "software and then once you have a recipe",
"start": 200.8,
"duration": 2.96
},
{
"text": "that you built in your notebooks then",
"start": 204.48,
"duration": 3.68
},
{
"text": "you can build some Python scripts and do",
"start": 206.48,
"duration": 2.0
},
{
"text": "more large scale computations. But",
"start": 209.52,
"duration": 3.04
},
{
"text": "generally the best way to start using",
"start": 211.68,
"duration": 2.16
},
{
"text": "Python even on HPC is through the",
"start": 215.28,
"duration": 3.6
},
{
"text": "Jupyter notebook environment. So",
"start": 218.0,
"duration": 2.72
},
{
"text": "in this case, this is Expanse",
"start": 221.6,
"duration": 3.6
},
{
"text": "specific. So I don't want to spend too",
"start": 224.0,
"duration": 2.4
},
{
"text": "much time on this. So this is",
"start": 226.4,
"duration": 2.4
},
{
"text": "basically a script that you connect to",
"start": 230.32,
"duration": 3.92
},
{
"text": "Expanse. You run this on the computing",
"start": 233.92,
"duration": 3.6
},
{
"text": "node and this is going to give you a",
"start": 236.24,
"duration": 2.32
},
{
"text": "link. You click on that link and you get",
"start": 237.76,
"duration": 1.52
},
{
"text": "the access to a JupyterLab environment",
"start": 240.4,
"duration": 2.64
},
{
"text": "running on a computing node. Now",
"start": 243.84,
"duration": 3.44
},
{
"text": "once you have your Jupyter notebook",
"start": 247.84,
"duration": 4.0
},
{
"text": "environment running, second thing is how",
"start": 250.24,
"duration": 2.4
},
{
"text": "do you handle the Python environments.",
"start": 253.92,
"duration": 3.68
},
{
"text": "Again this is different for",
"start": 257.84,
"duration": 3.92
},
{
"text": "each supercomputer. So please look at",
"start": 261.84,
"duration": 4.0
},
{
"text": "the documentation. For Expanse, I",
"start": 265.04,
"duration": 3.2
},
{
"text": "am recommending two techniques.",
"start": 270.96,
"duration": 5.92
},
{
"text": "And so you see all the files here on",
"start": 274.48,
"duration": 3.52
},
{
"text": "the left. This is the content of the",
"start": 278.0,
"duration": 3.52
},
{
"text": "GitHub repository that Cindy pasted",
"start": 281.36,
"duration": 3.36
},
{
"text": "in the chat before. So you have access",
"start": 285.84,
"duration": 4.48
},
{
"text": "to all of these files and you can try",
"start": 289.44,
"duration": 3.6
},
{
"text": "to run them yourself if you have",
"start": 292.72,
"duration": 3.28
},
{
"text": "an account on",
"start": 295.84,
"duration": 3.12
},
{
"text": "Expanse or you can run on other",
"start": 298.72,
"duration": 2.88
},
{
"text": "machines. So the first technique that",
"start": 300.96,
"duration": 2.24
},
{
"text": "I'm showing. So the main",
"start": 303.92,
"duration": 2.96
},
{
"text": "concern about using Python on high",
"start": 309.12,
"duration": 5.2
},
{
"text": "performance computing is that a Python",
"start": 311.04,
"duration": 1.92
},
{
"text": "installation is composed of thousands",
"start": 313.2,
"duration": 2.16
},
{
"text": "of tiny files. And this can cause",
"start": 318.0,
"duration": 4.8
},
{
"text": "trouble especially on the parallel file",
"start": 321.68,
"duration": 3.68
},
{
"text": "systems, like Lustre, which",
"start": 323.76,
"duration": 2.08
},
{
"text": "is very common on supercomputer",
"start": 326.32,
"duration": 2.56
},
{
"text": "environments, because they are optimized",
"start": 328.8,
"duration": 2.48
},
{
"text": "for parallel computing and they work",
"start": 330.64,
"duration": 1.84
},
{
"text": "very badly with Python environments. And",
"start": 333.68,
"duration": 3.04
},
{
"text": "so there are some techniques to go",
"start": 336.08,
"duration": 2.4
},
{
"text": "around that and the one the recommended",
"start": 338.08,
"duration": 2.0
},
{
"text": "technique on Expanse is to stage a conda",
"start": 341.84,
"duration": 3.76
},
{
"text": "environment.",
"start": 346.88,
"duration": 5.04
},
{
"text": "So it is to package your conda",
"start": 348.48,
"duration": 1.6
},
{
"text": "environment into a single tar.gz file,",
"start": 350.72,
"duration": 2.24
},
{
"text": "into a single package and then",
"start": 355.2,
"duration": 4.48
},
{
"text": "stage that package on the scratch",
"start": 357.76,
"duration": 2.56
},
{
"text": "space, which is a very fast file system",
"start": 361.92,
"duration": 4.16
},
{
"text": "because it",
"start": 364.72,
"duration": 2.8
},
{
"text": "is on solid state drives, so it's",
"start": 369.76,
"duration": 5.04
},
{
"text": "extremely fast so you can look at the",
"start": 371.52,
"duration": 1.76
},
{
"text": "detailed",
"start": 374.8,
"duration": 3.28
},
{
"text": "documentation about this but basically",
"start": 377.28,
"duration": 2.48
},
{
"text": "what it's doing is whenever you want to run",
"start": 379.76,
"duration": 2.48
},
{
"text": "something so you want to run some Python",
"start": 382.8,
"duration": 3.04
},
{
"text": "distributed computing so in this case we",
"start": 385.68,
"duration": 2.88
},
{
"text": "have some sample test code",
"start": 388.48,
"duration": 2.8
},
{
"text": "which is node_info.py which is just",
"start": 393.6,
"duration": 5.12
},
{
"text": "simply printing out some node",
"start": 396.0,
"duration": 2.4
},
{
"text": "information and we want to execute this",
"start": 398.0,
"duration": 2.0
},
{
"text": "on multiple nodes. So you see we are",
"start": 400.88,
"duration": 2.88
},
{
"text": "requesting three nodes and we want to",
"start": 403.76,
"duration": 2.88
},
{
"text": "run this in parallel at the same time on",
"start": 407.12,
"duration": 3.36
},
{
"text": "three nodes. But first thing we need to",
"start": 409.36,
"duration": 2.24
},
{
"text": "stage the conda environment and so",
"start": 413.28,
"duration": 3.92
},
{
"text": "this is what we do. We first stage the",
"start": 416.8,
"duration": 3.52
},
{
"text": "conda environment and then we execute",
"start": 419.44,
"duration": 2.64
},
{
"text": "the script. And this way each of",
"start": 423.28,
"duration": 3.84
},
{
"text": "the three nodes is going to unpack the",
"start": 426.96,
"duration": 3.68
},
{
"text": "Python environment which is a clone",
"start": 430.24,
"duration": 3.28
},
{
"text": "environment which was prepared before",
"start": 432.08,
"duration": 1.84
},
{
"text": "unpack it on the scratch space and",
"start": 436.0,
"duration": 3.92
},
{
"text": "then run starting from there. That way",
"start": 439.28,
"duration": 3.28
},
{
"text": "Python can run very efficiently.",
"start": 442.08,
"duration": 2.8
},
{
"text": "And our Galileo example our",
"start": 444.8,
"duration": 2.72
},
{
"text": "Galileo script already creates those",
"start": 450.08,
"duration": 5.28
},
{
"text": "environments. So you just create it once",
"start": 452.24,
"duration": 2.16
},
{
"text": "by running Galileo once, and once you",
"start": 455.2,
"duration": 2.96
},
{
"text": "have executed it, then you",
"start": 459.04,
"duration": 3.84
},
{
"text": "can reuse this environment in the",
"start": 462.0,
"duration": 2.96
},
{
"text": "future using this technique. Now",
"start": 464.88,
"duration": 2.88
},
{
"text": "the second technique instead is using",
"start": 469.76,
"duration": 4.88
},
{
"text": "Singularity. So, Singularity is a",
"start": 472.8,
"duration": 3.04
},
{
"text": "software that allows you to run",
"start": 477.84,
"duration": 5.04
},
{
"text": "docker containers inside high",
"start": 483.92,
"duration": 6.08
},
{
"text": "performance computing environments. So",
"start": 487.04,
"duration": 3.12
},
{
"text": "So what you can do is you can",
"start": 489.76,
"duration": 2.72
},
{
"text": "build a docker container with all of",
"start": 492.72,
"duration": 2.96
},
{
"text": "your Python requirements and then you",
"start": 495.36,
"duration": 2.64
},
{
"text": "can use that docker container to run on",
"start": 499.2,
"duration": 3.84
},
{
"text": "Expanse. This way, this container is",
"start": 503.68,
"duration": 4.48
},
{
"text": "just one single large file and it's",
"start": 507.92,
"duration": 4.24
},
{
"text": "extremely efficient to be",
"start": 512.48,
"duration": 4.56
},
{
"text": "handled by the supercomputer",
"start": 515.6,
"duration": 3.12
},
{
"text": "and so you can execute",
"start": 518.8,
"duration": 3.2
},
{
"text": "Python in a very efficient way. So",
"start": 524.56,
"duration": 5.76
},
{
"text": "you see, so these are two techniques",
"start": 527.84,
"duration": 3.28
},
{
"text": "that you can use, and some",
"start": 530.24,
"duration": 2.4
},
{
"text": "supercomputers have a home file",
"start": 534.0,
"duration": 3.76
},
{
"text": "system which is optimized for",
"start": 538.24,
"duration": 4.24
},
{
"text": "Python and then in those cases you can",
"start": 541.84,
"duration": 3.6
},
{
"text": "even just install a conda environment in",
"start": 543.76,
"duration": 1.92
},
{
"text": "your home folder but please check the",
"start": 546.24,
"duration": 2.48
},
{
"text": "documentation first to see if that is",
"start": 547.84,
"duration": 1.6
},
{
"text": "the case for your system. After",
"start": 550.32,
"duration": 2.48
},
{
"text": "this quick introduction about ways",
"start": 553.68,
"duration": 3.36
},
{
"text": "of storing your Python environment,",
"start": 557.2,
"duration": 3.52
},
{
"text": "let's get into more of the material.",
"start": 560.16,
"duration": 2.96
},
{
"text": "But before that just a very quick",
"start": 563.36,
"duration": 3.2
},
{
"text": "note about AI code assistance. So",
"start": 566.56,
"duration": 3.2
},
{
"text": "nowadays the status of AI code",
"start": 571.92,
"duration": 5.36
},
{
"text": "assistants is very advanced and so even",
"start": 577.68,
"duration": 5.76
},
{
"text": "in very",
"start": 582.32,
"duration": 4.64
},
{
"text": "specialized niche software in",
"start": 586.08,
"duration": 3.76
},
{
"text": "advanced science on any domain of",
"start": 591.44,
"duration": 5.36
},
{
"text": "science. You should use",
"start": 596.64,
"duration": 5.2
},
{
"text": "a code assistant. I recommend two",
"start": 601.12,
"duration": 4.48
},
{
"text": "different techniques. So you shouldn't",
"start": 604.88,
"duration": 3.76
},
{
"text": "write Python directly yourself. So it's",
"start": 606.48,
"duration": 1.6
},
{
"text": "extremely important to understand at",
"start": 609.44,
"duration": 2.96
},
{
"text": "a high level how Python works, what",
"start": 611.76,
"duration": 2.32
},
{
"text": "techniques you want to use. But then",
"start": 616.8,
"duration": 5.04
},
{
"text": "once you get down to the actual",
"start": 619.28,
"duration": 2.48
},
{
"text": "implementation, you should ask a code",
"start": 622.08,
"duration": 2.8
},
{
"text": "assistant to implement something for you",
"start": 625.52,
"duration": 3.44
},
{
"text": "and then review that, and iterate",
"start": 627.92,
"duration": 2.4
},
{
"text": "between having the",
"start": 630.64,
"duration": 2.72
},
{
"text": "assistant code some feature for you and",
"start": 635.6,
"duration": 4.96
},
{
"text": "then you review that. You see if it's",
"start": 639.2,
"duration": 3.6
},
{
"text": "as expected. You make sure that you're",
"start": 642.32,
"duration": 3.12
},
{
"text": "understanding everything that the",
"start": 644.0,
"duration": 1.68
},
{
"text": "assistant is doing.",
"start": 646.8,
"duration": 2.8
},
{
"text": "And then you can iterate and request",
"start": 648.72,
"duration": 1.92
},
{
"text": "improvements to the assistant so that",
"start": 653.36,
"duration": 4.64
},
{
"text": "you can together with the support",
"start": 656.56,
"duration": 3.2
},
{
"text": "of an assistant, get to",
"start": 659.76,
"duration": 3.2
},
{
"text": "develop your code incrementally. I",
"start": 663.52,
"duration": 3.76
},
{
"text": "recommend",
"start": 667.36,
"duration": 3.84
},
{
"text": "yes.",
"start": 668.96,
"duration": 1.6
},
{
"text": "I'm sorry to interrupt but there's a",
"start": 669.44,
"duration": 0.48
},
{
"text": "question. Are you running this on your",
"start": 671.28,
"duration": 1.84
},
{
"text": "local computer or are you running this",
"start": 672.96,
"duration": 1.68
},
{
"text": "on Expanse? I'm running it on Expanse.",
"start": 674.88,
"duration": 1.92
},
{
"text": "That's what I thought. I wanted to make",
"start": 678.08,
"duration": 3.2
},
{
"text": "sure I answered it correctly. Thanks.",
"start": 679.28,
"duration": 1.2
},
{
"text": "Yes.",
"start": 681.6,
"duration": 2.32
},
{
"text": "So, this is a JupyterLab environment",
"start": 682.72,
"duration": 1.12
},
{
"text": "which is executing on Expanse thanks to",
"start": 685.76,
"duration": 3.04
},
{
"text": "Galileo which is this package",
"start": 689.52,
"duration": 3.76
},
{
"text": "developed here at SDSC that gives a",
"start": 692.32,
"duration": 2.8
},
{
"text": "JupyterLab environment. Just by executing one",
"start": 697.68,
"duration": 5.36
},
{
"text": "command you're going to get a Jupyter",
"start": 700.16,
"duration": 2.48
},
{
"text": "environment running on a computing",
"start": 702.08,
"duration": 1.92
},
{
"text": "node.",
"start": 704.72,
"duration": 2.64
},
{
"text": "So, I recommend two different",
"start": 706.88,
"duration": 2.16
},
{
"text": "code assistants. So, the first one",
"start": 710.48,
"duration": 3.6
},
{
"text": "is GitHub Copilot and so this runs",
"start": 713.52,
"duration": 3.04
},
{
"text": "inside Visual Studio Code. And",
"start": 717.44,
"duration": 3.92
},
{
"text": "this",
"start": 720.88,
"duration": 3.44
},
{
"text": "So these AI assistants do not",
"start": 723.6,
"duration": 2.72
},
{
"text": "run on the supercomputer itself. So",
"start": 727.36,
"duration": 3.76
},
{
"text": "you should",
"start": 730.4,
"duration": 3.04
},
{
"text": "develop your code locally. Once",
"start": 733.44,
"duration": 3.04
},
{
"text": "you have a",
"start": 737.2,
"duration": 3.76
},
{
"text": "more advanced version of the code then",
"start": 739.68,
"duration": 2.48
},
{
"text": "you can test your runs on the",
"start": 742.24,
"duration": 2.56
},
{
"text": "supercomputer and then you can go back",
"start": 744.4,
"duration": 2.16
},
{
"text": "and forth. Generally, the best",
"start": 746.48,
"duration": 2.08
},
{
"text": "way is if you use GitHub itself. You",
"start": 749.6,
"duration": 3.12
},
{
"text": "have a repository with your software on",
"start": 754.24,
"duration": 4.64
},
{
"text": "GitHub and then you can quickly develop",
"start": 756.0,
"duration": 1.76
},
{
"text": "locally push to GitHub and then get the",
"start": 760.08,
"duration": 4.08
},
{
"text": "latest version of the code on the",
"start": 763.04,
"duration": 2.96
},
{
"text": "supercomputer and run on the",
"start": 764.8,
"duration": 1.76
},
{
"text": "supercomputer, and go back and",
"start": 768.88,
"duration": 4.08
},
{
"text": "forth between the local environment and",
"start": 772.8,
"duration": 3.92
},
{
"text": "the remote environment by using GitHub.",
"start": 774.88,
"duration": 2.08
},
{
"text": "So Visual Studio Code is",
"start": 777.84,
"duration": 2.96
},
{
"text": "an IDE, that is, an",
"start": 782.88,
"duration": 5.04
},
{
"text": "interactive development environment.",
"start": 786.56,
"duration": 3.68
},
{
"text": "So you can execute notebooks, or",
"start": 789.36,
"duration": 2.8
},
{
"text": "can execute Python code. And",
"start": 794.08,
"duration": 4.72
},
{
"text": "what happens when you use the code",
"start": 797.6,
"duration": 3.52
},
{
"text": "assistant is you have a chat basically",
"start": 800.24,
"duration": 2.64
},
{
"text": "like ChatGPT, but embedded in your",
"start": 802.8,
"duration": 2.56
},
{
"text": "software environment. And so here",
"start": 807.36,
"duration": 4.56
},
{
"text": "you can chat with the assistant, and",
"start": 810.16,
"duration": 2.8
},
{
"text": "you can ask the assistant to make",
"start": 813.04,
"duration": 2.88
},
{
"text": "changes to your Jupyter notebooks or to",
"start": 815.12,
"duration": 2.08
},
{
"text": "your Python files. For example, here I",
"start": 818.48,
"duration": 3.36
},
{
"text": "wanted to create this diagram for",
"start": 823.04,
"duration": 4.56
},
{
"text": "this class, for these notebooks. And so I",
"start": 827.28,
"duration": 4.24
},
{
"text": "asked the assistant to add a markdown",
"start": 829.44,
"duration": 2.16
},
{
"text": "cell that shows the architecture of",
"start": 832.56,
"duration": 3.12
},
{
"text": "Dask. And here you see",
"start": 834.56,
"duration": 2.0
},
{
"text": "I have different LLMs that I",
"start": 839.84,
"duration": 5.28
},
{
"text": "can rely on. So generally I use GPT-5",
"start": 844.96,
"duration": 5.12
},
{
"text": "and so I ask GPT-5 to implement",
"start": 850.4,
"duration": 5.44
},
{
"text": "this, and GPT-5 is going to contact OpenAI",
"start": 854.8,
"duration": 4.4
},
{
"text": "and then it's going to",
"start": 859.76,
"duration": 4.96
},
{
"text": "work on this request and after a few",
"start": 864.24,
"duration": 4.48
},
{
"text": "minutes it's going to come",
"start": 866.56,
"duration": 2.32
},
{
"text": "back with a modification to your",
"start": 868.56,
"duration": 2.0
},
{
"text": "Jupyter notebooks and then you can test",
"start": 872.0,
"duration": 3.44
},
{
"text": "it, see that it works as you are",
"start": 874.24,
"duration": 2.24
},
{
"text": "expecting and then you can just push",
"start": 876.08,
"duration": 1.84
},
{
"text": "it inside your repository. And",
"start": 878.8,
"duration": 2.72
},
{
"text": "so so I think this is the best",
"start": 882.56,
"duration": 3.76
},
{
"text": "alternative and it's free for",
"start": 885.36,
"duration": 2.8
},
{
"text": "academic users. So you don't need to pay",
"start": 887.84,
"duration": 2.48
},
{
"text": "anything to have this support from",
"start": 889.84,
"duration": 2.0
},
{
"text": "Copilot. Then the second, so this is",
"start": 891.76,
"duration": 1.92
},
{
"text": "a family of tools: the code assistants",
"start": 895.52,
"duration": 3.76
},
{
"text": "that work in the terminal. So let",
"start": 899.52,
"duration": 4.0
},
{
"text": "me. So these are similar, but instead",
"start": 902.32,
"duration": 2.8
},
{
"text": "of working in your IDE they work on the",
"start": 907.6,
"duration": 5.28
},
{
"text": "terminal. So you see here for examples",
"start": 911.12,
"duration": 3.52
},
{
"text": "again, I'm running the AI system locally",
"start": 913.44,
"duration": 2.32
},
{
"text": "on my laptop. And then I",
"start": 916.8,
"duration": 3.36
},
{
"text": "can push and pull whenever I need to",
"start": 921.2,
"duration": 4.4
},
{
"text": "have this available on the supercomputer.",
"start": 924.4,
"duration": 3.2
},
{
"text": "So here you basically",
"start": 926.8,
"duration": 2.4
},
{
"text": "in your terminal you run gemini",
"start": 929.84,
"duration": 3.04
},
{
"text": "and this is going to spawn this custom",
"start": 932.88,
"duration": 3.04
},
{
"text": "terminal which allows you to chat",
"start": 935.92,
"duration": 3.04
},
{
"text": "with the assistant and have the",
"start": 940.0,
"duration": 4.08
},
{
"text": "assistant make changes for you. For",
"start": 941.68,
"duration": 1.68
},
{
"text": "example, here I simply wanted to add",
"start": 944.0,
"duration": 2.32
},
{
"text": "in the readme at the root of my",
"start": 947.36,
"duration": 3.36
},
{
"text": "repository a list of all the subfolders",
"start": 949.6,
"duration": 2.24
},
{
"text": "and a summary basically of what each",
"start": 953.76,
"duration": 4.16
},
{
"text": "subfolder contains. So you see",
"start": 957.28,
"duration": 3.52
},
{
"text": "I ask the assistant what to do and then",
"start": 961.12,
"duration": 3.84
},
{
"text": "the assistant is going to read to",
"start": 963.44,
"duration": 2.32
},
{
"text": "look into the folders understand what",
"start": 966.72,
"duration": 3.28
},
{
"text": "each folder is doing and then it's going",
"start": 969.2,
"duration": 2.48
},
{
"text": "to say, oh, let me edit the",
"start": 971.52,
"duration": 2.32
},
{
"text": "readme. This is exactly what I asked, and",
"start": 974.72,
"duration": 3.2
},
{
"text": "then it's going to tell me this is",
"start": 977.52,
"duration": 2.8
},
{
"text": "my suggestion so I want to add these",
"start": 979.44,
"duration": 1.92
},
{
"text": "lines that explain what the different",
"start": 982.32,
"duration": 2.88
},
{
"text": "folders are doing so I can review this",
"start": 985.44,
"duration": 3.12
},
{
"text": "modification, and then I can",
"start": 988.72,
"duration": 3.28
},
{
"text": "just tell the assistant just that's",
"start": 990.64,
"duration": 1.92
},
{
"text": "good so commit and push and then the",
"start": 993.68,
"duration": 3.04
},
{
"text": "assistant is going to automatically",
"start": 995.92,
"duration": 2.24
},
{
"text": "add the changes to the repository and",
"start": 999.44,
"duration": 3.52
},
{
"text": "then push to GitHub.",
"start": 1002.72,
"duration": 3.28
},
{
"text": "",
"start": 1006.72,
"duration": 4.0
},
{
"text": "Very good. So now let's start with",
"start": 1008.48,
"duration": 1.76
},
{
"text": "the actual content of the Python for HPC",
"start": 1013.6,
"duration": 5.12
},
{
"text": "series. But",
"start": 1017.12,
"duration": 3.52
},
{
"text": "I wanted to introduce those things",
"start": 1020.16,
"duration": 3.04
},
{
"text": "to you first. So now I am going",
"start": 1022.96,
"duration": 2.8
},
{
"text": "through the notebook which is in the",
"start": 1026.56,
"duration": 3.6
},
{
"text": "folder three, threads vs. processes. So",
"start": 1028.32,
"duration": 1.76
},
{
"text": "you can follow along, opening",
"start": 1032.4,
"duration": 4.08
},
{
"text": "it on GitHub or if you even want to run",
"start": 1035.04,
"duration": 2.64
},
{
"text": "it locally you can fire up your",
"start": 1038.08,
"duration": 3.04
},
{
"text": "Jupyter lab or VS code on your laptop",
"start": 1041.44,
"duration": 3.36
},
{
"text": "and you can execute this so",
"start": 1044.64,
"duration": 3.2
},
{
"text": "we will be getting into the HPC-only",
"start": 1049.44,
"duration": 4.8
},
{
"text": "content later and for that one you need",
"start": 1053.6,
"duration": 4.16
},
{
"text": "a distributed computing environment but",
"start": 1056.0,
"duration": 2.4
},
{
"text": "for now you can just run on your",
"start": 1058.08,
"duration": 2.08
},
{
"text": "laptop. So,",
"start": 1060.08,
"duration": 2.0
},
{
"text": "the single most important aspect",
"start": 1063.2,
"duration": 3.12
},
{
"text": "of understanding how to run",
"start": 1066.64,
"duration": 3.44
},
{
"text": "distributed computing is the difference",
"start": 1070.48,
"duration": 3.84
},
{
"text": "between threads and processes. So",
"start": 1073.04,
"duration": 2.56
},
{
"text": "when you run a Python execution",
"start": 1078.16,
"duration": 5.12
},
{
"text": "by",
"start": 1084.16,
"duration": 6.0
},
{
"text": "default, if you're just running Python,",
"start": 1089.6,
"duration": 5.44
},
{
"text": "you are exactly running like the process",
"start": 1091.68,
"duration": 2.08
},
{
"text": "down here so you have one Python",
"start": 1095.52,
"duration": 3.84
},
{
"text": "process and one thread and whenever",
"start": 1098.72,
"duration": 3.2
},
{
"text": "you have one thread, that means",
"start": 1102.96,
"duration": 4.24
},
{
"text": "that only one core on your machine can",
"start": 1106.96,
"duration": 4.0
},
{
"text": "execute calculations. And this",
"start": 1110.48,
"duration": 3.52
},
{
"text": "is very inefficient. On Expanse, for",
"start": 1113.92,
"duration": 3.44
},
{
"text": "example I have 128 cores. So if I just",
"start": 1117.92,
"duration": 4.0
},
{
"text": "run one process with one thread then",
"start": 1123.12,
"duration": 5.2
},
{
"text": "one core is doing all of the work",
"start": 1127.28,
"duration": 4.16
},
{
"text": "and the other 127 are just idle.",
"start": 1130.8,
"duration": 3.52
},
{
"text": "So the purpose of multi-threading is",
"start": 1135.6,
"duration": 4.8
},
{
"text": "to be able to leverage all of the cores",
"start": 1140.96,
"duration": 5.36
},
{
"text": "that are available on your machine.",
"start": 1144.24,
"duration": 3.28
},
{
"text": "If you have a laptop maybe it's",
"start": 1146.48,
"duration": 2.24
},
{
"text": "just four cores or six cores but still",
"start": 1148.24,
"duration": 1.76
},
{
"text": "it could give a good speed up to your",
"start": 1152.16,
"duration": 3.92
},
{
"text": "computations. So the best way of",
"start": 1155.12,
"duration": 2.96
},
{
"text": "achieving this is using multi-threading.",
"start": 1158.96,
"duration": 3.84
},
{
"text": "So we are in the case over here. So if",
"start": 1162.48,
"duration": 3.52
},
{
"text": "you configure your software properly,",
"start": 1166.24,
"duration": 3.76
},
{
"text": "you can split your computation in",
"start": 1170.56,
"duration": 4.32
},
{
"text": "multiple threads. So here for",
"start": 1174.08,
"duration": 3.52
},
{
"text": "example, we have four threads.",
"start": 1176.64,
"duration": 2.56
},
{
"text": "Now that you have four threads,",
"start": 1179.52,
"duration": 2.88
},
{
"text": "each thread can run in parallel on its",
"start": 1183.84,
"duration": 4.32
},
{
"text": "own core of your CPU. So now",
"start": 1189.2,
"duration": 5.36
},
{
"text": "potentially you could go four times as",
"start": 1195.6,
"duration": 6.4
},
{
"text": "fast. In the best case scenario",
"start": 1198.32,
"duration": 2.72
},
{
"text": "and so if you are on HPC",
"start": 1202.32,
"duration": 4.0
},
{
"text": "you have 128 cores or more and so you",
"start": 1205.6,
"duration": 3.28
},
{
"text": "can run a lot faster.",
"start": 1209.6,
"duration": 4.0
},
{
"text": "But there is a problem with",
"start": 1213.68,
"duration": 4.08
},
{
"text": "this technique with multi-threading",
"start": 1218.08,
"duration": 4.4
},
{
"text": "which is that Python",
"start": 1221.12,
"duration": 3.04
},
{
"text": "in particular has an implementation",
"start": 1225.12,
"duration": 4.0
},
{
"text": "detail in the language itself",
"start": 1229.52,
"duration": 4.4
},
{
"text": "which is called the global interpreter",
"start": 1233.44,
"duration": 3.92
},
{
"text": "lock. The global interpreter lock",
"start": 1236.32,
"duration": 2.88
},
{
"text": "is a",
"start": 1240.0,
"duration": 3.68
},
{
"text": "a feature of Python that",
"start": 1242.24,
"duration": 2.24
},
{
"text": "basically is a unique lock that each",
"start": 1247.44,
"duration": 5.2
},
{
"text": "thread",
"start": 1252.0,
"duration": 4.56
},
{
"text": "has to acquire whenever it wants to make",
"start": 1253.52,
"duration": 1.52
},
{
"text": "any change to Python",
"start": 1259.04,
"duration": 5.52
},
{
"text": "objects. So that means that",
"start": 1263.52,
"duration": 4.48
},
{
"text": "even if you have four threads",
"start": 1268.96,
"duration": 5.44
},
{
"text": "thread one is when so imagine thread",
"start": 1274.16,
"duration": 5.2
},
{
"text": "one is executing a Python script line",
"start": 1278.88,
"duration": 4.72
},
{
"text": "by line. Now it reaches a line where",
"start": 1283.6,
"duration": 4.72
},
{
"text": "it is doing some computation, modifying",
"start": 1287.84,
"duration": 4.24
},
{
"text": "some Python classes or objects",
"start": 1291.04,
"duration": 3.2
},
{
"text": "and at that point the thread acquires",
"start": 1294.72,
"duration": 3.68
},
{
"text": "the lock. So once this thread has",
"start": 1298.88,
"duration": 4.16
},
{
"text": "acquired the lock the other threads they",
"start": 1301.52,
"duration": 2.64
},
{
"text": "also need the lock to keep executing",
"start": 1305.52,
"duration": 4.0
},
{
"text": "their code, but thread",
"start": 1309.28,
"duration": 3.76
},
{
"text": "number one has acquired the lock. So the",
"start": 1312.72,
"duration": 3.44
},
{
"text": "other three cannot",
"start": 1314.96,
"duration": 2.24
},
{
"text": "execute. So they are",
"start": 1318.08,
"duration": 3.12
},
{
"text": "waiting for thread one to release the",
"start": 1320.32,
"duration": 2.24
},
{
"text": "lock. And so you get",
"start": 1322.72,
"duration": 2.4
},
{
"text": "into a race condition when all the",
"start": 1327.44,
"duration": 4.72
},
{
"text": "threads want the lock, and so one of",
"start": 1329.44,
"duration": 2.0
},
{
"text": "them is executing at a time. And",
"start": 1332.24,
"duration": 2.8
},
{
"text": "so you don't get any speed up.",
"start": 1334.96,
"duration": 2.72
},
{
"text": "Even if you have four threads, there is",
"start": 1337.68,
"duration": 2.72
},
{
"text": "this other problem which is the global",
"start": 1339.68,
"duration": 2.0
},
{
"text": "interpreter lock, which allows just one",
"start": 1341.52,
"duration": 1.84
},
{
"text": "of those threads to execute. And",
"start": 1344.88,
"duration": 3.36
},
{
"text": "So how do we get around",
"start": 1350.08,
"duration": 5.2
},
{
"text": "this problem? So this is a big",
"start": 1354.16,
"duration": 4.08
},
{
"text": "problem in High Performance",
"start": 1356.64,
"duration": 2.48
},
{
"text": "Computing with Python. How do we get",
"start": 1360.72,
"duration": 4.08
},
{
"text": "around this? So there is a workaround",
"start": 1362.72,
"duration": 2.0
},
{
"text": "because the",
"start": 1366.32,
"duration": 3.6
},
{
"text": "global interpreter lock is",
"start": 1369.04,
"duration": 2.72
},
{
"text": "unique to each Python process. So",
"start": 1372.4,
"duration": 3.36
},
{
"text": "a workaround is: why don't we just",
"start": 1376.48,
"duration": 4.08
},
{
"text": "spawn four processes.",
"start": 1379.84,
"duration": 3.36
},
{
"text": "And so we have those four Python",
"start": 1382.48,
"duration": 2.64
},
{
"text": "processes and each of them has a",
"start": 1385.04,
"duration": 2.56
},
{
"text": "different lock, so there's no",
"start": 1387.36,
"duration": 2.32
},
{
"text": "race condition in this situation and",
"start": 1389.52,
"duration": 2.16
},
{
"text": "they can process in parallel easily and",
"start": 1391.6,
"duration": 2.08
},
{
"text": "they can execute without any",
"start": 1394.96,
"duration": 3.36
},
{
"text": "problems. But there is a big",
"start": 1400.64,
"duration": 5.68
},
{
"text": "problem there is a big issue with this",
"start": 1405.76,
"duration": 5.12
},
{
"text": "is that the memory is separate for",
"start": 1409.44,
"duration": 3.68
},
{
"text": "each process. So if you can",
"start": 1414.48,
"duration": 5.04
},
{
"text": "use multi-threading properly, then",
"start": 1417.68,
"duration": 3.2
},
{
"text": "the benefit is that the memory space of",
"start": 1422.0,
"duration": 4.32
},
{
"text": "the process is shared. So let's say that",
"start": 1426.88,
"duration": 4.88
},
{
"text": "you have a big array in memory. In case",
"start": 1429.04,
"duration": 2.16
},
{
"text": "number one where you have a single",
"start": 1432.0,
"duration": 2.96
},
{
"text": "process and four threads the threads can",
"start": 1433.6,
"duration": 1.6
},
{
"text": "access that memory. So they can read a",
"start": 1436.88,
"duration": 3.28
},
{
"text": "piece of an array they can write it back",
"start": 1439.2,
"duration": 2.32
},
{
"text": "no problem. While in the second case,",
"start": 1442.0,
"duration": 2.8
},
{
"text": "",
"start": 1446.48,
"duration": 4.48
},
{
"text": "now you have to copy that array four",
"start": 1448.32,
"duration": 1.84
},
{
"text": "times in each of the process and so you",
"start": 1452.88,
"duration": 4.56
},
{
"text": "are wasting some resources because",
"start": 1456.24,
"duration": 3.36
},
{
"text": "there's duplication. So you're using",
"start": 1458.16,
"duration": 1.92
},
{
"text": "more memory, you are taking more time.",
"start": 1460.4,
"duration": 2.24
},
{
"text": "So every time you use Python",
"start": 1464.32,
"duration": 3.92
},
{
"text": "and you want to use multiple threads,",
"start": 1468.32,
"duration": 4.0
},
{
"text": "you have to keep these options in",
"start": 1469.92,
"duration": 1.6
},
{
"text": "mind. So: am I able",
"start": 1472.56,
"duration": 2.64
},
{
"text": "to use threads or am I forced to use",
"start": 1476.08,
"duration": 3.52
},
{
"text": "processes? So now let's see",
"start": 1478.72,
"duration": 2.64
},
{
"text": "an actual use case of this.",
"start": 1483.76,
"duration": 5.04
},
{
"text": "So as I told you, I'm running",
"start": 1488.0,
"duration": 4.24
},
{
"text": "on Expanse with 128 cores.",
"start": 1491.36,
"duration": 3.36
},
{
"text": "now we are going to go into details",
"start": 1496.4,
"duration": 5.04
},
{
"text": "about Dask later on but for now just",
"start": 1499.44,
"duration": 3.04
},
{
"text": "trust me what we are doing here is we",
"start": 1503.44,
"duration": 4.0
},
{
"text": "have a computationally heavy function",
"start": 1507.52,
"duration": 4.08
},
{
"text": "which is computing the Fibonacci",
"start": 1511.04,
"duration": 3.52
},
{
"text": "sequence for some number n and",
"start": 1513.92,
"duration": 2.88
},
{
"text": "we want to execute this many many",
"start": 1520.96,
"duration": 7.04
},
{
"text": "times. So we want to execute this 128",
"start": 1525.84,
"duration": 4.88
},
{
"text": "times just to show that we're",
"start": 1529.84,
"duration": 4.0
},
{
"text": "doing some parallel computation.",
"start": 1532.8,
"duration": 2.96
},
{
"text": "So it's",
"start": 1536.64,
"duration": 3.84
},
{
"text": "going to take a long time to execute",
"start": 1542.72,
"duration": 6.08
},
{
"text": "them one after the other. So what we",
"start": 1544.64,
"duration": 1.92
},
{
"text": "want to do is to execute them in",
"start": 1546.72,
"duration": 2.08
},
{
"text": "parallel. So this is the first way of",
"start": 1548.8,
"duration": 2.08
},
{
"text": "executing this in parallel on multiple",
"start": 1552.32,
"duration": 3.52
},
{
"text": "threads. So the idea here is we",
"start": 1555.2,
"duration": 2.88
},
{
"text": "now have 128 threads, and each of",
"start": 1560.08,
"duration": 4.88
},
{
"text": "them is executing this in parallel.",
"start": 1564.32,
"duration": 4.24
},
{
"text": "",
"start": 1566.56,
"duration": 2.24
},
{
"text": "the problem is that we have some",
"start": 1568.32,
"duration": 1.76
},
{
"text": "issues with the global interpreter",
"start": 1572.4,
"duration": 4.08
},
{
"text": "lock. And in fact you see that",
"start": 1574.16,
"duration": 1.76
},
{
"text": "as I told you we can run either",
"start": 1578.56,
"duration": 4.4
},
{
"text": "this way with multi-threading or",
"start": 1582.8,
"duration": 4.24
},
{
"text": "with multiprocessing.",
"start": 1586.32,
"duration": 3.52
},
{
"text": "So in this case multiprocessing",
"start": 1588.96,
"duration": 2.64
},
{
"text": "is beneficial. So you see we are",
"start": 1592.96,
"duration": 4.0
},
{
"text": "actually paying the price of the",
"start": 1596.24,
"duration": 3.28
},
{
"text": "global interpreter lock. So",
"start": 1599.12,
"duration": 2.88
},
{
"text": "executing the threaded version is a lot",
"start": 1603.04,
"duration": 3.92
},
{
"text": "slower than the",
"start": 1606.0,
"duration": 2.96
},
{
"text": "multiprocessing version. But you",
"start": 1607.92,
"duration": 1.92
},
{
"text": "see that with Dask is very easy to test",
"start": 1610.48,
"duration": 2.56
},
{
"text": "both. You just tell Dask: I want to",
"start": 1614.72,
"duration": 4.24
},
{
"text": "execute this with a threaded",
"start": 1619.12,
"duration": 4.4
},
{
"text": "scheduler or I want to execute this",
"start": 1620.8,
"duration": 1.68
},
{
"text": "with a process scheduler and Dask is",
"start": 1623.12,
"duration": 2.32
},
{
"text": "taking care of spawning multiple threads",
"start": 1626.16,
"duration": 3.04
},
{
"text": "or spawning multiple processes and do",
"start": 1629.36,
"duration": 3.2
},
{
"text": "the work for you",
"start": 1632.08,
"duration": 2.72
},
{
"text": "and return your",
"start": 1636.0,
"duration": 3.92
},
{
"text": "result.",
"start": 1639.04,
"duration": 3.04
},
{
"text": "Now, there are other cases",
"start": 1644.8,
"duration": 5.76
},
{
"text": "Andrea, sorry to interrupt you, but",
"start": 1648.72,
"duration": 3.92
},
{
"text": "there's an interesting comment from",
"start": 1650.4,
"duration": 1.68
},
{
"text": "one of the participants about a new",
"start": 1652.88,
"duration": 2.48
},
{
"text": "Python release and the global",
"start": 1655.44,
"duration": 2.56
},
{
"text": "interpreter lock if you want to take a",
"start": 1657.44,
"duration": 2.0
},
{
"text": "look at that. It's a good question to",
"start": 1659.92,
"duration": 2.48
},
{
"text": "answer right now.",
"start": 1661.84,
"duration": 1.92
},
{
"text": "Next month with the release of 3.14",
"start": 1667.92,
"duration": 6.08
},
{
"text": "right below",
"start": 1670.96,
"duration": 3.04
},
{
"text": "see it. Yeah.",
"start": 1672.96,
"duration": 2.0
},
{
"text": "Yeah. So fortunately with scientific",
"start": 1674.48,
"duration": 1.52
},
{
"text": "computing a lot of the libraries are",
"start": 1676.8,
"duration": 2.32
},
{
"text": "already releasing the global interpreter lock.",
"start": 1680.8,
"duration": 4.0
},
{
"text": "Already now, whenever",
"start": 1683.44,
"duration": 2.64
},
{
"text": "you're executing NumPy code or pandas code",
"start": 1689.28,
"duration": 5.84
},
{
"text": "or Numba as we will see later that's",
"start": 1693.76,
"duration": 4.48
},
{
"text": "already releasing the global",
"start": 1696.24,
"duration": 2.48
},
{
"text": "interpreter lock. So a Python release",
"start": 1699.6,
"duration": 3.36
},
{
"text": "without global interpreter lock is going",
"start": 1703.28,
"duration": 3.68
},
{
"text": "to be beneficial, very beneficial, but",
"start": 1706.0,
"duration": 2.72
},
{
"text": "mostly for other applications like",
"start": 1708.8,
"duration": 2.8
},
{
"text": "for example a Django application, or",
"start": 1713.44,
"duration": 4.64
},
{
"text": "applications that rely a lot on pure",
"start": 1718.16,
"duration": 4.72
},
{
"text": "Python code. With scientific",
"start": 1721.52,
"duration": 3.36
},
{
"text": "applications we are running so often",
"start": 1724.0,
"duration": 2.48
},
{
"text": "with scientific libraries which already",
"start": 1726.72,
"duration": 2.72
},
{
"text": "release the global interpreter lock",
"start": 1729.04,
"duration": 2.32
},
{
"text": "that's probably not going to be a big",
"start": 1730.88,
"duration": 1.84
},
{
"text": "improvement for scientific computing",
"start": 1734.4,
"duration": 3.52
},
{
"text": "but in general for the global Python",
"start": 1737.6,
"duration": 3.2
},
{
"text": "ecosystem it's really great",
"start": 1740.0,
"duration": 2.4
},
{
"text": "so let me show you now another",
"start": 1743.6,
"duration": 3.6
},
{
"text": "example. While the other function",
"start": 1747.52,
"duration": 3.92
},
{
"text": "was very computationally heavy, this",
"start": 1751.28,
"duration": 3.76
},
{
"text": "one instead is waiting on something. So",
"start": 1754.4,
"duration": 3.12
},
{
"text": "in this case it's just sleeping but this",
"start": 1758.16,
"duration": 3.76
},
{
"text": "could wait on a network connection or",
"start": 1760.56,
"duration": 2.4
},
{
"text": "wait for disk access.",
"start": 1763.6,
"duration": 3.04
},
{
"text": "So whenever",
"start": 1766.24,
"duration": 2.64
},
{
"text": "it's not about the raw CPU power;",
"start": 1772.4,
"duration": 6.16
},
{
"text": "",
"start": 1774.56,
"duration": 2.16
},
{
"text": "whenever you don't have a lot of",
"start": 1777.84,
"duration": 3.28
},
{
"text": "need for accessing the global",
"start": 1784.4,
"duration": 6.56
},
{
"text": "interpreter lock because you're just",
"start": 1786.08,
"duration": 1.68
},
{
"text": "waiting on network or waiting on disk",
"start": 1788.24,
"duration": 2.16
},
{
"text": "then threading is always beneficial.",
"start": 1790.8,
"duration": 2.56
},
{
"text": "So in general you always want to",
"start": 1793.36,
"duration": 2.56
},
{
"text": "try to use the threading",
"start": 1797.04,
"duration": 3.68
},
{
"text": "scheduler. Unless",
"start": 1799.52,
"duration": 2.48
},
{
"text": "you find that there are some",
"start": 1803.04,
"duration": 3.52
},
{
"text": "performance issues, and",
"start": 1804.32,
"duration": 1.28
},
{
"text": "if you cannot solve them, then you",
"start": 1809.36,
"duration": 5.04
},
{
"text": "might try with the processes. But",
"start": 1811.6,
"duration": 2.24
},
{
"text": "always try first with the threading.",
"start": 1814.8,
"duration": 3.2
},
{
"text": "And one trick is to always",
"start": 1816.64,
"duration": 1.84
},
{
"text": "try to use libraries that release the",
"start": 1819.36,
"duration": 2.72
},
{
"text": "global interpreter lock, like NumPy and",
"start": 1822.08,
"duration": 2.72
},
{
"text": "pandas.",
"start": 1825.2,
"duration": 3.12
},
{
"text": "",
"start": 1828.88,
"duration": 3.68
},
{
"text": "now let me stop after this and",
"start": 1830.96,
"duration": 2.08
},
{
"text": "ask if there are any questions. I'm",
"start": 1836.32,
"duration": 5.36
},
{
"text": "going to",
"start": 1838.32,
"duration": 2.0
},
{
"text": "so you can unmute yourself and ask",
"start": 1841.04,
"duration": 2.72
},
{
"text": "a question or write to the chat. So we",
"start": 1844.88,
"duration": 3.84
},
{
"text": "want to focus on questions about",
"start": 1848.8,
"duration": 3.92
},
{
"text": "tasks and threads.",
"start": 1853.2,
"duration": 4.4
},
{
"text": "There was a question a while ago",
"start": 1857.76,
"duration": 4.56
},
{
"text": "about a development environment and",
"start": 1860.32,
"duration": 2.56
},
{
"text": "can we install it on Expanse",
"start": 1864.16,
"duration": 3.84
},
{
"text": "and I'm trying to find it: can a user",
"start": 1867.92,
"duration": 3.76
},
{
"text": "install the GitHub copilot extension to",
"start": 1870.64,
"duration": 2.72
},
{
"text": "the VS Code that is installed on the",
"start": 1873.52,
"duration": 2.88
},
{
"text": "cluster. No,",
"start": 1875.68,
"duration": 2.16
},
{
"text": "no,",
"start": 1877.92,
"duration": 2.24
},
{
"text": "no. Unfortunately, VS Code uses too",
"start": 1878.32,
"duration": 0.4
},
{
"text": "much memory. So",
"start": 1881.92,
"duration": 3.6
},
{
"text": "even if you use the SSH extension",
"start": 1886.32,
"duration": 4.4
},
{
"text": "and you run the interface of VS Code",
"start": 1890.48,
"duration": 4.16
},
{
"text": "locally, and on Expanse or another",
"start": 1893.12,
"duration": 2.64
},
{
"text": "supercomputer you are running just the VS",
"start": 1897.6,
"duration": 4.48
},
{
"text": "Code server. This is not allowed on",
"start": 1900.48,
"duration": 2.88
},
{
"text": "a computing node on Expanse. However,",
"start": 1902.88,
"duration": 2.4
},
{
"text": "other supercomputers have different",
"start": 1908.72,
"duration": 5.84
},
{
"text": "policies. So there are some that",
"start": 1910.4,
"duration": 1.68
},
{
"text": "allow you to run the VS Code server on a",
"start": 1914.24,
"duration": 3.84
},
{
"text": "compute node. So that's another option",
"start": 1917.2,
"duration": 2.96
},
{
"text": "you can look at, and that way",
"start": 1919.84,
"duration": 2.64
},
{
"text": "you have the convenience of a local",
"start": 1922.8,
"duration": 2.96
},
{
"text": "VS Code instance",
"start": 1925.52,
"duration": 2.72
},
{
"text": "which is actually executing code on",
"start": 1930.48,
"duration": 4.96
},
{
"text": "the supercomputer computing node itself.",
"start": 1933.68,
"duration": 3.2
},
{
"text": "Yeah, I think you would want to look on",
"start": 1936.96,
"duration": 3.28
},
{
"text": "the ACCESS website,",
"start": 1938.64,
"duration": 1.68
},
{
"text": "which is a system integrator. It has",
"start": 1942.24,
"duration": 3.6
},
{
"text": "information about all the HPC resources",
"start": 1944.96,
"duration": 2.72
},
{
"text": "and you could find perhaps a machine",
"start": 1948.24,
"duration": 3.28
},
{
"text": "that hosts that software environment.",
"start": 1950.24,
"duration": 2.0
},
{
"text": "But that's all the questions in the",
"start": 1953.12,
"duration": 2.88
},
{
"text": "chat. Does anybody have anything they",
"start": 1954.64,
"duration": 1.52
},
{
"text": "so let's switch to folder",
"start": 1964.64,
"duration": 10.0
},
{
"text": "number four.",
"start": 1968.16,
"duration": 3.52
},
{
"text": "So, in folder number four,",
"start": 1970.88,
"duration": 2.72
},
{
"text": "let's start with the basics.",
"start": 1975.76,
"duration": 4.88
},
{
"text": "So today I'm going to introduce",
"start": 1978.16,
"duration": 2.4
},
{
"text": "you to two Python packages that can help",
"start": 1981.04,
"duration": 2.88
},
{
"text": "with scientific computing, high",
"start": 1983.28,
"duration": 2.24
},
{
"text": "performance computing. And",
"start": 1984.8,
"duration": 1.52
},
{
"text": "the first one is Numba. So Numba is a",
"start": 1988.16,
"duration": 3.36
},
{
"text": "just-in-time compiler. So you know that",
"start": 1992.24,
"duration": 4.08
},
{
"text": "Python is extremely convenient and easy",
"start": 1994.96,
"duration": 2.72
},
{
"text": "to use. But you pay for that with",
"start": 1998.8,
"duration": 3.84
},
{
"text": "performance. So compared to compiled",
"start": 2003.76,
"duration": 4.96
},
{
"text": "languages like C, C++ and Fortran, Python is",
"start": 2006.56,
"duration": 2.8
},
{
"text": "very slow. So",
"start": 2010.0,
"duration": 3.44
},
{
"text": "what Numba is doing is actually",
"start": 2016.24,
"duration": 6.24
},
{
"text": "compiling",
"start": 2020.32,
"duration": 4.08
},
{
"text": "your Python functions. So",
"start": 2024.08,
"duration": 3.76
},
{
"text": "you write a Python function and then",
"start": 2028.96,
"duration": 4.88
},
{
"text": "basically you tell Numba I want this",
"start": 2030.96,
"duration": 2.0
},
{
"text": "function to be compiled. And so Numba",
"start": 2033.68,
"duration": 2.72
},
{
"text": "under the hood uses",
"start": 2038.24,
"duration": 4.56
},
{
"text": "some tooling which is the same",
"start": 2042.96,
"duration": 4.72
},
{
"text": "tooling used by C++ compilers. It",
"start": 2046.32,
"duration": 3.36
},
{
"text": "takes your Python, transforms it into an",
"start": 2051.36,
"duration": 5.04
},
{
"text": "intermediate representation and then",
"start": 2054.48,
"duration": 3.12
},
{
"text": "that becomes machine code. And so",
"start": 2056.4,
"duration": 1.92
},
{
"text": "basically you have a Python function",
"start": 2059.2,
"duration": 2.8
},
{
"text": "that runs at a speed comparable to C",
"start": 2061.84,
"duration": 2.64
},
{
"text": "and Fortran, without the pain. So",
"start": 2066.96,
"duration": 5.12
},
{
"text": "now let's see how you do that.",
"start": 2071.28,
"duration": 4.32
},
{
"text": "So",
"start": 2077.92,
"duration": 6.64
},
{
"text": "if you have never used a decorator",
"start": 2079.76,
"duration": 1.84
},
{
"text": "a decorator is a way of implicitly",
"start": 2082.72,
"duration": 2.96
},
{
"text": "calling a function",
"start": 2087.6,
"duration": 4.88
},
{
"text": "on a function that you define. So",
"start": 2090.24,
"duration": 2.64
},
{
"text": "when we do this, go fast is a standard",
"start": 2092.64,
"duration": 2.4
},
{
"text": "Python function. So nothing",
"start": 2096.8,
"duration": 4.16
},
{
"text": "special about that. But then if",
"start": 2099.92,
"duration": 3.12
},
{
"text": "you write, just above the def, an @",
"start": 2104.64,
"duration": 4.72
},
{
"text": "and then this decorator, what this is",
"start": 2108.56,
"duration": 3.92
},
{
"text": "going to do is take the go fast",
"start": 2112.16,
"duration": 3.6
},
{
"text": "function and pass it to the jit function.",
"start": 2115.28,
"duration": 3.12
},
{
"text": "So jit itself is also a function, your",
"start": 2121.04,
"duration": 5.76
},
{
"text": "decorator and then this is going to",
"start": 2124.08,
"duration": 3.04
},
{
"text": "return",
"start": 2127.52,
"duration": 3.44
},
{
"text": "another function. So we have",
"start": 2129.2,
"duration": 1.68
},
{
"text": "three functions. The original function",
"start": 2131.76,
"duration": 2.56
},
{
"text": "go fast which is a pure Python function.",
"start": 2133.68,
"duration": 1.92
},
{
"text": "Then jit is a function itself which takes",
"start": 2136.64,
"duration": 2.96
},
{
"text": "go fast and produces a special object",
"start": 2140.8,
"duration": 4.16
},
{
"text": "that is going to provide a very fast",
"start": 2145.84,
"duration": 5.04
},
{
"text": "version of your function the first time",
"start": 2149.92,
"duration": 4.08
},
{
"text": "you call it. And so in practice",
"start": 2153.2,
"duration": 3.28
},
{
"text": "you have",
"start": 2156.8,
"duration": 3.6
},
{
"text": "your NumPy array and then",
"start": 2159.92,
"duration": 3.12
},
{
"text": "you call",
"start": 2163.52,
"duration": 3.6
},
{
"text": "go fast, and this is going to",
"start": 2167.52,
"duration": 4.0
},
{
"text": "compile it and you know that whenever",
"start": 2172.16,
"duration": 4.64
},
{
"text": "you're compiling, an issue with",
"start": 2174.72,
"duration": 2.56
},
{
"text": "compiling is that you need to know the",
"start": 2177.76,
"duration": 3.04
},
{
"text": "types of everything. Compiling",
"start": 2179.84,
"duration": 2.08
},
{
"text": "makes a specialized function which",
"start": 2183.6,
"duration": 3.76
},
{
"text": "specifically works with that kind of",
"start": 2186.96,
"duration": 3.36
},
{
"text": "input. So a float 32 or an integer or a",
"start": 2189.36,
"duration": 2.4
},
{
"text": "boolean something like that.",
"start": 2193.28,
"duration": 3.92
},
{
"text": "So that's why Numba has to wait",
"start": 2195.84,
"duration": 2.56
},
{
"text": "until you call the function with a",
"start": 2198.96,
"duration": 3.12
},
{
"text": "specific input to know how to compile",
"start": 2200.88,
"duration": 1.92
},
{
"text": "it. So it compiles it on the fly, which is",
"start": 2203.04,
"duration": 2.16
},
{
"text": "really fast because compiling takes a",
"start": 2206.48,
"duration": 3.44
},
{
"text": "few microseconds. So you don't",
"start": 2209.04,
"duration": 2.56
},
{
"text": "really notice and then",
"start": 2211.36,
"duration": 2.32
},
{
"text": "and then you use it like",
"start": 2214.64,
"duration": 3.28
},
{
"text": "a standard Python function. And",
"start": 2219.04,
"duration": 4.4
},
{
"text": "so the technique",
"start": 2222.88,
"duration": 3.84
},
{
"text": "that you use with Numba is you want to",
"start": 2225.2,
"duration": 2.32
},
{
"text": "optimize",
"start": 2229.12,
"duration": 3.92
},
{
"text": "just the most computationally heavy",
"start": 2230.8,
"duration": 1.68
},
{
"text": "parts of your code. So you're not",
"start": 2233.6,
"duration": 2.8
},
{
"text": "taking your code and rewriting it",
"start": 2236.48,
"duration": 2.88
},
{
"text": "completely and starting using Numba from",
"start": 2238.64,
"duration": 2.16
},
{
"text": "the interface or from the high",
"start": 2242.88,
"duration": 4.24
},
{
"text": "level functionality of your code. No.",
"start": 2247.2,
"duration": 4.32
},
{
"text": "The strategy of using Numba is",
"start": 2250.48,
"duration": 3.28
},
{
"text": "keep everything as it is in your",
"start": 2254.96,
"duration": 4.48
},
{
"text": "code. Access your files, your",
"start": 2258.24,
"duration": 3.28
},
{
"text": "networking and everything. But then once",
"start": 2261.12,
"duration": 2.88
},
{
"text": "you are in your computationally heavy",
"start": 2263.92,
"duration": 2.8
},
{
"text": "loop then you plug this in. So",
"start": 2267.44,
"duration": 3.52
},
{
"text": "it's very nice because you can",
"start": 2270.64,
"duration": 3.2
},
{
"text": "transition from a normal Python code to",
"start": 2273.36,
"duration": 2.72
},
{
"text": "a Numba power code incrementally. So",
"start": 2277.04,
"duration": 3.68
},
{
"text": "you can say oh let me just try to",
"start": 2279.92,
"duration": 2.88
},
{
"text": "optimize this piece of code which is the",
"start": 2282.16,
"duration": 2.24
},
{
"text": "most important.",
"start": 2284.08,
"duration": 1.92
},
{
"text": "And so the benefit of Numba is",
"start": 2286.56,
"duration": 2.48
},
{
"text": "not only that the",
"start": 2290.56,
"duration": 4.0
},
{
"text": "computation runs at C speed instead of",
"start": 2294.0,
"duration": 3.44
},
{
"text": "Python speed, but it also uses a lot less",
"start": 2296.48,
"duration": 2.48
},
{
"text": "memory. So if you're familiar",
"start": 2302.24,
"duration": 5.76
},
{
"text": "with NumPy you know that NumPy is",
"start": 2305.84,
"duration": 3.6
},
{
"text": "creating a lot of temporary arrays while",
"start": 2309.52,
"duration": 3.68
},
{
"text": "you're going through your computation.",
"start": 2312.4,
"duration": 2.88
},
{
"text": "I'm sure that",
"start": 2313.68,
"duration": 1.28
},
{
"text": "you have the experience of your",
"start": 2315.92,
"duration": 2.24
},
{
"text": "Python kernel going out of memory and",
"start": 2318.32,
"duration": 2.4
},
{
"text": "you don't understand",
"start": 2321.92,
"duration": 3.6
},
{
"text": "where that memory is coming from. That's",
"start": 2324.8,
"duration": 2.88
},
{
"text": "probably NumPy that is creating",
"start": 2326.72,
"duration": 1.92
},
{
"text": "temporary arrays that are then lingering",
"start": 2330.72,
"duration": 4.0
},
{
"text": "in memory and are driving up your memory",
"start": 2334.0,
"duration": 3.28
},
{
"text": "consumption. So, Numba also takes",
"start": 2337.84,
"duration": 3.84
},
{
"text": "care of that for you and creates a very",
"start": 2340.64,
"duration": 2.8
},
{
"text": "efficient machine code loop that",
"start": 2343.68,
"duration": 3.04
},
{
"text": "does your data processing and gives you",
"start": 2349.2,
"duration": 5.52
},
{
"text": "the output without storing without",
"start": 2351.68,
"duration": 2.48
},
{
"text": "creating unnecessary intermediate",
"start": 2353.92,
"duration": 2.24
},
{
"text": "arrays.",
"start": 2355.84,
"duration": 1.92
},
{
"text": "So, let's take a look.",
"start": 2358.24,
"duration": 2.4
},
{
"text": "So you see here",
"start": 2363.04,
"duration": 4.8
},
{
"text": "this is testing the Numba-optimized",
"start": 2366.0,
"duration": 2.96
},
{
"text": "version",
"start": 2370.96,
"duration": 4.96
},
{
"text": "and the Python version.",
"start": 2373.44,
"duration": 2.48
},
{
"text": "So you see",
"start": 2382.88,
"duration": 9.44
},
{
"text": "yeah so you see",
"start": 2388.0,
"duration": 5.12
},
{
"text": "Python is still pretty",
"start": 2390.64,
"duration": 2.64
},
{
"text": "fast. So you see still a factor of 20",
"start": 2394.96,
"duration": 4.32
},
{
"text": "but they are both quite fast.",
"start": 2397.84,
"duration": 2.88
},
{
"text": "So you see here with Python you",
"start": 2401.76,
"duration": 3.92
},
{
"text": "can access the underlying pure Python",
"start": 2404.96,
"duration": 3.2
},
{
"text": "function.",
"start": 2408.08,
"duration": 3.12
},
{
"text": "So you",
"start": 2410.24,
"duration": 2.16
},
{
"text": "Did you see that, Andrea? There's a",
"start": 2411.92,
"duration": 1.68
},
{
"text": "question about how Numba compares with",
"start": 2414.48,
"duration": 2.56
},
{
"text": "Cython or Weave for efficiency.",
"start": 2417.52,
"duration": 3.04
},
{
"text": "Yes. So",
"start": 2421.12,
"duration": 3.6
},
{
"text": "So I haven't used Weave in",
"start": 2425.36,
"duration": 4.24
},
{
"text": "a long time so I don't",
"start": 2429.28,
"duration": 3.92
},
{
"text": "I don't have a clear picture in",
"start": 2433.12,
"duration": 3.84
},
{
"text": "mind, but as for Cython, the speed is",
"start": 2435.76,
"duration": 2.64
},
{
"text": "comparable so they both basically",
"start": 2440.56,
"duration": 4.8
},
{
"text": "give you back machine code and",
"start": 2443.6,
"duration": 3.04
},
{
"text": "usually I recommend always using Numba",
"start": 2447.52,
"duration": 3.92
},
{
"text": "because it is easier to use",
"start": 2450.88,
"duration": 3.36
},
{
"text": "so you gain in development speed",
"start": 2454.8,
"duration": 3.92
},
{
"text": "",
"start": 2459.04,
"duration": 4.24
},
{
"text": "and I recommend Cython only if you are",
"start": 2460.96,
"duration": 1.92
},
{
"text": "interfacing with an existing C library,",
"start": 2464.96,
"duration": 4.0
},
{
"text": "because unfortunately you don't have that",
"start": 2467.84,
"duration": 2.88
},
{
"text": "capability in Numba so Numba is good",
"start": 2469.84,
"duration": 2.0
},
{
"text": "when you are in pure Python/NumPy",
"start": 2472.64,
"duration": 2.8
},
{
"text": "computations. If inside your",
"start": 2477.68,
"duration": 5.04
},
{
"text": "computations you are calling an",
"start": 2481.36,
"duration": 3.68
},
{
"text": "underlying C library then Numba cannot",
"start": 2484.96,
"duration": 3.6
},
{
"text": "help you so you need to use cy",
"start": 2489.12,
"duration": 4.16
},
{
"text": "so yeah so s yeah and another great",
"start": 2493.68,
"duration": 4.56
},
{
"text": "feature of Numba is automatic paralism",
"start": 2499.2,
"duration": 5.52
},
{
"text": "that we'll see later and that's also an",
"start": 2502.24,
"duration": 3.04
},
{
"text": "advantage over syon so I think if you",
"start": 2505.28,
"duration": 3.04
},
{
"text": "have the possibility of of so if you do",
"start": 2508.4,
"duration": 3.12
},
{
"text": "not need custom C code if you don't",
"start": 2511.2,
"duration": 2.8
},
{
"text": "need calling underlying C existing C",
"start": 2515.12,
"duration": 3.92
},
{
"text": "libraries then I recommend you use",
"start": 2518.24,
"duration": 3.12
},
{
"text": "number",
"start": 2520.16,
"duration": 1.92
},
{
"text": "instead of cy but yeah performance",
"start": 2521.68,
"duration": 1.52
},
{
"text": "wise they they are very very similar",
"start": 2524.48,
"duration": 2.8
},
{
"text": "and you can also call a JIT",
"start": 2528.08,
"duration": 3.6
},
{
"text": "optimized function from another JIT-optimized function",
"start": 2535.84,
"duration": 0.64
},
{
"text": "So you see here I",
"start": 2540.72,
"duration": 4.88
},
{
"text": "have two functions and I add the",
"start": 2542.64,
"duration": 1.92
},
{
"text": "decorator to both of them and now they",
"start": 2545.76,
"duration": 3.12
},
{
"text": "are both compiled to machine code and",
"start": 2548.8,
"duration": 3.04
},
{
"text": "you can call one from the other.",
"start": 2550.8,
"duration": 2.0
},
{
"text": "So this just helps you organize your",
"start": 2552.96,
"duration": 2.16
},
{
"text": "Code better. But you see once the",
"start": 2555.76,
"duration": 2.8
},
{
"text": "the function is nonoptimized I mean from",
"start": 2561.28,
"duration": 5.52
},
{
"text": "the usage perspective it's really",
"start": 2565.04,
"duration": 3.76
},
{
"text": "doesn't look different from a normal",
"start": 2567.92,
"duration": 2.88
},
{
"text": "function. So it's a it's a very",
"start": 2569.76,
"duration": 1.84
},
{
"text": "convenient way of squeezing the mass the",
"start": 2573.04,
"duration": 3.28
},
{
"text": "most amount of performance from a single",
"start": 2578.16,
"duration": 5.12
},
{
"text": "machine. Then we'll see how to how",
"start": 2580.56,
"duration": 2.4
},
{
"text": "you want to parallelize on multiple",
"start": 2583.84,
"duration": 3.28
},
{
"text": "machine. But the first step is you don't",
"start": 2585.36,
"duration": 1.52
},
{
"text": "want to parallelize on multiple machines",
"start": 2587.04,
"duration": 1.68
},
{
"text": "a code which is very slow on a single",
"start": 2590.08,
"duration": 3.04
},
{
"text": "machine. So first step is optimize",
"start": 2592.0,
"duration": 1.92
},
{
"text": "locally. And you can even do this on",
"start": 2594.56,
"duration": 2.56
},
{
"text": "your laptop to benchmark different",
"start": 2596.24,
"duration": 1.68
},
{
"text": "parts of your code and try to improve",
"start": 2599.12,
"duration": 2.88
},
{
"text": "performance. And then once you know",
"start": 2602.24,
"duration": 3.12
},
{
"text": "that your code is already running as",
"start": 2604.72,
"duration": 2.48
},
{
"text": "fast as you can then you can work on",
"start": 2608.0,
"duration": 3.28
},
{
"text": "parallelism.",
"start": 2611.36,
"duration": 3.36
},
{
"text": "Is there any questions on",
"start": 2613.84,
"duration": 2.48
},
{
"text": "the basics of number?",
"start": 2617.68,
"duration": 3.84
},
{
"text": "Yeah, there is one question Andrea.",
"start": 2619.12,
"duration": 1.44
},
{
"text": "Can you disable can you see that?",
"start": 2623.52,
"duration": 4.4
},
{
"text": "Yes. So nowadays",
"start": 2625.36,
"duration": 1.84
},
{
"text": "yeah you can disable no Python. I",
"start": 2627.84,
"duration": 2.48
},
{
"text": "mean you lose most of the performance of",
"start": 2631.36,
"duration": 3.52
},
{
"text": "Numba if you use no Python. So so",
"start": 2634.8,
"duration": 3.44
},
{
"text": "yeah so you can check if you gain",
"start": 2640.64,
"duration": 5.84
},
{
"text": "anything by doing no Python equals false",
"start": 2643.28,
"duration": 2.64
},
{
"text": "but generally you don't gain much. So",
"start": 2646.08,
"duration": 2.8
},
{
"text": "you have to",
"start": 2650.64,
"duration": 4.56
},
{
"text": "yeah maybe",
"start": 2653.92,
"duration": 3.28
},
{
"text": "you can see if you can",
"start": 2656.08,
"duration": 2.16
},
{
"text": "",
"start": 2660.08,
"duration": 4.0
},
{
"text": "maybe if you can reimplement",
"start": 2661.84,
"duration": 1.76
},
{
"text": "the sci function in in pure NumPy for",
"start": 2664.88,
"duration": 3.04
},
{
"text": "example or if not yeah there's not",
"start": 2668.56,
"duration": 3.68
},
{
"text": "much you can do unfortunately if",
"start": 2673.76,
"duration": 5.2
},
{
"text": "you're relying on a big sci",
"start": 2675.68,
"duration": 1.92
},
{
"text": "library then a number is not going to",
"start": 2678.88,
"duration": 3.2
},
{
"text": "give you much speed up. So my",
"start": 2680.88,
"duration": 2.0
},
{
"text": "suggestion is try to so make sure",
"start": 2683.76,
"duration": 2.88
},
{
"text": "that you are passing to the sci library",
"start": 2688.08,
"duration": 4.32
},
{
"text": "a a large amount of data all the same",
"start": 2691.84,
"duration": 3.76
},
{
"text": "time so that you can really leverage the",
"start": 2694.72,
"duration": 2.88
},
{
"text": "performance improvements inside the",
"start": 2697.92,
"duration": 3.2
},
{
"text": "library not looping too much on Python",
"start": 2699.84,
"duration": 1.92
},
{
"text": "before calling sci-fi so you want to get",
"start": 2703.76,
"duration": 3.92
},
{
"text": "your data to sci-fi as quickly as",
"start": 2706.32,
"duration": 2.56
},
{
"text": "possible",
"start": 2708.88,
"duration": 2.56
},
{
"text": "is there Any other question?",
"start": 2711.76,
"duration": 2.88
},
{
"text": "No, I don't see any new questions. Thank",
"start": 2715.84,
"duration": 4.08
},
{
"text": "you.",
"start": 2718.32,
"duration": 2.48
},
{
"text": "So, let's go. Yes. So, let's just",
"start": 2718.72,
"duration": 0.4
},
{
"text": "go a little bit deeper in using. So,",
"start": 2721.68,
"duration": 2.96
},
{
"text": "so the Python functions that you develop",
"start": 2726.64,
"duration": 4.96
},
{
"text": "and then you want to give to Numba, they",
"start": 2728.96,
"duration": 2.32
},
{
"text": "can use NumPy functions and most I think",
"start": 2731.6,
"duration": 2.64
},
{
"text": "95% or something functionality of",
"start": 2734.88,
"duration": 3.28
},
{
"text": "NumPy is supported by Numba. And",
"start": 2739.36,
"duration": 4.48
},
{
"text": "one important thing to understand is",
"start": 2744.24,
"duration": 4.88
},
{
"text": "that Numba is not actually using NumPy",
"start": 2746.08,
"duration": 1.84
},
{
"text": "for real. So you just implement",
"start": 2748.96,
"duration": 2.88
},
{
"text": "functions in NumPy to basically explain",
"start": 2753.12,
"duration": 4.16
},
{
"text": "Numba what's the work that needs to be",
"start": 2757.44,
"duration": 4.32
},
{
"text": "done and then Numba is going to bypass",
"start": 2759.28,
"duration": 1.84
},
{
"text": "NumPy and directly and go and call the",
"start": 2762.72,
"duration": 3.44
},
{
"text": "low-level math libraries. So even NumPy",
"start": 2768.08,
"duration": 5.36
},
{
"text": "for most of the most critical",
"start": 2771.28,
"duration": 3.2
},
{
"text": "operation is calling some optimized",
"start": 2773.2,
"duration": 1.92
},
{
"text": "blast implementation some optimized",
"start": 2777.28,
"duration": 4.08
},
{
"text": "library implementation",
"start": 2780.08,
"duration": 2.8
},
{
"text": "and Numba is bypassing that just calling",
"start": 2782.0,
"duration": 1.92
},
{
"text": "the underlying then optimized compile",
"start": 2784.4,
"duration": 2.4
},
{
"text": "libraries underneath",
"start": 2788.16,
"duration": 3.76
},
{
"text": "but still you it's still extremely",
"start": 2791.36,
"duration": 3.2
},
{
"text": "useful because you can develop your",
"start": 2796.16,
"duration": 4.8
},
{
"text": "function without. So you start without",
"start": 2798.64,
"duration": 2.48
},
{
"text": "the the decorator, you develop your",
"start": 2801.84,
"duration": 3.2
},
{
"text": "function, you test it, make sure that's",
"start": 2805.12,
"duration": 3.28
},
{
"text": "working as expected, and then you add",
"start": 2807.36,
"duration": 2.24
},
{
"text": "just in time compilation and you make",
"start": 2811.12,
"duration": 3.76
},
{
"text": "sure that it's still working as",
"start": 2814.24,
"duration": 3.12
},
{
"text": "expected. And so you see you can",
"start": 2816.16,
"duration": 1.92
},
{
"text": "do conditionals, you can use any any",
"start": 2819.6,
"duration": 3.44
},
{
"text": "specific function assignments and and so",
"start": 2823.12,
"duration": 3.52
},
{
"text": "on.",
"start": 2825.68,
"duration": 2.56
},
{
"text": "And so you see that",
"start": 2828.56,
"duration": 2.88
},
{
"text": "so inside this object",
"start": 2832.4,
"duration": 3.84
},
{
"text": "Numba is actually storing",
"start": 2836.16,
"duration": 3.76
},
{
"text": "different compiled versions of your",
"start": 2839.2,
"duration": 3.04
},
{
"text": "function. So with integers with floats",
"start": 2841.76,
"duration": 2.56
},
{
"text": "",
"start": 2849.52,
"duration": 7.76
},
{
"text": "here we we're just looking at",
"start": 2851.2,
"duration": 1.68
},
{
"text": "a different the speed that you",
"start": 2856.72,
"duration": 5.52
},
{
"text": "get with different implementations",
"start": 2861.36,
"duration": 4.64
},
{
"text": "but yeah we don't",
"start": 2864.4,
"duration": 3.04
},
{
"text": "yeah it's not very important and",
"start": 2867.04,
"duration": 2.64
},
{
"text": "i'm going to skip creating new funks",
"start": 2870.24,
"duration": 3.2
},
{
"text": "that is not very interesting but you can",
"start": 2872.8,
"duration": 2.56
},
{
"text": "look at it on your own if you think",
"start": 2875.52,
"duration": 2.72
},
{
"text": "you might be interested. I think the",
"start": 2878.56,
"duration": 3.04
},
{
"text": "most important topic that I wanted to",
"start": 2880.32,
"duration": 1.76
},
{
"text": "cover about Numba",
"start": 2881.84,
"duration": 1.52
},
{
"text": "is going to be multi- threading.",
"start": 2885.52,
"duration": 3.68
},
{
"text": "So as we were talking before we",
"start": 2889.6,
"duration": 4.08
},
{
"text": "want to",
"start": 2892.96,
"duration": 3.36
},
{
"text": "release the global interpreter lock.",
"start": 2895.28,
"duration": 2.32
},
{
"text": "But also an important thing is that",
"start": 2898.88,
"duration": 3.6
},
{
"text": "Numba can do multi-threading for you. So",
"start": 2903.44,
"duration": 4.56
},
{
"text": "and it's or has a parallel equal true",
"start": 2906.8,
"duration": 3.36
},
{
"text": "arguments that is going to use some",
"start": 2911.44,
"duration": 4.64
},
{
"text": "aistics inside the code to understand",
"start": 2916.24,
"duration": 4.8
},
{
"text": "how to parallelize your code. So and",
"start": 2919.36,
"duration": 3.12
},
{
"text": "also support some more complicated",
"start": 2922.72,
"duration": 3.36
},
{
"text": "functions. So even if you have some",
"start": 2925.92,
"duration": 3.2
},
{
"text": "reduction",
"start": 2928.32,
"duration": 2.4
},
{
"text": "of your for example you want to sum",
"start": 2929.92,
"duration": 1.6
},
{
"text": "you can you want to compute some",
"start": 2933.04,
"duration": 3.12
},
{
"text": "computation then you want to sum the",
"start": 2934.72,
"duration": 1.68
},
{
"text": "results of your computation that's also",
"start": 2937.76,
"duration": 3.04
},
{
"text": "parallelized by Numba. So number is",
"start": 2940.8,
"duration": 3.04
},
{
"text": "going to use multiple threads in the",
"start": 2942.8,
"duration": 2.0
},
{
"text": "parallel part of your code and then it's",
"start": 2945.84,
"duration": 3.04
},
{
"text": "going to as each threads finishes",
"start": 2947.76,
"duration": 1.92
},
{
"text": "their job is going to accumulate",
"start": 2952.4,
"duration": 4.64
},
{
"text": "everything into the output variable",
"start": 2954.4,
"duration": 2.0
},
{
"text": "and then it's going to return that",
"start": 2957.6,
"duration": 3.2
},
{
"text": "variable.",
"start": 2959.52,
"duration": 1.92
},
{
"text": "So so you see this looks kind of",
"start": 2962.56,
"duration": 3.04
},
{
"text": "magic and as always kind of magic",
"start": 2966.88,
"duration": 4.32
},
{
"text": "sometimes it doesn't work. So",
"start": 2970.24,
"duration": 3.36
},
{
"text": "and for if it doesn't work, it's",
"start": 2974.0,
"duration": 3.76
},
{
"text": "going to give you an error that tells",
"start": 2977.6,
"duration": 3.6
},
{
"text": "you why it is failing. So you want",
"start": 2979.76,
"duration": 2.16
},
{
"text": "to try this first on maybe a simplified",
"start": 2982.96,
"duration": 3.2
},
{
"text": "version of your code. Try to understand",
"start": 2986.0,
"duration": 3.04
},
{
"text": "how to make this parallel and then and",
"start": 2988.48,
"duration": 2.48
},
{
"text": "then make the code more complicated",
"start": 2992.0,
"duration": 3.52
},
{
"text": "so that you understand step by step",
"start": 2995.28,
"duration": 3.28
},
{
"text": "if number can work for you in",
"start": 2998.72,
"duration": 3.44
},
{
"text": "your situation or if not. So you",
"start": 3003.44,
"duration": 4.72
},
{
"text": "see here for example that we're calling",
"start": 3007.76,
"duration": 4.32
},
{
"text": "a mean. So even this so this is one of",
"start": 3009.44,
"duration": 1.68
},
{
"text": "those situations where you need to",
"start": 3012.88,
"duration": 3.44
},
{
"text": "implement it properly otherwise there",
"start": 3015.04,
"duration": 2.16
},
{
"text": "are race conditions where",
"start": 3016.8,
"duration": 1.76
},
{
"text": "parts of the code are executed out of",
"start": 3019.52,
"duration": 2.72
},
{
"text": "order and so that's why having a",
"start": 3021.28,
"duration": 1.76
},
{
"text": "parallel computing library like Numba",
"start": 3024.4,
"duration": 3.12
},
{
"text": "that a multi-threading library like",
"start": 3028.16,
"duration": 3.76
},
{
"text": "Numba can help you so that you don't",
"start": 3030.88,
"duration": 2.72
},
{
"text": "have to think too much about those",
"start": 3033.52,
"duration": 2.64
},
{
"text": "details and let Numba take care of that.",
"start": 3036.32,
"duration": 2.8
},
{
"text": "And a very useful Jupyter magic",
"start": 3041.12,
"duration": 4.8
},
{
"text": "function that you want to use when",
"start": 3046.72,
"duration": 5.6
},
{
"text": "you're using multi- threading is time.",
"start": 3048.88,
"duration": 2.16
},
{
"text": "So if you do percent time, this is going",
"start": 3051.6,
"duration": 2.72
},
{
"text": "to execute the Python code and then it's",
"start": 3055.04,
"duration": 3.44
},
{
"text": "going to give you so you see the",
"start": 3057.92,
"duration": 2.88
},
{
"text": "difference between time it and time. So",
"start": 3061.04,
"duration": 3.12
},
{
"text": "time it is specifically for repeating",
"start": 3063.84,
"duration": 2.8
},
{
"text": "that computation many many times to be",
"start": 3067.2,
"duration": 3.36
},
{
"text": "give you a very precise estimate of",
"start": 3069.52,
"duration": 2.32
},
{
"text": "the speed of your computation of the",
"start": 3073.44,
"duration": 3.92
},
{
"text": "time taken time just executes once but a",
"start": 3075.2,
"duration": 1.76
},
{
"text": "very important thing that you want to do",
"start": 3079.92,
"duration": 4.72
},
{
"text": "is you want to compare CPU time and with",
"start": 3082.08,
"duration": 2.16
},
{
"text": "world time. So, CPU times is how",
"start": 3086.56,
"duration": 4.48
},
{
"text": "much",
"start": 3090.88,
"duration": 4.32
},
{
"text": "each so the overall time over all the",
"start": 3092.48,
"duration": 1.6
},
{
"text": "all the cores that you are using.",
"start": 3097.92,
"duration": 5.44
},
{
"text": "So, so you see when they are very",
"start": 3101.68,
"duration": 3.76
},
{
"text": "similar that means that there is no",
"start": 3104.4,
"duration": 2.72
},
{
"text": "multi-threading at all. So you are there",
"start": 3107.2,
"duration": 2.8
},
{
"text": "is one core with which is executing all",
"start": 3110.56,
"duration": 3.36
},
{
"text": "of your computation. So this is",
"start": 3113.44,
"duration": 2.88
},
{
"text": "not multi-threaded. Because we put",
"start": 3115.52,
"duration": 2.08
},
{
"text": "parallel equal false. But if you have",
"start": 3119.04,
"duration": 3.52
},
{
"text": "parallel equals true, you see we",
"start": 3122.56,
"duration": 3.52
},
{
"text": "computation took six",
"start": 3125.28,
"duration": 2.72
},
{
"text": "milliseconds. But in those six",
"start": 3128.16,
"duration": 2.88
},
{
"text": "milliseconds, we used",
"start": 3131.04,
"duration": 2.88
},
{
"text": "700 let's say for simplifying 600",
"start": 3134.32,
"duration": 3.28
},
{
"text": "milliseconds.",
"start": 3137.36,
"duration": 3.04
},
{
"text": "That means that about 100 cores that",
"start": 3138.96,
"duration": 1.6
},
{
"text": "were executing at the same time.",
"start": 3143.04,
"duration": 4.08
},
{
"text": "So this is a a great way of",
"start": 3145.28,
"duration": 2.24
},
{
"text": "understanding how much how how your",
"start": 3149.2,
"duration": 3.92
},
{
"text": "computation is running. So here we are",
"start": 3153.2,
"duration": 4.0
},
{
"text": "basically parallelizing 100 times.",
"start": 3155.12,
"duration": 1.92
},
{
"text": "Still you don't get get 100 times the",
"start": 3157.92,
"duration": 2.8
},
{
"text": "speed up. Time is just going from",
"start": 3160.96,
"duration": 3.04
},
{
"text": "19 to six. So of course the",
"start": 3163.36,
"duration": 2.4
},
{
"text": "theoretical",
"start": 3167.36,
"duration": 4.0
},
{
"text": "100x",
"start": 3168.96,
"duration": 1.6
},
{
"text": "speed up is can be achieved only in",
"start": 3170.48,
"duration": 1.52
},
{
"text": "specific situations.",
"start": 3173.68,
"duration": 3.2
},
{
"text": "",
"start": 3175.76,
"duration": 2.08
},
{
"text": "and then here is I told you",
"start": 3177.76,
"duration": 2.0
},
{
"text": "about doing parallel true for",
"start": 3183.28,
"duration": 5.52
},
{
"text": "automatically have Numba taking care of",
"start": 3185.76,
"duration": 2.48
},
{
"text": "everything. If you want to be more",
"start": 3188.16,
"duration": 2.4
},
{
"text": "explicit and you want to tell Numba",
"start": 3189.68,
"duration": 1.52
},
{
"text": "where to parallelize, then you want to",
"start": 3192.0,
"duration": 2.32
},
{
"text": "use P range. But this is not a",
"start": 3194.8,
"duration": 2.8
},
{
"text": "very important topic. So, I don't",
"start": 3198.0,
"duration": 3.2
},
{
"text": "want to dig into this right now. Is",
"start": 3200.88,
"duration": 2.88
},
{
"text": "there any question on using multiple",
"start": 3203.84,
"duration": 2.96
},
{
"text": "threads with number? So, we are still",
"start": 3207.36,
"duration": 3.52
},
{
"text": "speaking about running your code on a",
"start": 3209.76,
"duration": 2.4
},
{
"text": "single machine. Either but in but",
"start": 3212.72,
"duration": 2.96
},
{
"text": "but instead of running at Python",
"start": 3216.32,
"duration": 3.6
},
{
"text": "speed we run at C speed instead of",
"start": 3218.56,
"duration": 2.24
},
{
"text": "running on one core we run multiple",
"start": 3220.8,
"duration": 2.24
},
{
"text": "course",
"start": 3223.28,
"duration": 2.48
},
{
"text": "there is one there was one question that",
"start": 3225.12,
"duration": 1.84
},
{
"text": "robin Kyle asked and Robin helped",
"start": 3227.36,
"duration": 2.24
},
{
"text": "out with but if you want to take a look",
"start": 3230.48,
"duration": 3.12
},
{
"text": "at this one there that you're looking",
"start": 3232.56,
"duration": 2.08
},
{
"text": "at Kyle asked if you if can you disable",
"start": 3234.96,
"duration": 2.4
},
{
"text": "no Python or otherwise allow yourself to",
"start": 3238.32,
"duration": 3.36
},
{
"text": "use other things like some complicated",
"start": 3240.88,
"duration": 2.56
},
{
"text": "sci P function.",
"start": 3243.36,
"duration": 2.48
},
{
"text": "So",
"start": 3245.76,
"duration": 2.4
},
{
"text": "yeah. Yeah. No, I I agree with with the",
"start": 3247.76,
"duration": 2.0
},
{
"text": "with the answer. Thank you very much.",
"start": 3250.08,
"duration": 2.32
},
{
"text": "",
"start": 3252.32,
"duration": 2.24
},
{
"text": "very good. If there's no other",
"start": 3257.36,
"duration": 5.04
},
{
"text": "question, let me get to that.",
"start": 3259.28,
"duration": 1.92
},
{
"text": "We have one more question. Upatha",
"start": 3261.76,
"duration": 2.48
},
{
"text": "that. There you go. That bottom one.",
"start": 3269.44,
"duration": 7.68
},
{
"text": "If we run a P.",
"start": 3272.64,
"duration": 3.2
},
{
"text": "Yes. So OM noom threads. So it depends",
"start": 3273.68,
"duration": 1.04
},
{
"text": "on your environment. Most of the time",
"start": 3276.72,
"duration": 3.04
},
{
"text": "is automatic and",
"start": 3280.32,
"duration": 3.6
},
{
"text": "automatically Numba is going to",
"start": 3283.2,
"duration": 2.88
},
{
"text": "use the same number of threads as the",
"start": 3286.96,
"duration": 3.76
},
{
"text": "existing course on your machines. So",
"start": 3291.28,
"duration": 4.32
},
{
"text": "generally if you want to use all the",
"start": 3295.76,
"duration": 4.48
},
{
"text": "cores you don't need to do that.",
"start": 3297.52,
"duration": 1.76
},
{
"text": "But if you want to use less then you",
"start": 3300.88,
"duration": 3.36
},
{
"text": "should set num thread. So let's say you",
"start": 3303.6,
"duration": 2.72
},
{
"text": "have 128 cores but maybe you want to use",
"start": 3305.92,
"duration": 2.32
},
{
"text": "just half of that because your code is",
"start": 3308.48,
"duration": 2.56
},
{
"text": "not parallelizing very well and then you",
"start": 3311.52,
"duration": 3.04
},
{
"text": "so let's switch to Dask.",
"start": 3319.84,
"duration": 8.32
},
{
"text": "So Dask compared to Numba. So Dask is a",
"start": 3324.72,
"duration": 4.88
},
{
"text": "distributed computing framework.",
"start": 3329.92,
"duration": 5.2
},
{
"text": "So Dask gives you the capability of",
"start": 3333.12,
"duration": 3.2
},
{
"text": "running on multiple nodes. So",
"start": 3337.84,
"duration": 4.72
},
{
"text": "basically instead of running just on",
"start": 3341.04,
"duration": 3.2
},
{
"text": "this single node as we are doing now in",
"start": 3343.68,
"duration": 2.64
},
{
"text": "this single node where my Jupyter",
"start": 3346.64,
"duration": 2.96
},
{
"text": "notebook is running, we want to launch",
"start": 3349.2,
"duration": 2.56
},
{
"text": "task worker processes on different",
"start": 3352.0,
"duration": 2.8
},
{
"text": "nodes. So we're going to launch a",
"start": 3356.0,
"duration": 4.0
},
{
"text": "separate ZLURM job that's executing the",
"start": 3358.48,
"duration": 2.48
},
{
"text": "workers and then we're going to have",
"start": 3361.44,
"duration": 2.96
},
{
"text": "them execute some jobs. So with",
"start": 3364.32,
"duration": 2.88
},
{
"text": "this you could use maybe 10 nodes at",
"start": 3366.8,
"duration": 2.48
},
{
"text": "the same time. So you can leverage a lot",
"start": 3370.96,
"duration": 4.16
},
{
"text": "of memory, a lot of RAM and you can",
"start": 3373.76,
"duration": 2.8
},
{
"text": "leverage a lot of CPU. So this is for",
"start": 3377.04,
"duration": 3.28
},
{
"text": "the scale of your computing. A larger",
"start": 3383.76,
"duration": 6.72
},
{
"text": "Dask is a great option because automates",
"start": 3386.64,
"duration": 2.88
},
{
"text": "a lot of the underlying",
"start": 3389.68,
"duration": 3.04
},
{
"text": "communication aspects that instead if",
"start": 3393.76,
"duration": 4.08
},
{
"text": "you are using for example MPI so NPI is",
"start": 3397.12,
"duration": 3.36
},
{
"text": "an MPI for PI package which is extremely",
"start": 3401.04,
"duration": 3.92
},
{
"text": "low level. So I think that",
"start": 3404.72,
"duration": 3.68
},
{
"text": "with Python we want to leverage the this",
"start": 3408.48,
"duration": 3.76
},
{
"text": "high level capabilities that allow",
"start": 3411.6,
"duration": 3.12
},
{
"text": "you to describe a very complicated",
"start": 3414.4,
"duration": 2.8
},
{
"text": "computation in a very simple way and you",
"start": 3417.44,
"duration": 3.04
},
{
"text": "want to delegate to the automatic",
"start": 3420.88,
"duration": 3.44
},
{
"text": "functions of a package like D to handle",
"start": 3426.16,
"duration": 5.28
},
{
"text": "the job dependencies and data transfer",
"start": 3429.2,
"duration": 3.04
},
{
"text": "and synchronization between the task.",
"start": 3432.96,
"duration": 3.76
},
{
"text": "Task. So you want to rely on a",
"start": 3434.64,
"duration": 1.68
},
{
"text": "library that has can do that for you.",
"start": 3438.0,
"duration": 3.36
},
{
"text": "So,, Dask,, so,, one one",
"start": 3440.32,
"duration": 2.32
},
{
"text": "of the probably the best feature of Dask",
"start": 3446.0,
"duration": 5.68
},
{
"text": "is,, so now it's it's already noon,",
"start": 3448.56,
"duration": 2.56
},
{
"text": "so I'm going to go ahead another,, 15",
"start": 3451.28,
"duration": 2.72
},
{
"text": "maybe 20 minutes and then I'm going to",
"start": 3455.36,
"duration": 4.08
},
{
"text": "leave the last 10 minutes for, Q& A,",
"start": 3457.84,
"duration": 2.48
},
{
"text": "for questions because, I mean, we've",
"start": 3460.8,
"duration": 2.96
},
{
"text": "taken just some questions while we're",
"start": 3464.0,
"duration": 3.2
},
{
"text": "going. So you see a key aspect",
"start": 3466.48,
"duration": 2.48
},
{
"text": "of parallel computation is the",
"start": 3471.84,
"duration": 5.36
},
{
"text": "relationship between the different",
"start": 3477.6,
"duration": 5.76
},
{
"text": "tasks. So whenever you have a",
"start": 3480.24,
"duration": 2.64
},
{
"text": "computation you have different steps and",
"start": 3483.28,
"duration": 3.04
},
{
"text": "those steps depend one to the other.",
"start": 3486.32,
"duration": 3.04
},
{
"text": "And so the only way of",
"start": 3489.68,
"duration": 3.36
},
{
"text": "computing properly in parallel is is",
"start": 3495.2,
"duration": 5.52
},
{
"text": "keeping track of that. And this is",
"start": 3498.64,
"duration": 3.44
},
{
"text": "a very general concept. So it's really",
"start": 3502.4,
"duration": 3.76
},
{
"text": "good if you learn Dask and if you",
"start": 3504.56,
"duration": 2.16
},
{
"text": "understand this these parts to any other",
"start": 3507.04,
"duration": 2.48
},
{
"text": "parallel computing system. And",
"start": 3510.56,
"duration": 3.52
},
{
"text": "so Dask whenever you call some Dask",
"start": 3514.96,
"duration": 4.4
},
{
"text": "commands and in this case we are",
"start": 3519.92,
"duration": 4.96
},
{
"text": "using Dask array. So Dask array is a sub",
"start": 3522.56,
"duration": 2.64
},
{
"text": "package of Dask which gives you an",
"start": 3526.8,
"duration": 4.24
},
{
"text": "interface which is extremely similar to",
"start": 3529.6,
"duration": 2.8
},
{
"text": "pi but allows you to run in parallel.",
"start": 3532.16,
"duration": 2.56
},
{
"text": "So let's take a look. So you see",
"start": 3537.52,
"duration": 5.36
},
{
"text": "here I just called once.",
"start": 3539.76,
"duration": 2.24
},
{
"text": "So this creates an array once",
"start": 3544.16,
"duration": 4.4
},
{
"text": "and and then I am doing a computation",
"start": 3547.92,
"duration": 3.76
},
{
"text": "like I would do in NumPy. But you",
"start": 3552.8,
"duration": 4.88
},
{
"text": "see yeah for example let's take a",
"start": 3555.68,
"duration": 2.88
},
{
"text": "look at this Dask graph. So",
"start": 3557.76,
"duration": 2.08
},
{
"text": "this is a direct a cyclic graph. So it's",
"start": 3561.2,
"duration": 3.44
},
{
"text": "a graph and has arrows and these arrows",
"start": 3564.16,
"duration": 2.96
},
{
"text": "point to the flow of computation. And",
"start": 3568.24,
"duration": 4.08
},
{
"text": "so whenever you have separate columns",
"start": 3571.68,
"duration": 3.44
},
{
"text": "like this that means that this is parts",
"start": 3574.16,
"duration": 2.48
},
{
"text": "of the code that can run in parallel.",
"start": 3576.96,
"duration": 2.8
},
{
"text": "So you see here whenever we run",
"start": 3578.88,
"duration": 1.92
},
{
"text": "with Dask compared to lumpy we need",
"start": 3584.4,
"duration": 5.52
},
{
"text": "to specify the chunk size. This is",
"start": 3587.12,
"duration": 2.72
},
{
"text": "tells us basically what's the minim",
"start": 3591.2,
"duration": 4.08
},
{
"text": "minimum unit of computation. And so now",
"start": 3594.08,
"duration": 2.88
},
{
"text": "instead of thinking of our array as a",
"start": 3597.12,
"duration": 3.04
},
{
"text": "single array of 15 elements, we have",
"start": 3600.96,
"duration": 3.84
},
{
"text": "three arrays of five elements. And so",
"start": 3604.0,
"duration": 3.04
},
{
"text": "whenever we want to compute something,",
"start": 3606.24,
"duration": 2.24
},
{
"text": "we want to compute, we can compute them",
"start": 3608.64,
"duration": 2.4
},
{
"text": "in parallel, one array at a time. So we",
"start": 3610.88,
"duration": 2.24
},
{
"text": "can leverage multiple tasks multiple",
"start": 3613.36,
"duration": 2.48
},
{
"text": "threads on your machine and those",
"start": 3616.48,
"duration": 3.12
},
{
"text": "threads can be executed by multiple",
"start": 3620.24,
"duration": 3.76
},
{
"text": "cores so that you can speed up your",
"start": 3622.48,
"duration": 2.24
},
{
"text": "computation. And so here you see we are",
"start": 3625.52,
"duration": 3.04
},
{
"text": "doing so we are creating array once",
"start": 3629.2,
"duration": 3.68
},
{
"text": "and then we are calling add because we",
"start": 3633.28,
"duration": 4.08
},
{
"text": "are adding one. And this is",
"start": 3636.64,
"duration": 3.36
},
{
"text": "running in parallel on three threads.",
"start": 3639.2,
"duration": 2.56
},
{
"text": "And then we call sum but first we sum",
"start": 3642.72,
"duration": 3.52
},
{
"text": "locally right. We sum each thread is",
"start": 3646.0,
"duration": 3.28
},
{
"text": "summing its own five elements and then",
"start": 3650.96,
"duration": 4.96
},
{
"text": "transferring the sum",
"start": 3655.04,
"duration": 4.08
},
{
"text": "to be aggregated and then the three",
"start": 3658.4,
"duration": 3.36
},
{
"text": "sums are summed together and this come",
"start": 3661.12,
"duration": 2.72
},
{
"text": "back. So you see even here there is",
"start": 3662.8,
"duration": 1.68
},
{
"text": "communication there is coordination",
"start": 3664.48,
"duration": 1.68
},
{
"text": "between the threads but this is you",
"start": 3667.76,
"duration": 3.28
},
{
"text": "don't care about this this is handled",
"start": 3670.32,
"duration": 2.56
},
{
"text": "automatically by Dask you just tell",
"start": 3671.92,
"duration": 1.6
},
{
"text": "Dask what to do and then",
"start": 3675.04,
"duration": 3.12
},
{
"text": "Dask is going to execute it taking",
"start": 3678.32,
"duration": 3.28
},
{
"text": "care of all that and I wanted to show",
"start": 3681.04,
"duration": 2.72
},
{
"text": "you how the graph becomes very",
"start": 3683.68,
"duration": 2.64
},
{
"text": "complicated very fast so this is just a",
"start": 3689.36,
"duration": 5.68
},
{
"text": "simple simple",
"start": 3692.56,
"duration": 3.2
},
{
"text": "matrix computation. So just dot products",
"start": 3695.84,
"duration": 3.28
},
{
"text": "transpose or something. You see",
"start": 3699.6,
"duration": 3.76
},
{
"text": "how",
"start": 3702.32,
"duration": 2.72
},
{
"text": "becomes really complicated.",
"start": 3703.92,
"duration": 1.6
},
{
"text": "And but good news is you don't have",
"start": 3706.16,
"duration": 2.24
},
{
"text": "to take care of this yourself",
"start": 3709.76,
"duration": 3.6
},
{
"text": "because Dask is doing it for you.",
"start": 3712.96,
"duration": 3.2
},
{
"text": "And you see you there is another",
"start": 3723.6,
"duration": 10.64
},
{
"text": "interface if you just call your object.",
"start": 3726.72,
"duration": 3.12
},
{
"text": "Dask",
"start": 3730.08,
"duration": 3.36
},
{
"text": "where you can look at each step of",
"start": 3731.92,
"duration": 1.84
},
{
"text": "your computation what's the data size.",
"start": 3735.76,
"duration": 3.84
},
{
"text": "So you see here is a 3D array with",
"start": 3738.8,
"duration": 3.04
},
{
"text": "15 15 and three and you can so you can",
"start": 3743.2,
"duration": 4.4
},
{
"text": "so the the so all of these are great",
"start": 3747.36,
"duration": 4.16
},
{
"text": "user",
"start": 3751.84,
"duration": 4.48
},
{
"text": "interface functionality that helps",
"start": 3753.36,
"duration": 1.52
},
{
"text": "you understand your distributed",
"start": 3756.32,
"duration": 2.96
},
{
"text": "computation because distributed",
"start": 3758.08,
"duration": 1.76
},
{
"text": "computing is hard. So sometimes you",
"start": 3760.24,
"duration": 2.16
},
{
"text": "think your computation is doing",
"start": 3762.32,
"duration": 2.08
},
{
"text": "something while it's doing something",
"start": 3765.84,
"duration": 3.52
},
{
"text": "else. So you can dig Dask allows",
"start": 3766.96,
"duration": 1.12
},
{
"text": "gives you all this capability of digging",
"start": 3770.88,
"duration": 3.92
},
{
"text": "inside your computation and understand",
"start": 3773.28,
"duration": 2.4
},
{
"text": "really what is going on.",
"start": 3775.68,
"duration": 2.4
},
{
"text": "Let me show you delayed and",
"start": 3778.16,
"duration": 2.48
},
{
"text": "then we're going to stop for questions.",
"start": 3781.92,
"duration": 3.76
},
{
"text": "So I saw you here how you can use",
"start": 3785.6,
"duration": 3.68
},
{
"text": "a simple way of using daskar. Now let's",
"start": 3790.16,
"duration": 4.56
},
{
"text": "use a kind of lower level functionality",
"start": 3793.52,
"duration": 3.36
},
{
"text": "of D but which is extremely flexible.",
"start": 3797.36,
"duration": 3.84
},
{
"text": "So basically let's say that you",
"start": 3802.8,
"duration": 5.44
},
{
"text": "have a bunch of files.",
"start": 3806.64,
"duration": 3.84
},
{
"text": "And",
"start": 3810.16,
"duration": 3.52
},
{
"text": "",
"start": 3818.24,
"duration": 8.08
},
{
"text": "you have this processing that you want",
"start": 3820.48,
"duration": 2.24
},
{
"text": "you want to apply to each single",
"start": 3823.52,
"duration": 3.04
},
{
"text": "file.",
"start": 3825.6,
"duration": 2.08
},
{
"text": "And what you can do with D is",
"start": 3828.0,
"duration": 2.4
},
{
"text": "let's run all of them in parallel at the",
"start": 3834.32,
"duration": 6.32
},
{
"text": "same time.",
"start": 3837.28,
"duration": 2.96
},
{
"text": "And so what you are going to do is",
"start": 3839.44,
"duration": 2.16
},
{
"text": "basically you take your normal Python",
"start": 3843.12,
"duration": 3.68
},
{
"text": "function and you call delayed on that.",
"start": 3847.2,
"duration": 4.08
},
{
"text": "You see I I call delayed on",
"start": 3852.4,
"duration": 5.2
},
{
"text": "process file. This gives me back another",
"start": 3855.36,
"duration": 2.96
},
{
"text": "function but this other function is",
"start": 3858.48,
"duration": 3.12
},
{
"text": "asynchronous. So this other",
"start": 3861.28,
"duration": 2.8
},
{
"text": "function can run in parallel with in",
"start": 3863.76,
"duration": 2.48
},
{
"text": "parallel. So instead of running",
"start": 3866.88,
"duration": 3.12
},
{
"text": "one and then instead of executing the",
"start": 3871.36,
"duration": 4.48
},
{
"text": "first then executing the second and then",
"start": 3874.64,
"duration": 3.28
},
{
"text": "executing the third.",
"start": 3876.8,
"duration": 2.16
},
{
"text": "This is a simple way of running multiple",
"start": 3880.72,
"duration": 3.92
},
{
"text": "functions that you have defined in",
"start": 3884.96,
"duration": 4.24
},
{
"text": "parallel.",
"start": 3887.12,
"duration": 2.16
},
{
"text": "So you see very simple graph. So",
"start": 3888.96,
"duration": 1.84
},
{
"text": "you",
"start": 3891.52,
"duration": 2.56
},
{
"text": "you process and there is also an",
"start": 3893.76,
"duration": 2.24
},
{
"text": "so you see another feature of DS",
"start": 3900.8,
"duration": 7.04
},
{
"text": "is that it is lazy. So whenever you call",
"start": 3903.68,
"duration": 2.88
},
{
"text": "dash delayed or Dask array Dask data",
"start": 3909.12,
"duration": 5.44
},
{
"text": "frame whatever you call is just under",
"start": 3913.28,
"duration": 4.16
},
{
"text": "the hood is building the the",
"start": 3917.28,
"duration": 4.0
},
{
"text": "dependency graph for you but then is",
"start": 3921.44,
"duration": 4.16
},
{
"text": "only execute when you explicitly say",
"start": 3924.96,
"duration": 3.52
},
{
"text": "does compute.",
"start": 3928.24,
"duration": 3.28
},
{
"text": "And so at that point is actually",
"start": 3930.88,
"duration": 2.64
},
{
"text": "being executed.",
"start": 3934.64,
"duration": 3.76
},
{
"text": "So you see here that basically",
"start": 3936.64,
"duration": 2.0
},
{
"text": "what I'm doing is I'm taking I want the",
"start": 3940.96,
"duration": 4.32
},
{
"text": "s I want to process them in parallel.",
"start": 3944.16,
"duration": 3.2
},
{
"text": "But once I process them in",
"start": 3947.6,
"duration": 3.44
},
{
"text": "parallel I want to take their results",
"start": 3949.6,
"duration": 2.0
},
{
"text": "and sum.",
"start": 3952.56,
"duration": 2.96
},
{
"text": "Each of them is doing a word count",
"start": 3954.72,
"duration": 2.16
},
{
"text": "and so you call sum but you need an",
"start": 3957.36,
"duration": 2.64
},
{
"text": "asynchronous sum and that's why you call",
"start": 3960.96,
"duration": 3.6
},
{
"text": "task delayed as well here",
"start": 3963.52,
"duration": 2.56
},
{
"text": "and and then you can you can execute",
"start": 3967.2,
"duration": 3.68
},
{
"text": "it",
"start": 3970.96,
"duration": 3.76
},
{
"text": "or you can call dmp compute and",
"start": 3973.52,
"duration": 2.56
},
{
"text": "execute this computation the same",
"start": 3976.56,
"duration": 3.04
},
{
"text": "thing we did over here does compute",
"start": 3979.44,
"duration": 2.88
},
{
"text": "and we are still working on",
"start": 3984.4,
"duration": 4.96
},
{
"text": "",
"start": 3993.6,
"duration": 9.2
},
{
"text": "I only have one question from a",
"start": 3994.24,
"duration": 0.64
},
{
"text": "yeah. So the visualization you need a a",
"start": 3997.44,
"duration": 3.2
},
{
"text": "package for the JavaScript",
"start": 4000.8,
"duration": 3.36
},
{
"text": "which is this one.",
"start": 4007.04,
"duration": 6.24
},
{
"text": "So if you're running locally, you",
"start": 4011.92,
"duration": 4.88
},
{
"text": "should have this function so that",
"start": 4014.56,
"duration": 2.64
},
{
"text": "the JavaScript should work.",
"start": 4019.68,
"duration": 5.12
},
{
"text": "If it still doesn't work, please open",
"start": 4023.76,
"duration": 4.08
},
{
"text": "an issue on the repository and I can",
"start": 4026.0,
"duration": 2.24
},
{
"text": "help you after the tutorial.",
"start": 4028.72,
"duration": 2.72
},
{
"text": "Is there any question on task",
"start": 4033.04,
"duration": 4.32
},
{
"text": "on task delayed?",
"start": 4035.92,
"duration": 2.88
},
{
"text": "So task delayed is very flexible. So for",
"start": 4038.88,
"duration": 2.96
},
{
"text": "example something you could do is",
"start": 4040.8,
"duration": 1.92
},
{
"text": "overlap computing and data access. So if",
"start": 4043.84,
"duration": 3.04
},
{
"text": "you have if you are reading a file and",
"start": 4047.04,
"duration": 3.2
},
{
"text": "processing it, reading a second file and",
"start": 4050.56,
"duration": 3.52
},
{
"text": "process it you you can achieve pretty",
"start": 4052.8,
"duration": 2.24
},
{
"text": "easily. You could process the first",
"start": 4057.12,
"duration": 4.32
},
{
"text": "file and in the background you could",
"start": 4060.08,
"duration": 2.96
},
{
"text": "load the first the second file so that",
"start": 4062.8,
"duration": 2.72
},
{
"text": "while you're doing computation you're",
"start": 4065.68,
"duration": 2.88
},
{
"text": "also doing the disk access. So you save",
"start": 4067.92,
"duration": 2.24
},
{
"text": "time and this you can achieve with",
"start": 4069.84,
"duration": 1.92
},
{
"text": "so Dask delay you're running on a so",
"start": 4080.96,
"duration": 11.12
},
{
"text": "zurm array jobs",
"start": 4085.28,
"duration": 4.32
},
{
"text": "it's just allows you to do trivial",
"start": 4088.08,
"duration": 2.8
},
{
"text": "parallel tasks. So each of those long",
"start": 4093.12,
"duration": 5.04
},
{
"text": "jobs is executing independently and",
"start": 4096.16,
"duration": 3.04
},
{
"text": "they don't coordinate at all with",
"start": 4100.72,
"duration": 4.56
},
{
"text": "each other. Like with delayed you can do",
"start": 4102.88,
"duration": 2.16
},
{
"text": "you can coordinate you can do you can",
"start": 4105.92,
"duration": 3.04
},
{
"text": "chain multiple functions one after",
"start": 4111.68,
"duration": 5.76
},
{
"text": "the other",
"start": 4114.24,
"duration": 2.56
},
{
"text": "and and handle dependencies between",
"start": 4117.68,
"duration": 3.44
},
{
"text": "different tasks. So let's say that",
"start": 4121.2,
"duration": 3.52
},
{
"text": "can execute more complicated",
"start": 4123.52,
"duration": 2.32
},
{
"text": "computations that you can do with slurm",
"start": 4127.76,
"duration": 4.24
},
{
"text": "arrays.",
"start": 4132.08,
"duration": 4.32
},
{
"text": "",
"start": 4134.64,
"duration": 2.56
},
{
"text": "yes it's kind of similar to snakem make",
"start": 4136.56,
"duration": 1.92
},
{
"text": "workflow and yes you can achieve",
"start": 4139.92,
"duration": 3.36
},
{
"text": "something similar with snake make yeah",
"start": 4142.72,
"duration": 2.8
},
{
"text": "Dask yeah Dask is more flexible",
"start": 4147.44,
"duration": 4.72
},
{
"text": "you are implementing",
"start": 4151.84,
"duration": 4.4
},
{
"text": "it yourself so you have some more",
"start": 4154.56,
"duration": 2.72
},
{
"text": "wheels to turn while snake may gives you",
"start": 4158.08,
"duration": 3.52
},
{
"text": "probably simpler interface to some tasks",
"start": 4162.8,
"duration": 4.72
},
{
"text": "but not as flexible as task task",
"start": 4166.0,
"duration": 3.2
},
{
"text": "yeah exactly is automatically doing",
"start": 4179.36,
"duration": 13.36
},
{
"text": "that processing for you so you just tell",
"start": 4182.56,
"duration": 3.2
},
{
"text": "it oh do the do the first So",
"start": 4186.24,
"duration": 3.68
},
{
"text": "whenever you are executing one load",
"start": 4191.6,
"duration": 5.36
},
{
"text": "the second and then you tell it oh when",
"start": 4195.44,
"duration": 3.84
},
{
"text": "you you have loaded the second wait for",
"start": 4198.4,
"duration": 2.96
},
{
"text": "the computation on the first one to",
"start": 4200.8,
"duration": 2.4
},
{
"text": "to work and then it's going to",
"start": 4204.16,
"duration": 3.36
},
{
"text": "do it for you.",
"start": 4208.32,
"duration": 4.16
},
{
"text": "I'm sorry we are a bit short on",
"start": 4211.28,
"duration": 2.96
},
{
"text": "time. So let's go to array to the",
"start": 4213.92,
"duration": 2.64
},
{
"text": "last we have the last three notebooks.",
"start": 4217.76,
"duration": 3.84
},
{
"text": "So task array yeah this is a",
"start": 4220.72,
"duration": 2.96
},
{
"text": "very simple example of viewing Dask",
"start": 4225.84,
"duration": 5.12
},
{
"text": "array and comparing how you use Numba",
"start": 4227.52,
"duration": 1.68
},
{
"text": "Dask array and NumPy. So I have",
"start": 4232.8,
"duration": 5.28
},
{
"text": "this I have a single computation.",
"start": 4236.72,
"duration": 3.92
},
{
"text": "So you see how we do it. If you",
"start": 4239.92,
"duration": 3.2
},
{
"text": "have a computation this is a big very",
"start": 4244.08,
"duration": 4.16
},
{
"text": "large matrix and we do a bunch of",
"start": 4246.72,
"duration": 2.64
},
{
"text": "computation on that matrix using NumPy",
"start": 4250.56,
"duration": 3.84
},
{
"text": "and if we want to use step one is",
"start": 4253.28,
"duration": 2.72
},
{
"text": "let's use number so",
"start": 4256.48,
"duration": 3.2
},
{
"text": "really easy you take the function you",
"start": 4259.36,
"duration": 2.88
},
{
"text": "give it to number and number is going to",
"start": 4262.16,
"duration": 2.8
},
{
"text": "optimize it for you and now you have a",
"start": 4264.4,
"duration": 2.24
},
{
"text": "separate function",
"start": 4266.4,
"duration": 2.0
},
{
"text": "which",
"start": 4268.4,
"duration": 2.0
},
{
"text": "is just in time compiled so runs",
"start": 4270.4,
"duration": 2.0
},
{
"text": "very fast. With Dask instead is a",
"start": 4273.92,
"duration": 3.52
},
{
"text": "bit different. So what you do with Dask",
"start": 4277.92,
"duration": 4.0
},
{
"text": "is basically every time you call a NumPy",
"start": 4280.0,
"duration": 2.08
},
{
"text": "function now you call a Dask array",
"start": 4283.76,
"duration": 3.76
},
{
"text": "function. So you see now instead of np.sin and",
"start": 4286.08,
"duration": 2.32
},
{
"text": "np.log I have da.log and da.sin. So",
"start": 4290.72,
"duration": 1.46
},
{
"text": "those function are already",
"start": 4295.12,
"duration": 4.4
},
{
"text": "aware of the parallel of the",
"start": 4298.64,
"duration": 3.52
},
{
"text": "distributing computing situation and",
"start": 4303.76,
"duration": 5.12
},
{
"text": "so can leverage multiple",
"start": 4306.48,
"duration": 2.72
},
{
"text": "workers working on this.",
"start": 4312.96,
"duration": 6.48
},
{
"text": "So you see how",
"start": 4316.8,
"duration": 3.84
},
{
"text": "NumPy is very slow. So even NumPy is",
"start": 4319.2,
"duration": 2.4
},
{
"text": "running C code right. NumPy is",
"start": 4322.72,
"duration": 3.52
},
{
"text": "implemented in C. So under the hood is",
"start": 4325.68,
"duration": 2.96
},
{
"text": "really fast but we you see whenever",
"start": 4328.0,
"duration": 2.32
},
{
"text": "you're doing a NumPy computation you're",
"start": 4331.6,
"duration": 3.6
},
{
"text": "going into C and out in and out. And",
"start": 4333.92,
"duration": 2.32
},
{
"text": "every time you go in and out NumPy",
"start": 4337.6,
"duration": 3.68
},
{
"text": "needs an array where to store the",
"start": 4340.08,
"duration": 2.48
},
{
"text": "output. And so it's going to",
"start": 4342.0,
"duration": 1.92
},
{
"text": "create a lot of temporaries temporary",
"start": 4344.64,
"duration": 2.64
},
{
"text": "arrays. And so it's very inefficient. So",
"start": 4347.52,
"duration": 2.88
},
{
"text": "it takes 30 three seconds. So the same",
"start": 4349.76,
"duration": 2.24
},
{
"text": "function with Numba takes just one",
"start": 4352.96,
"duration": 3.2
},
{
"text": "second. And this is also using",
"start": 4356.24,
"duration": 3.28
},
{
"text": "some parallel computing and is and even",
"start": 4360.64,
"duration": 4.4
},
{
"text": "better it's using less memory.",
"start": 4363.6,
"duration": 2.96
},
{
"text": "Because it's not creating any",
"start": 4368.0,
"duration": 4.4
},
{
"text": "temporary array. So he's basically",
"start": 4371.92,
"duration": 3.92
},
{
"text": "taking a one element at a time computing",
"start": 4374.0,
"duration": 2.08
},
{
"text": "the log multiplying for this multiplying",
"start": 4377.68,
"duration": 3.68
},
{
"text": "with the sum and all of that and writing",
"start": 4379.76,
"duration": 2.08
},
{
"text": "to the output function and then loop",
"start": 4382.24,
"duration": 2.48
},
{
"text": "through the array and that's it.",
"start": 4384.64,
"duration": 2.4
},
{
"text": "While NumPy is going to give take a give",
"start": 4387.2,
"duration": 2.56
},
{
"text": "it to NumPy get back an array of the log",
"start": 4391.92,
"duration": 4.72
},
{
"text": "of a then it's going to do the same with",
"start": 4395.12,
"duration": 3.2
},
{
"text": "the sign same with the square. Now you",
"start": 4397.36,
"duration": 2.24
},
{
"text": "have a bunch of arrays and then it's",
"start": 4399.68,
"duration": 2.32
},
{
"text": "going to multiply those array together.",
"start": 4401.28,
"duration": 1.6
},
{
"text": "So it's inefficient. So the same",
"start": 4403.04,
"duration": 1.76
},
{
"text": "computation you can do with Dask and",
"start": 4405.6,
"duration": 2.56
},
{
"text": "it's actually not too bad. It's just a",
"start": 4407.92,
"duration": 2.32
},
{
"text": "little. So the difference is that Dask",
"start": 4410.56,
"duration": 2.64
},
{
"text": "is going to take small chunks and do the",
"start": 4414.16,
"duration": 3.6
},
{
"text": "quick computations in parallel on all",
"start": 4419.12,
"duration": 4.96
},
{
"text": "the chunks and then aggregate everything",
"start": 4421.28,
"duration": 2.16
},
{
"text": "at the end. So you are paying a",
"start": 4423.92,
"duration": 2.64
},
{
"text": "bit the price because so still it's",
"start": 4426.8,
"duration": 2.88
},
{
"text": "not as fast as number but the big",
"start": 4430.08,
"duration": 3.28
},
{
"text": "advantage of Dask is that this can run",
"start": 4433.04,
"duration": 2.96
},
{
"text": "on multiple machines so if your data",
"start": 4435.36,
"duration": 2.32
},
{
"text": "doesn't fit in memory then number cannot",
"start": 4438.0,
"duration": 2.64
},
{
"text": "help you much you need Dask so it's good",
"start": 4440.8,
"duration": 2.8
},
{
"text": "to see that Dask is pretty efficient as",
"start": 4444.08,
"duration": 3.28
},
{
"text": "well because when you want to jump from",
"start": 4446.4,
"duration": 2.32
},
{
"text": "one machine to multiple machines you",
"start": 4449.28,
"duration": 2.88
},
{
"text": "have to go with Dask And actually",
"start": 4451.2,
"duration": 1.92
},
{
"text": "the best is to",
"start": 4455.44,
"duration": 4.24
},
{
"text": "leverage both. So you want you can do a",
"start": 4458.32,
"duration": 2.88
},
{
"text": "distributed computing computation with D",
"start": 4461.68,
"duration": 3.36
},
{
"text": "but then each individual so on each node",
"start": 4464.4,
"duration": 2.72
},
{
"text": "you are executing some number code.",
"start": 4468.08,
"duration": 3.68
},
{
"text": "So you so you should start",
"start": 4470.96,
"duration": 2.88
},
{
"text": "at the beginning play with n number",
"start": 4475.28,
"duration": 4.32
},
{
"text": "first then once you gain some speed up",
"start": 4478.96,
"duration": 3.68
},
{
"text": "start looking into Dask. So the the the",
"start": 4481.68,
"duration": 2.72
},
{
"text": "big difference between the two is that",
"start": 4485.52,
"duration": 3.84
},
{
"text": "while you can transform a code",
"start": 4487.36,
"duration": 1.84
},
{
"text": "incrementally using Numba you cannot",
"start": 4489.84,
"duration": 2.48
},
{
"text": "do that with dust. So if you want to run",
"start": 4492.88,
"duration": 3.04
},
{
"text": "with dust you have to re basically",
"start": 4495.12,
"duration": 2.24
},
{
"text": "rewrite your code to use dust. But",
"start": 4497.6,
"duration": 2.48
},
{
"text": "the benefit can be really large and can",
"start": 4500.32,
"duration": 2.72
},
{
"text": "allow you to process data doesn't fit in",
"start": 4503.76,
"duration": 3.44
},
{
"text": "memory. So let's switch to",
"start": 4506.24,
"duration": 2.48
},
{
"text": "notebook number three. So in this case",
"start": 4509.6,
"duration": 3.36
},
{
"text": "we have a",
"start": 4512.88,
"duration": 3.28
},
{
"text": "Andrea",
"start": 4515.28,
"duration": 2.4
},
{
"text": "question where are they? Andrea, just",
"start": 4515.92,
"duration": 0.64
},
{
"text": "Andrea, real quick, I think we hold",
"start": 4519.04,
"duration": 3.12
},
{
"text": "we're getting a few questions, but I'm",
"start": 4521.04,
"duration": 2.0
},
{
"text": "going to suggest we hold them till",
"start": 4523.04,
"duration": 2.0
},
{
"text": "you're done with your material.",
"start": 4524.4,
"duration": 1.36
},
{
"text": "They look pretty detailed. So, I can ask",
"start": 4527.28,
"duration": 2.88
},
{
"text": "here after 12:30 to answer those",
"start": 4530.0,
"duration": 2.72
},
{
"text": "questions which are very detailed and",
"start": 4534.64,
"duration": 4.64
},
{
"text": "for now I'm going to finish",
"start": 4537.6,
"duration": 2.96
},
{
"text": "this content. And so another",
"start": 4541.92,
"duration": 4.32
},
{
"text": "intermediate situation is where you are",
"start": 4545.2,
"duration": 3.28
},
{
"text": "still on a single node but your data",
"start": 4548.4,
"duration": 3.2
},
{
"text": "doesn't fit in memory. And so what you",
"start": 4550.96,
"duration": 2.56
},
{
"text": "want to do is you allows you to",
"start": 4552.8,
"duration": 1.84
},
{
"text": "basically stream through your data. So",
"start": 4557.92,
"duration": 5.12
},
{
"text": "you if your data is on disk you can load",
"start": 4560.32,
"duration": 2.4
},
{
"text": "a piece of your array do the processing",
"start": 4563.52,
"duration": 3.2
},
{
"text": "write the results and and clear up",
"start": 4567.2,
"duration": 3.68
},
{
"text": "memory and then use the memory for",
"start": 4570.48,
"duration": 3.28
},
{
"text": "something else. So here for example I",
"start": 4572.96,
"duration": 2.48
},
{
"text": "have",
"start": 4575.52,
"duration": 2.56
},
{
"text": "128 gB of RAM I think on expanse or",
"start": 4577.68,
"duration": 2.16
},
{
"text": "something like that and but I am",
"start": 4580.72,
"duration": 3.04
},
{
"text": "processing 300 GB and I I can do",
"start": 4582.48,
"duration": 1.76
},
{
"text": "that because Numba is automatically",
"start": 4587.6,
"duration": 5.12
},
{
"text": "loading just a subset of your data just",
"start": 4592.72,
"duration": 5.12
},
{
"text": "a number of chunks exe run the execution",
"start": 4596.64,
"duration": 3.92
},
{
"text": "and then flush away the memory. So",
"start": 4599.92,
"duration": 3.28
},
{
"text": "that you can reuse that memory",
"start": 4603.52,
"duration": 3.6
},
{
"text": "again. And now let's get to the",
"start": 4606.72,
"duration": 3.2
},
{
"text": "last topic of today which is",
"start": 4611.84,
"duration": 5.12
},
{
"text": "finally running on a multiple",
"start": 4615.04,
"duration": 3.2
},
{
"text": "multiple machines. So multiple nodes and",
"start": 4620.56,
"duration": 5.52
},
{
"text": "so the good news is that the interface",
"start": 4623.6,
"duration": 3.04
},
{
"text": "is the same.",
"start": 4627.76,
"duration": 4.16
},
{
"text": "So from the user perspective you",
"start": 4629.92,
"duration": 2.16
},
{
"text": "are just doing you're just calling Dask",
"start": 4632.96,
"duration": 3.04
},
{
"text": "function and Dask is going to",
"start": 4636.72,
"duration": 3.76
},
{
"text": "automatically handle the distributed",
"start": 4639.28,
"duration": 2.56
},
{
"text": "part of the computations for you.",
"start": 4642.24,
"duration": 2.96
},
{
"text": "So first of all we are stepping",
"start": 4645.92,
"duration": 3.68
},
{
"text": "is in something which is more",
"start": 4651.36,
"duration": 5.44
},
{
"text": "complicated. So we need",
"start": 4652.8,
"duration": 1.44
},
{
"text": "we need to we need a process",
"start": 4657.44,
"duration": 4.64
},
{
"text": "that can that can coordinate your",
"start": 4663.76,
"duration": 6.32
},
{
"text": "jobs and this is the scheduler. So",
"start": 4669.12,
"duration": 5.36
},
{
"text": "you see I am in my computing node. So",
"start": 4672.16,
"duration": 3.04
},
{
"text": "now I am colllocated in the same",
"start": 4674.96,
"duration": 2.8
},
{
"text": "machine which is running my Jupyter",
"start": 4679.76,
"duration": 4.8
},
{
"text": "notebook and I'm run launching this",
"start": 4681.76,
"duration": 2.0
},
{
"text": "scheduler. Again the code for this is",
"start": 4685.76,
"duration": 4.0
},
{
"text": "all in GitHub and now the scheduler",
"start": 4688.4,
"duration": 2.64
},
{
"text": "is going to coordinate my jobs. So",
"start": 4692.24,
"duration": 3.84
},
{
"text": "what you do is you run this cell",
"start": 4695.68,
"duration": 3.44
},
{
"text": "which is telling",
"start": 4699.36,
"duration": 3.68
},
{
"text": "which is telling Dask that we are using",
"start": 4701.92,
"duration": 2.56
},
{
"text": "the distributed scheduler. So",
"start": 4707.12,
"duration": 5.2
},
{
"text": "you see whenever you call this now the",
"start": 4710.64,
"duration": 3.52
},
{
"text": "notebook is going to connect to your",
"start": 4713.6,
"duration": 2.96
},
{
"text": "scheduler so that can",
"start": 4718.88,
"duration": 5.28
},
{
"text": "can",
"start": 4726.08,
"duration": 7.2
},
{
"text": "leverage the workers that we",
"start": 4727.68,
"duration": 1.6
},
{
"text": "are going to launch pretty soon.",
"start": 4732.48,
"duration": 4.8
},
{
"text": "So so see how you describe your",
"start": 4735.52,
"duration": 3.04
},
{
"text": "computation. It's exactly the same as",
"start": 4742.24,
"duration": 6.72
},
{
"text": "before. So look at this. You are",
"start": 4744.32,
"duration": 2.08
},
{
"text": "telling dust what's the chunk size.",
"start": 4748.72,
"duration": 4.4
},
{
"text": "You and",
"start": 4752.88,
"duration": 4.16
},
{
"text": "so the only difference between this and",
"start": 4756.0,
"duration": 3.12
},
{
"text": "the one before is this cell here.",
"start": 4758.56,
"duration": 2.56
},
{
"text": "So whenever you execute this code",
"start": 4763.6,
"duration": 5.04
},
{
"text": "any call that happens after is going",
"start": 4767.6,
"duration": 4.0
},
{
"text": "to be instead of running locally. So the",
"start": 4772.96,
"duration": 5.36
},
{
"text": "one we did before it was running in the",
"start": 4776.08,
"duration": 3.12
},
{
"text": "same",
"start": 4779.04,
"duration": 2.96
},
{
"text": "machine now",
"start": 4781.6,
"duration": 2.56
},
{
"text": "nothing is running locally. So when you",
"start": 4784.16,
"duration": 2.56
},
{
"text": "run so you see here the scheduler",
"start": 4786.56,
"duration": 2.4
},
{
"text": "that's saying I have a connection to",
"start": 4790.56,
"duration": 4.0
},
{
"text": "something this means oh you see here",
"start": 4794.88,
"duration": 4.32
},
{
"text": "receive client connection so the client",
"start": 4797.68,
"duration": 2.8
},
{
"text": "is connected to the scheduler now",
"start": 4800.24,
"duration": 2.56
},
{
"text": "if I execute compute this the client so",
"start": 4804.4,
"duration": 4.16
},
{
"text": "the notebook is is sending the jobs",
"start": 4809.28,
"duration": 4.88
},
{
"text": "to our scheduler. And we can",
"start": 4815.12,
"duration": 5.84
},
{
"text": "look",
"start": 4820.16,
"duration": 5.04
},
{
"text": "by",
"start": 4821.68,
"duration": 1.52
},
{
"text": "looking at the dashboard. So this",
"start": 4824.24,
"duration": 2.56
},
{
"text": "is one of the extremely useful features",
"start": 4827.84,
"duration": 3.6
},
{
"text": "of Dask is is the dashboard.",
"start": 4831.12,
"duration": 3.28
},
{
"text": "So you see this is my notebook and I",
"start": 4837.52,
"duration": 6.4
},
{
"text": "have this link here which is",
"start": 4840.56,
"duration": 3.04
},
{
"text": "basically connected to some specific",
"start": 4842.72,
"duration": 2.16
},
{
"text": "port and in that port here is a",
"start": 4844.72,
"duration": 2.0
},
{
"text": "realtime view of your distributed",
"start": 4849.84,
"duration": 5.12
},
{
"text": "computation. So this is not something",
"start": 4852.48,
"duration": 2.64
},
{
"text": "all this is not this is a great",
"start": 4855.68,
"duration": 3.2
},
{
"text": "insight of your parallel computation",
"start": 4859.2,
"duration": 3.52
},
{
"text": "because you can look at what's going on.",
"start": 4860.96,
"duration": 1.76
},
{
"text": "So you see here down at bottom right you",
"start": 4862.8,
"duration": 1.84
},
{
"text": "see zero over 400. So those are all the",
"start": 4866.08,
"duration": 3.28
},
{
"text": "tasks that we passed from the client",
"start": 4868.96,
"duration": 2.88
},
{
"text": "to the scheduler but the scheduler",
"start": 4872.88,
"duration": 3.92
},
{
"text": "doesn't have any worker so nobody can",
"start": 4876.24,
"duration": 3.36
},
{
"text": "execute this. So everything is stopped.",
"start": 4878.16,
"duration": 1.92
},
{
"text": "So now",
"start": 4880.32,
"duration": 2.16
},
{
"text": "what we're going to do is we need to",
"start": 4882.88,
"duration": 2.56
},
{
"text": "give some computational power to the",
"start": 4886.0,
"duration": 3.12
},
{
"text": "to the scheduler and we do that",
"start": 4890.96,
"duration": 4.96
},
{
"text": "with by launching a separate job.",
"start": 4894.24,
"duration": 3.28
},
{
"text": "",
"start": 4899.6,
"duration": 5.36
},
{
"text": "so we go into Dask slurm and you see",
"start": 4902.56,
"duration": 2.96
},
{
"text": "script.",
"start": 4910.48,
"duration": 7.92
},
{
"text": "",
"start": 4912.0,
"duration": 1.52
},
{
"text": "this is going to",
"start": 4914.16,
"duration": 2.16
},
{
"text": "launch a dash you. So you see now",
"start": 4917.2,
"duration": 3.04
},
{
"text": "this is Galileo. This is my session, my",
"start": 4924.56,
"duration": 7.36
},
{
"text": "Jupyter session. I have a separate job",
"start": 4928.24,
"duration": 3.68
},
{
"text": "of two nodes. And once they",
"start": 4931.6,
"duration": 3.36
},
{
"text": "get through the queue, that should",
"start": 4937.2,
"duration": 5.6
},
{
"text": "take just a minute or two. They",
"start": 4939.68,
"duration": 2.48
},
{
"text": "are going to be available to our",
"start": 4942.8,
"duration": 3.12
},
{
"text": "to our scheduler and then the",
"start": 4948.48,
"duration": 5.68
},
{
"text": "scheduler is going to",
"start": 4952.64,
"duration": 4.16
},
{
"text": "is going to execute those tasks.",
"start": 4957.04,
"duration": 4.4
},
{
"text": "",
"start": 4966.16,
"duration": 9.12
},
{
"text": "so we can",
"start": 4968.08,
"duration": 1.92
},
{
"text": "we can take a look at our",
"start": 4971.6,
"duration": 3.52
},
{
"text": "dashboard and we will see the jobs",
"start": 4975.84,
"duration": 4.24
},
{
"text": "coming in and once they execute the most",
"start": 4978.96,
"duration": 3.12
},
{
"text": "important view is task stream here at",
"start": 4982.4,
"duration": 3.44
},
{
"text": "the top right. So this tast is going",
"start": 4985.84,
"duration": 3.44
},
{
"text": "basically every horizontal line",
"start": 4988.96,
"duration": 3.12
},
{
"text": "that we'll sh we'll see appearing is a",
"start": 4993.6,
"duration": 4.64
},
{
"text": "separate u thread on a worker. And",
"start": 4997.76,
"duration": 4.16
},
{
"text": "so in our case we have two workers. Each",
"start": 5003.28,
"duration": 5.52
},
{
"text": "worker has 128 threads. And so",
"start": 5006.4,
"duration": 3.12
},
{
"text": "we're going to have 256 lines here that",
"start": 5011.44,
"duration": 5.04
},
{
"text": "show us in real time what each task",
"start": 5015.6,
"duration": 4.16
},
{
"text": "is doing. And this allows you to",
"start": 5019.2,
"duration": 3.6
},
{
"text": "understand how the computation is",
"start": 5023.12,
"duration": 3.92
},
{
"text": "executing. For example, if we have one",
"start": 5027.28,
"duration": 4.16
},
{
"text": "task only which is highlighted and",
"start": 5030.16,
"duration": 2.88
},
{
"text": "everything else is white, that means",
"start": 5032.4,
"duration": 2.24
},
{
"text": "that there is some problem in our",
"start": 5035.36,
"duration": 2.96
},
{
"text": "code and only one task is executing.",
"start": 5038.56,
"duration": 3.2
},
{
"text": "And so maybe that's a hint that we",
"start": 5042.8,
"duration": 4.24
},
{
"text": "have a problem with the global",
"start": 5046.56,
"duration": 3.76
},
{
"text": "interpreter lock or something else.",
"start": 5047.76,
"duration": 1.2
},
{
"text": "Or we can also see what time is spent",
"start": 5050.96,
"duration": 3.2
},
{
"text": "in different parts of your computation",
"start": 5055.76,
"duration": 4.8
},
{
"text": "and that's really difficult to do in a",
"start": 5057.6,
"duration": 1.84
},
{
"text": "distributed computing environment. So",
"start": 5060.0,
"duration": 2.4
},
{
"text": "that's a great insight into your",
"start": 5061.68,
"duration": 1.68
},
{
"text": "computation",
"start": 5064.16,
"duration": 2.48
},
{
"text": "and",
"start": 5066.08,
"duration": 1.92
},
{
"text": "and you can dig deeper here. For",
"start": 5069.6,
"duration": 3.52
},
{
"text": "example, if you click on the workers,",
"start": 5071.92,
"duration": 2.32
},
{
"text": "once the workers come out, you can check",
"start": 5074.08,
"duration": 2.16
},
{
"text": "their memory usage, CPU usage so that",
"start": 5078.0,
"duration": 3.92
},
{
"text": "you can see whether",
"start": 5082.72,
"duration": 4.72
},
{
"text": "your computation is running correctly",
"start": 5087.44,
"duration": 4.72
},
{
"text": "or not.",
"start": 5089.52,
"duration": 2.08
},
{
"text": "So this was the last part of",
"start": 5091.04,
"duration": 1.52
},
{
"text": "of the tutorial and so we can",
"start": 5095.12,
"duration": 4.08
},
{
"text": "enter the Q&A session and",
"start": 5098.48,
"duration": 3.36
},
{
"text": "and then hopefully the job is going to",
"start": 5104.24,
"duration": 5.76
},
{
"text": "go through and so I can see I can show",
"start": 5106.32,
"duration": 2.08
},
{
"text": "you the live view of the",
"start": 5109.04,
"duration": 2.72
},
{
"text": "dashboard, or you can do it",
"start": 5113.44,
"duration": 4.4
},
{
"text": "yourself on your machine. So this is",
"start": 5115.84,
"duration": 2.4
},
{
"text": "this dashboard is enabled by the",
"start": 5118.88,
"duration": 3.04
},
{
"text": "distributed scheduler. So make sure",
"start": 5120.96,
"duration": 2.08
},
{
"text": "that you look at the Dask",
"start": 5123.36,
"duration": 2.4
},
{
"text": "documentation. Dask explains very",
"start": 5126.24,
"duration": 2.88
},
{
"text": "well all the different steps. You",
"start": 5128.8,
"duration": 2.56
},
{
"text": "execute your distributed",
"start": 5130.72,
"duration": 1.92
},
{
"text": "scheduler, you connect to it and then",
"start": 5133.52,
"duration": 2.8
},
{
"text": "you have access to this very useful UI",
"start": 5135.84,
"duration": 2.32
},
{
"text": "dashboard. Thank you very",
"start": 5139.36,
"duration": 3.52
},
{
"text": "much. So we are on top of the time",
"start": 5141.92,
"duration": 2.56
},
{
"text": "that we allocated. So I'm going to stay",
"start": 5144.64,
"duration": 2.72
},
{
"text": "on and answer the questions. So if",
"start": 5146.8,
"duration": 2.16
},
{
"text": "you have more questions you can stay on.",
"start": 5149.36,
"duration": 2.56
},
{
"text": "Otherwise if you have other",
"start": 5151.84,
"duration": 2.48
},
{
"text": "appointments, please feel free to",
"start": 5156.08,
"duration": 4.24
}
]
import json
import sys
def format_json_file(filepath, indent=4):
"""Reads a JSON file, formats it with the specified indent, and writes it back."""
try:
with open(filepath, 'r') as f:
data = json.load(f)
with open(filepath, 'w') as f:
json.dump(data, f, indent=indent)
print(f"Successfully formatted {filepath} with indent {indent}.")
except FileNotFoundError:
print(f"Error: File not found at {filepath}")
sys.exit(1)
except json.JSONDecodeError:
print(f"Error: Invalid JSON in {filepath}")
sys.exit(1)
if __name__ == "__main__":
if len(sys.argv) != 2:
print("Usage: python format_json.py <filepath>")
sys.exit(1)
json_filepath = sys.argv[1]
format_json_file(json_filepath)
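The `format_json_file` helper above is a load-and-rewrite round trip; a quick stdlib-only sketch of the same idea (hypothetical sample data and a temporary file standing in for the real transcript) is:

```python
# Hypothetical demo of the round trip format_json_file performs:
# dump compact JSON to a temp file, then rewrite it with indent=4.
import json
import os
import tempfile

data = [{"text": "so we can", "start": 4968.08, "duration": 1.92}]

# Write the compact form first (everything on one line).
with tempfile.NamedTemporaryFile("w", suffix=".json", delete=False) as f:
    json.dump(data, f)
    path = f.name

# Re-read and rewrite with the indent the script applies by default.
with open(path) as f:
    loaded = json.load(f)
with open(path, "w") as f:
    json.dump(loaded, f, indent=4)

with open(path) as f:
    formatted = f.read()
os.unlink(path)

print("\n" in formatted)  # indented output now spans multiple lines
```

Because the data is parsed and re-serialized, reformatting never changes the transcript's content, only its layout.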
format-json:
python format_json.py 202509-Python-for-HPC.json
.PHONY: format-json