I dedicated a few NVidia Quadro K6000 GPUs to various BOINC projects. These GPUs are older, but with over 2,800 compute cores each they are still monsters in GFLOPS terms; in other words, the GPU pipeline can easily handle several kernels simultaneously. Yet I barely see 15-20% GPU load on a single GPU task. Basically, the compute kernels being launched do not have enough numeric complexity, and the projects' GPU tasks run for only 3-5 minutes. What needs to happen is a better load balancer that launches more than one task per GPU; the current design will result in more and more underutilization as the hardware becomes more powerful. I believe all NVidia and AMD compute cards report their computational load and memory utilization.
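That reporting is exposed through the vendors' own libraries rather than through BOINC itself. As a minimal sketch (assuming an NVIDIA card and the NVML library that ships with the driver; AMD exposes comparable counters through its own interfaces), a client could poll load and video-RAM use like this:

```c
/* Sketch: poll GPU load and memory use via NVIDIA's NVML (link -lnvidia-ml). */
#include <stdio.h>
#include <nvml.h>

int main(void) {
    nvmlDevice_t dev;
    nvmlUtilization_t util;   /* util.gpu, util.memory: percentages */
    nvmlMemory_t mem;         /* mem.total, mem.free, mem.used: bytes */

    if (nvmlInit() != NVML_SUCCESS) return 1;
    if (nvmlDeviceGetHandleByIndex(0, &dev) == NVML_SUCCESS &&
        nvmlDeviceGetUtilizationRates(dev, &util) == NVML_SUCCESS &&
        nvmlDeviceGetMemoryInfo(dev, &mem) == NVML_SUCCESS) {
        /* A scheduler could launch another task while load and used VRAM
           stay below chosen thresholds. */
        printf("load %u%%, VRAM %llu/%llu MiB\n", util.gpu,
               (unsigned long long)(mem.used >> 20),
               (unsigned long long)(mem.total >> 20));
    }
    nvmlShutdown();
    return 0;
}
```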
Therefore, the BOINC manager could poll the card and launch additional tasks automatically. But it may be better to communicate with the user code about its resource needs: the user code should be able to notify the BOINC manager how many compute cores its GPU tasks will take, and the manager should then allow additional tasks up to the total number of compute cores. In the short run, the BOINC manager can readily start multiple GPU tasks on a single GPU and check the resulting load; in the long run, the user code should supply hints in advance about the maximum load it will use. The first criterion is always memory, and only then the computational load. Most modern GPU hardware will queue a task rather than launch it when the requested resources (memory and cores) are not available anyway.

As an experiment, I copied an app_config.xml into the project directory (Milky Way) and set gpu_usage to 0.25 and cpu_usage (for GPU tasks) to 0.1, so each NVidia K6000 card runs 4 tasks per GPU.
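The original attachment is not preserved here, but an app_config.xml matching the settings described would look roughly like this (the <name> element is an assumption; it must match the app name the project actually reports, e.g. in the client's event log):

```xml
<app_config>
    <app>
        <name>milkyway</name>
        <gpu_versions>
            <gpu_usage>0.25</gpu_usage>  <!-- 1/0.25 = 4 tasks per GPU -->
            <cpu_usage>0.1</cpu_usage>   <!-- CPU fraction reserved per GPU task -->
        </gpu_versions>
    </app>
</app_config>
```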
It increased output by 4 times, obviously, but the load on the GPU is still only around 55-60% and memory usage is negligibly small. Each GPU can easily handle up to 5 tasks, and sometimes 6, but that critical last 16% depends on the load (it must not exceed 100%) and would need to be monitored dynamically; it won't crash the GPU, the excess kernels are simply queued. I left it at 4 tasks per GPU for now because I cannot actively monitor BOINC, so the GPU is underutilized by about 33% while its idle time is donated to BOINC projects. These inefficiencies can add up quickly.

Most modern GPUs (2010 and later) allow such load inquiries; it is not rocket science. Instead of queueing, a task could also be assigned to the next GPU card on the same motherboard: the servers whose free capacity I donated to BOINC all have more than one K6000 card per motherboard. The newer compute cards (GP100 and GV100) have significantly more computational capacity, so this is something you want to address optimally in the BOINC manager. It will be needed more, not less. I attached the script I used for Milkyway here for reference.

Fully using powerful GPUs is a good goal, but it's complicated:
1. "GPU usage" is two separate things: the number of cores used and the amount of (video) RAM used. The client can run additional GPU jobs only if they fit in both.
2. Are there cross-platform, cross-vendor APIs for finding usage info? I couldn't find any the last time I looked.
For #1: I agree, but the order should be to allocate the memory first and then launch on the available cores. A RAM allocation request might fail because we're running other GPU jobs, and what if a job's GPU usage changes halfway through? Often we do the allocation and copy the data over the PCIe bus to the GPU card while the other cores are busy computing the existing jobs; this has been possible on most architectures and hides the additional time it takes to copy data from regular CPU RAM to GPU RAM. If the GPU memory is not available, the task will simply have to wait.
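As a minimal CUDA sketch of that allocate-then-overlap pattern (illustrative only, not code from any BOINC project): the allocation is attempted first and the task backs off if it fails, and the PCIe transfer is issued asynchronously so that kernels already running in other streams can keep the cores busy:

```cuda
// Sketch: allocate GPU RAM first, then overlap the PCIe copy with other work.
#include <cstdio>
#include <cuda_runtime.h>

__global__ void work_kernel(float *data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] = data[i] * 2.0f + 1.0f;   // stand-in for real work
}

int main() {
    const int n = 1 << 20;
    const size_t bytes = n * sizeof(float);

    float *h_in = nullptr, *d_in = nullptr;
    cudaMallocHost(&h_in, bytes);   // pinned host RAM enables async copies
    for (int i = 0; i < n; ++i) h_in[i] = 1.0f;

    // Step 1: allocate GPU RAM up front. If other jobs hold the memory,
    // this fails and the task should wait rather than launch.
    if (cudaMalloc(&d_in, bytes) != cudaSuccess) {
        fprintf(stderr, "GPU RAM not available; waiting\n");
        return 1;
    }

    // Step 2: stage the input over the PCIe bus on its own stream; in a real
    // client, kernels from other tasks keep the cores busy during the copy.
    cudaStream_t s;
    cudaStreamCreate(&s);
    cudaMemcpyAsync(d_in, h_in, bytes, cudaMemcpyHostToDevice, s);

    // Step 3: the kernel is queued on the same stream, so it starts as soon
    // as its data has arrived, without blocking unrelated streams.
    work_kernel<<<(n + 255) / 256, 256, 0, s>>>(d_in, n);

    cudaStreamSynchronize(s);
    cudaStreamDestroy(s);
    cudaFree(d_in);
    cudaFreeHost(h_in);
    return 0;
}
```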
For #2: Not sure it exists. OpenCL is general enough, but CUDA offers more features (more about this below). Ideally, a BOINC API that abstracts the different GPU specs and features should be developed at the task level (unless this is already available), and tasks should be distributed based on the specs and features they request. That way, the app developer knows exactly what to expect from the hardware.
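To make that proposal concrete, a task-level request under such an API might carry fields like these (a purely hypothetical sketch; none of these names exist in BOINC today):

```c
/* Hypothetical: what a task could declare to the scheduler in advance. */
typedef struct {
    unsigned long long min_gpu_ram_bytes; /* memory to allocate before launch */
    unsigned int       max_compute_cores; /* hint: peak core usage */
    int                needs_fp64;        /* required hardware features... */
    int                prefers_cuda;      /* ...and preferred API (else OpenCL) */
} gpu_task_requirements;
```

The scheduler could then admit a new task only when the reported free VRAM and idle cores cover what the task declares.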