🦞🌯 Lobster Roll

Thread

Fast subsets of large datasets with Pandas and SQLite (pythonspeed.com)

Stories related to "Fast subsets of large datasets with Pandas and SQLite" across the full archive.
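The technique in the headline story — keeping a large dataset indexed in SQLite and pulling only the rows you need into a DataFrame, rather than loading everything into memory — can be sketched roughly like this (table and column names are illustrative, not from the article):

```python
import sqlite3

import pandas as pd

# In-memory database as a stand-in for a large on-disk dataset.
conn = sqlite3.connect(":memory:")
df = pd.DataFrame({"user_id": range(1000), "score": [i % 7 for i in range(1000)]})
df.to_sql("events", conn, index=False)

# Index the filter column so subset queries don't scan the whole table.
conn.execute("CREATE INDEX idx_score ON events(score)")

# Load only the matching subset into Pandas, not the full dataset.
subset = pd.read_sql_query(
    "SELECT * FROM events WHERE score = ?", conn, params=(3,)
)
print(len(subset))
```

The payoff is that memory use and load time scale with the subset you query, not the full dataset on disk.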

Fast subsets of large datasets with Pandas and SQLite (pythonspeed.com)
Fast Python: High performance techniques for large datasets (manning.com)
Rowboat – A fast tool for understanding large datasets (rowboat.xyz)
Fasttfidf: High-performance TF-IDF vectorization for large-scale text datasets (github.com)
Druid | Open-source infrastructure for Real-time Exploratory Analytics on Large datasets (druid.io)
Dat – A git-like tool for large datasets (dat-data.com)
Additional information: https://github.com/maxogden/dat/blob/master/what-is-dat.md
Git repo: https://github.com/maxogden/dat
Example usage commands: https://github.com/maxogden/dat/blob/master/usage.md
Technical notes / supported formats: https://github.com/maxogden/dat/blob/maste...
(In)Security of Embedded Devices' Firmware - Fast and Furious at Large Scale (media.ccc.de)
> [..] In this talk, we present several methods that make *the large scale security analyses of embedded devices* a feasible task. We implemented those techniques in a scalable framework that we tested on real world data. First, we collected a large number of firmware images from Internet reposi...
The problem of parsing large datasets (haskell-works.github.io)
50 times faster data loading for Pandas: no problem, using C++ (blog.esciencecenter.nl)
Make Python Pandas go fast (blog.wallaroolabs.com)
ELSA: Efficient Long-Term Secure Storage of Large Datasets (arxiv.org)
Abstract: "An increasing amount of information today is generated, exchanged, and stored digitally. This also includes long-lived and highly sensitive information (e.g., electronic health records, governmental documents) whose integrity and confidentiality must be protected over decades or even cent...
How fast can you allocate a large block of memory in C++? (lemire.me)
From chunking to parallelism: faster Pandas with Dask (pythonspeed.com)
Codesearch: fast, indexed regexp search over large file trees (github.com)
The fastest way to read a CSV in Pandas (pythonspeed.com)
Pandas vectorization: faster code, slower code, bloated memory (pythonspeed.com)
Fast Collisions for Large Editable Vehicles (brickadia.com)
What's up Python? New args syntax, subinterpreters, FastAPI and cuda pandas… (bitecode.dev)
reladiff: High-performance diffing of large datasets across databases (github.com)
Show HN: FP32 matmul of large matrices up to 24% faster than cuBLAS on a 4090 (github.com)
I decided to share a CUDA kernel I wrote over 5 months ago. Nvidia's hardware and software may surprise you.
Streaming Large Datasets in Elixir (jackmarchant.com)
How we made querying Pandas DataFrames with chDB 87x faster (clickhouse.com)
Transparency is often lacking in datasets used to train large language models (news.mit.edu)
How we made querying Pandas DataFrames with chDB 87x faster (clickhouse.com)
A fast and space-efficient Base36 encoding for large data (github.com)
Nanocube: Lightning Fast OLAP-style point queries on Pandas DataFrames (github.com)
Show HN: Byte-Pair Encoding tokenizer for training LLMs on large datasets (github.com)
Show HN: How we made querying Pandas DataFrames 87x faster (clickhouse.com)
Ask HN: Are embeddings too expensive for large datasets?
Hi HN,

I've recently spoken with two companies that mentioned the high costs of creating embeddings on their datasets for RAG applications. A PE firm shared that generating embeddings for new data rooms could cost up to $5K, limiting how often they do it.

I'm having trouble understanding wh...