The rise and potential of large language model based agents: A survey
Xi, Z., Chen, W., Guo, X., He, W., Ding, Y., Hong, B., Zhang, M., Wang, J., Jin, S., Zhou, E. and others, 2023. arXiv preprint arXiv:2309.07864.
Learning transferable visual models from natural language supervision
Radford, A., Kim, J.W., Hallacy, C., Ramesh, A., Goh, G., Agarwal, S., Sastry, G., Askell, A., Mishkin, P., Clark, J. and others, 2021. ICML.
LightGlue: Local Feature Matching at Light Speed
Lindenberger, P., Sarlin, P. and Pollefeys, M., 2023. ICCV.
BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models
Li, J., Li, D., Savarese, S. and Hoi, S., 2023. ICML.
GPT-4 technical report
Achiam, J., Adler, S., Agarwal, S., Ahmad, L., Akkaya, I., Aleman, F.L., Almeida, D., Altenschmidt, J., Altman, S., Anadkat, S. and others, 2023. arXiv preprint arXiv:2303.08774.
Llama 2: Open foundation and fine-tuned chat models
Touvron, H., Martin, L., Stone, K., Albert, P., Almahairi, A., Babaei, Y., Bashlykov, N., Batra, S., Bhargava, P., Bhosale, S. and others, 2023. arXiv preprint arXiv:2307.09288.
Google Maps Platform
Google Map Team. https://mapsplatform.google.com/.
Grounding DINO: Marrying DINO with grounded pre-training for open-set object detection
Liu, S., Zeng, Z., Ren, T., Li, F., Zhang, H., Yang, J., Li, C., Yang, J., Su, H., Zhu, J. and others, 2023. arXiv preprint arXiv:2303.05499.
Simple Open-Vocabulary Object Detection with Vision Transformers
Minderer, M., Gritsenko, A., Stone, A., Neumann, M., Weissenborn, D., Dosovitskiy, A., Mahendran, A., Arnab, A., Dehghani, M., Shen, Z. and others, 2022. ECCV.
InstructBLIP: Towards General-purpose Vision-Language Models with Instruction Tuning
Dai, W., Li, J., Li, D., Tiong, A.M.H., Zhao, J., Wang, W., Li, B., Fung, P. and Hoi, S., 2023. NeurIPS.
Improved Baselines with Visual Instruction Tuning
Liu, H., Li, C., Li, Y. and Lee, Y.J., 2023. arXiv preprint arXiv:2310.03744.
EVA-CLIP: Improved Training Techniques for CLIP at Scale
Sun, Q., Fang, Y., Wu, L., Wang, X. and Cao, Y., 2023. arXiv preprint arXiv:2303.15389.
PP-OCR: A practical ultra lightweight OCR system
Du, Y., Li, C., Guo, R., Yin, X., Liu, W., Zhou, J., Bai, Y., Yu, Z., Yang, Y., Dang, Q. and others, 2020. arXiv preprint arXiv:2009.09941.
Large-scale privacy protection in Google Street View
Frome, A., Cheung, G., Abdulkader, A., Zennaro, M., Wu, B., Bissacco, A., Adam, H., Neven, H. and Vincent, L., 2009. ICCV.
Image geo-localization based on multiple nearest neighbor feature matching using generalized graphs
2014. TPAMI.
Touchdown: Natural language navigation and spatial reasoning in visual street environments
Chen, H., Suhr, A., Misra, D., Snavely, N. and Artzi, Y., 2019. CVPR.